Syntactic and Semantic Analysis - Natural Language Processing

 

Blog authors:

Rakshit Ingale, Rohit Satwadhar, Atisha Wankhede (VIT, Pune)



[image source: blumeglobal.com]

Introduction

Natural language processing (NLP) is the intersection of computer science and artificial intelligence that deals with the interaction between computers and humans in natural language. The goal of NLP is to help computers understand language the way we do. Applications of NLP techniques include voice assistants like Amazon's Alexa and Apple's Siri, but also other things such as machine translation and text filtering.

Syntactic and semantic analysis

Syntactic analysis and semantic analysis are the two primary techniques used for understanding natural language. Syntax refers to the grammatical structure of the text, whereas semantics refers to the meaning being conveyed.

SYNTACTIC ANALYSIS

Syntactic analysis, also referred to as parsing or syntax analysis, is the process of analyzing natural language with the rules of a formal grammar. Grammatical rules are applied to categories and groups of words, not to individual words. Syntactic analysis basically assigns a syntactic structure to the text or sentence.

SEMANTIC ANALYSIS

The way we humans understand any language is heavily based on its meaning and context, but computers need a different approach. Semantic analysis is a process in which a computer understands the meaning and interpretation of words, signs and sentences. Semantic analysis allows computers to partly understand natural language the way humans do. Only partly, because semantic analysis is one of the toughest parts of NLP and is not fully solved yet.


Concept of Parser

Parsing

Parsing refers to the formal analysis of a sentence by a computer into its constituents, resulting in a parse tree that shows their syntactic relations to one another in visual form, which can be used for further processing and understanding.

Parser

A parser implements the task of parsing. It is defined as a software component designed to take input data (text) and give a structural representation of the input after checking for correct syntax as per a formal grammar. It builds a data structure, generally in the form of a parse tree, an abstract syntax tree or some other hierarchical structure.

The main roles of the parser include:

  1. To report any syntax error.

  2. To recover from the most commonly occurring errors so that processing of the remainder of the program can continue.

  3. To create a parse tree.

  4. To create a symbol table.

  5. To produce intermediate representations (IR).
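As a sketch of the hierarchical structure a parser builds, a parse tree can be represented with a small recursive class. The grammar symbols and the example sentence below are illustrative, not the output of any particular parser.

```python
# A minimal sketch of a parse-tree node, the hierarchical structure a parser builds.

class Node:
    def __init__(self, label, children=None):
        self.label = label              # grammar symbol or word, e.g. 'NP' or 'cat'
        self.children = children or []  # sub-constituents; empty for leaf words

    def __repr__(self):
        if not self.children:
            return self.label
        return f"({self.label} {' '.join(map(repr, self.children))})"

# Parse tree for "the cat sleeps" under the toy rule S -> NP VP:
tree = Node("S", [
    Node("NP", [Node("the"), Node("cat")]),
    Node("VP", [Node("sleeps")]),
])

print(tree)  # (S (NP the cat) (VP sleeps))
```

The bracketed string printed at the end is the usual flat notation for the visual tree described above.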



Types of Parsing

There are two types of parsing:

  1. Top-down Parsing

  2. Bottom-up Parsing

Top-down Parsing

In this kind of parsing, the parser starts constructing the parse tree from the start symbol and then tries to transform the start symbol into the input. The most common form of top-down parsing uses recursive procedures to process the input; the main disadvantage of this recursive descent parsing is backtracking.
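The recursive-procedure idea can be sketched as a tiny recursive-descent parser. The toy grammar (S -> NP VP, NP -> Det N, VP -> V NP | V) and the lexicon below are made up for illustration; note how the VP procedure first tries 'V NP' and backtracks to plain 'V' if that fails.

```python
# A minimal recursive-descent (top-down) parser for an illustrative toy grammar:
#   S -> NP VP,  NP -> Det N,  VP -> V NP | V
# Each function returns the next input position on success, or None on failure.

LEXICON = {"the": "Det", "a": "Det", "dog": "N", "bone": "N", "ate": "V"}

def parse(tokens):
    def match(cat, i):
        # succeed if the token at position i has lexical category `cat`
        if i is not None and i < len(tokens) and LEXICON.get(tokens[i]) == cat:
            return i + 1
        return None

    def np(i):                      # NP -> Det N
        return match("N", match("Det", i))

    def vp(i):                      # VP -> V NP | V
        j = match("V", i)
        if j is None:
            return None
        k = np(j)                   # try VP -> V NP first...
        return k if k is not None else j   # ...else backtrack to VP -> V

    def s(i):                       # S -> NP VP
        j = np(i)
        return vp(j) if j is not None else None

    return s(0) == len(tokens)      # success means all tokens were consumed

print(parse("the dog ate a bone".split()))  # True
print(parse("dog the ate".split()))         # False
```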


Bottom-up Parsing

In this kind of parsing, the parser starts with the input symbols and tries to construct the parse tree upward until it reaches the start symbol.
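A common bottom-up strategy is shift-reduce parsing, sketched below for the same kind of toy grammar. This greedy version (reduce whenever the top of the stack matches a rule) is only an illustration; real shift-reduce parsers need tables or heuristics to resolve shift/reduce conflicts.

```python
# A minimal shift-reduce (bottom-up) sketch for an illustrative toy grammar.
# The parser shifts word categories onto a stack and reduces the stack top
# whenever it matches a rule's right-hand side, working up toward S.

LEXICON = {"the": "Det", "dog": "N", "ate": "V", "a": "Det", "bone": "N"}
RULES = [(("Det", "N"), "NP"), (("V", "NP"), "VP"), (("NP", "VP"), "S")]

def shift_reduce(tokens):
    stack = []
    for word in tokens:
        stack.append(LEXICON[word])        # shift: push the word's category
        reduced = True
        while reduced:                     # reduce greedily while any rule applies
            reduced = False
            for rhs, lhs in RULES:
                if len(stack) >= len(rhs) and tuple(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]  # pop the matched right-hand side
                    stack.append(lhs)      # push the rule's left-hand side
                    reduced = True
    return stack == ["S"]                  # success: everything reduced to S

print(shift_reduce("the dog ate a bone".split()))  # True
```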

Concept of Derivation

To derive an input string we need a sequence of production rules; a derivation is such a sequence of production rules. During parsing we have to decide which non-terminal to replace, as well as which production rule to use when replacing it.


Types of Derivation

  1. Left-most-Derivation

  2. Rightmost-Derivation

Left-most Derivation

In left-most derivation, the sentential form of the input is scanned and replaced from left to right. The sentential form in this case is called the left-sentential form.

Right-most Derivation

In right-most derivation, the sentential form of the input is scanned and replaced from right to left. The sentential form in this case is called the right-sentential form.
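The two derivation orders can be contrasted on a toy grammar. In the sketch below (illustrative; each non-terminal has a single fixed production, so only the order of replacement differs), the left-most derivation always expands the left-most non-terminal and the right-most derivation the right-most one.

```python
# Illustrative toy grammar with one production per non-terminal:
#   S -> NP VP,  NP -> Rohit,  VP -> reads
GRAMMAR = {"S": ["NP", "VP"], "NP": ["Rohit"], "VP": ["reads"]}

def derive(start="S", leftmost=True):
    """Return the sequence of sentential forms of a derivation."""
    form = [start]
    steps = [" ".join(form)]
    while any(sym in GRAMMAR for sym in form):
        idxs = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        i = idxs[0] if leftmost else idxs[-1]   # pick which non-terminal to expand
        form[i:i + 1] = GRAMMAR[form[i]]        # apply its production
        steps.append(" ".join(form))
    return steps

print(derive(leftmost=True))   # ['S', 'NP VP', 'Rohit VP', 'Rohit reads']
print(derive(leftmost=False))  # ['S', 'NP VP', 'NP reads', 'Rohit reads']
```

The middle steps are the left-sentential and right-sentential forms mentioned above.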

POS tagging

The first level of syntactic analysis is POS (part-of-speech) tagging. A word can be tagged as a noun, verb, adjective, adverb or preposition based on its role in the sentence. Assigning the right tags, such as noun, verb or adjective, is one of the most basic functions in syntactic analysis.

For example, you ask Siri a question: "Hey Siri, where can I get a permit to travel between different countries?". Now, 'permit' may have two possible POS tags, a noun and a verb. In the phrase 'I need a bar permit', the correct tag of 'permit' is noun. But in the phrase "Please permit me to play football.", 'permit' is a verb.

Assigning the right POS tag helps us better understand the intended meaning of a sentence and is hence an important part of syntactic processing. In fact, all other parsing techniques (constituency parsing, dependency parsing, etc.) use part-of-speech tags to parse a sentence.

Although POS tagging helps us in identifying the linguistic role of the word in a phrase, it wouldn’t enable us to understand how these words are related to each other in a sentence. That is why the next level of syntactic analysis is needed.
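The 'permit' disambiguation above can be caricatured with a single hand-written rule: a word following a determiner is probably a noun. This is purely an illustration of what a tagger decides; real taggers (e.g. NLTK's pos_tag) use statistical or neural models over far richer context.

```python
# A toy rule sketch of POS disambiguation for the word 'permit' (illustrative only).

DETERMINERS = {"a", "an", "the"}

def tag_permit(tokens):
    """Tag 'permit' as NOUN after a determiner, else as VERB; other words untagged."""
    tags = []
    for i, word in enumerate(tokens):
        if word == "permit":
            prev = tokens[i - 1] if i > 0 else None
            tags.append("NOUN" if prev in DETERMINERS else "VERB")
        else:
            tags.append(None)  # this sketch only disambiguates 'permit'
    return tags

print(tag_permit("i need a permit".split()))           # [None, None, None, 'NOUN']
print(tag_permit("please permit me to play".split()))  # [None, 'VERB', None, None, None]
```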

[image source: byteiota.com]

Constituency Parsing

To handle the ambiguities and complexities of natural language, we first need to identify and define common grammatical patterns. The first step in getting to know a grammar is to split words into groups, called constituents, according to their grammatical role in the sentence.

Let's get to know constituents in detail with an example. Consider the sentence 'Rohit -> read -> an article on NLP'. Each group of words separated by the '->' forms a constituent. The justification for placing these words in a unit is given by the notion of substitution: a constituent can be replaced with another equivalent constituent while keeping the sentence syntactically valid.

For example, replacing the constituent 'an article on NLP' (a noun phrase) with 'dinner' (another noun phrase) doesn't affect the syntax of the sentence, but the resulting sentence 'Rohit read dinner' is semantically meaningless.

The most common constituents in English are Noun Phrases (NP), Verb Phrases (VP), and Prepositional Phrases (PP). There are various other types of phrases, such as the adverbial phrase or the nominal, though in most scenarios we work with only the above three phrases along with the nominal.
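The substitution test described above can be shown in a few lines. The three-way split of the sentence below is an illustrative assumption (subject NP, verb, object NP), not the output of a parser.

```python
# A small sketch of the substitution test for constituents (illustrative).

sentence = ["Rohit", "read", "an article on NLP"]  # subject, verb, object constituents

def substitute(parts, index, new_constituent):
    """Swap one constituent for another and rebuild the sentence."""
    parts = list(parts)
    parts[index] = new_constituent
    return " ".join(parts)

print(substitute(sentence, 2, "dinner"))
# 'Rohit read dinner' - still valid syntax, even though semantically odd
```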

Finally, free-word-order languages such as Marathi are difficult to parse using constituency parsing. That's because, in such free-word-order languages, the order of constituents may change significantly while maintaining the same meaning. Hence, we require dependency parsing for such languages.

Dependency Parsing

In dependency grammar, constituents (such as noun phrases) do not form the main elements of grammar; instead, dependencies are established between the words themselves.

Let's take the example sentence 'man picked cat'. The dependencies can be made as follows: 'man' is the subject of the sentence (the one who is doing something); 'picked' is the main verb (something that is being done); while 'cat' is the object of 'picked' (to whom something is being done).

So, the idea of dependency parsing is based on the fact that every sentence is about something, and generally involves a subject, a verb and an object.

In general, Subject-Verb-Object (SVO) is the basic word order in English (called 'rigid word order'). Obviously, many sentences are far too complex to fall into this basic SVO structure, although sophisticated dependency parsers are able to handle many of them.
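A dependency parse is usually represented as a set of (head, relation, dependent) triples with the main verb as the root. The sketch below just writes out those triples for an SVO sentence; the relation labels follow the common nsubj/obj convention, and a real dependency parser (e.g. spaCy) would of course derive them from the text itself.

```python
# A sketch of dependency triples for an SVO sentence (illustrative).

def dependencies(subject, verb, obj):
    # each triple is (head, relation, dependent); the verb is the root
    return [(verb, "nsubj", subject), (verb, "obj", obj)]

print(dependencies("man", "picked", "cat"))
# [('picked', 'nsubj', 'man'), ('picked', 'obj', 'cat')]
```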

Dependency parsing is an advanced topic whose study involves a deeper understanding of English grammar and parsing algorithms, so we will not go into much detail here.




Semantic Analysis (Word Sense Disambiguation, Relationship Extraction)

Syntactic and semantic analysis are the two primary building blocks of natural language processing. While syntactic analysis deals with the syntax of a sentence, semantic analysis helps systems draw meaning from that sentence. 'Semantic' is a linguistic term related to meaning or logic.

Understanding relationships between lexical items plays an important role in semantic analysis. Lexical semantics helps machines do exactly that.

  • Hyponyms: specific lexical items of a generic lexical item.
    E.g. car is a hyponym of vehicle.

  • Meronyms: words that denote a constituent part or a member of something.
    E.g. apple is a meronym of apple tree.

  • Polysemy: defined as "the coexistence of many possible meanings for a word or phrase."
    E.g. sound: it has 19 noun meanings, 12 adjective meanings, 4 verb meanings and 2 adverb meanings.

  • Synonyms: words which have the same or nearly the same meaning.
    E.g. chilly is a synonym of cold.

  • Antonyms: the opposite of synonyms; words which have opposing meanings.
    E.g. happy, sad.

  • Homonyms: words which sound the same but have different meanings.
    E.g. orange (fruit and colour).
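For a machine, these lexical relations end up stored as structured data, loosely in the style of a lexical database such as WordNet. The tiny dictionary below is a hand-picked illustration, not a real lexical resource.

```python
# A tiny illustrative store of lexical relations, loosely WordNet-style.

LEXICON = {
    "vehicle": {"hyponyms": {"car", "bus"}},
    "apple tree": {"meronyms": {"apple"}},
    "cold": {"synonyms": {"chilly"}, "antonyms": {"hot"}},
}

def is_hyponym(specific, generic):
    """True if `specific` is listed as a hyponym of `generic`."""
    return specific in LEXICON.get(generic, {}).get("hyponyms", set())

print(is_hyponym("car", "vehicle"))  # True
print(is_hyponym("hot", "vehicle"))  # False
```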

By training sufficiently advanced ML algorithms, this process can be automated; machines can then make predictions based on past observations.

Various subtasks are involved in a semantics-based approach to ML, including word sense disambiguation and relationship extraction.

Word Sense Disambiguation:
The process of identifying the sense of a word according to its context in a sentence. WSD is in essence a classification problem. People have a natural ability to draw the contextual meaning of words.
E.g. Pass me a pen. (pen is a writing device)
Pen a letter for me. (pen means to write)

This goes to show that natural language is very ambiguous and words tend to change their meaning very often. The same word can mean two different things depending on how it's used.
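One classic approach to WSD picks the sense whose dictionary definition overlaps most with the surrounding context, in the spirit of the Lesk algorithm. The "glosses" below are hand-written for the 'pen' example, not taken from a real dictionary.

```python
# A minimal gloss-overlap sketch of word sense disambiguation (Lesk-style).
# Sense glosses are hand-written illustrations.

SENSES = {
    "pen-noun": {"writing", "device", "ink", "paper"},
    "pen-verb": {"write", "compose", "letter", "author"},
}

def disambiguate(context_words):
    # pick the sense whose gloss shares the most words with the context
    overlap = lambda sense: len(SENSES[sense] & set(context_words))
    return max(SENSES, key=overlap)

print(disambiguate(["pass", "me", "a", "writing", "device"]))  # pen-noun
print(disambiguate(["please", "write", "a", "letter"]))        # pen-verb
```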

Relationship Extraction:
Relation extraction is the task of predicting attributes and relations for entities in a sentence. It is a key component in building relation knowledge graphs.
These relations look like "WorksAt", "is the PM of", etc.
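A "WorksAt" relation can be caricatured with a single surface pattern. The sentence and names below are made up, and production systems use trained models rather than one regular expression, but the input/output shape (sentence in, relation triple out) is the same.

```python
# A toy pattern-based relation extractor (illustrative only).
import re

def extract_works_at(sentence):
    # matches surface patterns like '<Person> works at <Org>'
    m = re.search(r"(\w+) works at (\w+)", sentence)
    return (m.group(1), "WorksAt", m.group(2)) if m else None

print(extract_works_at("Rakshit works at Acme"))  # ('Rakshit', 'WorksAt', 'Acme')
print(extract_works_at("no relation here"))       # None
```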

Semantic Classification Models:
These models are used when we want to assign predefined categories to text.
Some examples are topic classification, sentiment analysis, and intent classification.

Topic Classification: classifying text based on the topic of that text. E.g. a customer service team can classify queries as "Login Issue", "Payment Issue", etc.
Intent Classification: drawing the intent of a text. E.g. based on an email, classify it into categories like "Interested" or "Not interested".
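The customer-service example can be sketched with a keyword-count classifier. The categories and keyword sets below are invented for illustration; a real classifier would be a trained ML model, but the interface is the same: text in, predefined category out.

```python
# A toy keyword-based semantic classifier (illustrative only).

TOPICS = {
    "Login Issue": {"login", "password", "signin"},
    "Payment Issue": {"payment", "refund", "invoice"},
}

def classify(text):
    words = set(text.lower().split())
    score = lambda topic: len(TOPICS[topic] & words)  # keyword overlap count
    best = max(TOPICS, key=score)
    return best if score(best) > 0 else "Unknown"     # no keyword hit -> Unknown

print(classify("I forgot my password and cannot login"))  # Login Issue
print(classify("my refund has not arrived"))              # Payment Issue
```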

Semantic Extraction Models:
These models are used when we want to extract specific information from text.
Keyword Extraction: finding relevant words and expressions in a text.
Entity Extraction: identifying named entities in a text, e.g. names of people, places, etc.
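Entity extraction can be caricatured by one naive heuristic for English: mid-sentence capitalised words are often names. This is only a sketch of the task's shape; real named-entity recognition (e.g. spaCy's entity recognizer) uses trained models and handles far more cases.

```python
# A toy named-entity sketch: treat capitalised mid-sentence words as entities
# (illustrative heuristic only; it misses and over-matches many real cases).

def extract_entities(sentence):
    tokens = sentence.split()
    # skip the first token: sentence-initial capitalisation is not evidence
    return [t for t in tokens[1:] if t[0].isupper()]

print(extract_entities("Yesterday Rohit flew from Pune to Delhi"))
# ['Rohit', 'Pune', 'Delhi']
```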

