Syntactic and Semantic Analysis - Natural Language Processing

 

Blog authors:

Rakshit Ingale, Rohit Satwadhar, Atisha Wankhede (VIT, Pune)



[image source: blumeglobal.com]

Introduction

Natural language processing (NLP) is the intersection of computer science and artificial intelligence that deals with the interaction between computers and humans in natural language. The goal of NLP is to help computers understand language the way we do. Applications of NLP techniques include voice assistants like Amazon's Alexa and Apple's Siri, but also other things such as machine translation and text filtering.

Syntactic and semantic analysis

Syntactic analysis and semantic analysis are the two primary techniques used for understanding natural language. Syntax refers to the grammatical structure of the text, whereas semantics refers to the meaning being conveyed.

SYNTACTIC ANALYSIS

Syntactic analysis, also referred to as parsing or syntax analysis, is the process of analyzing natural language with the rules of a formal grammar. Grammatical rules are applied to categories and groups of words, not to individual words. Syntactic analysis basically assigns a syntactic structure to the text or sentence.

SEMANTIC ANALYSIS

The way we humans understand any language is heavily based on its meaning and context, but computers need a different approach. Semantic analysis is a process in which a computer understands the meaning and interpretation of words, signs and sentences. Semantic analysis allows computers to partly understand natural language the way humans do. Only partly, because semantic analysis is one of the toughest parts of NLP and is not fully solved yet.


Concept of Parser

Parsing

Parsing refers to the formal analysis of a sentence by a computer into its constituents, resulting in a parse tree that shows their syntactic relations to one another in visual form, which can be used for further processing and understanding.

Parser

A parser implements the task of parsing. It is defined as a software component designed to take input data (text) and give a structural representation of the input after checking for correct syntax as per a formal grammar. It builds a data structure, generally in the form of a parse tree, an abstract syntax tree or some other hierarchical structure.

The main roles of the parser include:

  1. To report any syntax error.

  2. To recover from the most commonly occurring errors so that processing of the remainder of the program can continue.

  3. To create a parse tree.

  4. To create a symbol table.

  5. To produce intermediate representations (IR).
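As a sketch of the hierarchical structure a parser builds, a parse tree can be represented with a small recursive class. The grammar symbols and the example sentence below are illustrative, not the output of any particular parser.

```python
# A minimal sketch of a parse-tree node, the hierarchical structure a parser builds.

class Node:
    def __init__(self, label, children=None):
        self.label = label              # grammar symbol or word, e.g. 'NP' or 'cat'
        self.children = children or []  # sub-constituents; empty for leaf words

    def __repr__(self):
        if not self.children:
            return self.label
        return f"({self.label} {' '.join(map(repr, self.children))})"

# Parse tree for "the cat sleeps" under the toy rule S -> NP VP:
tree = Node("S", [
    Node("NP", [Node("the"), Node("cat")]),
    Node("VP", [Node("sleeps")]),
])

print(tree)  # (S (NP the cat) (VP sleeps))
```

The bracketed string printed at the end is the usual flat notation for the visual tree described above.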



Types of Parsing

There are two types of parsing:

  1. Top-down Parsing

  2. Bottom-up Parsing

Top-down Parsing

In this kind of parsing, the parser starts constructing the parse tree from the start symbol and then tries to transform the start symbol into the input. The most common form of top-down parsing uses recursive procedures to process the input; the main disadvantage of this recursive descent parsing is backtracking.
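The recursive-procedure idea can be sketched as a tiny recursive-descent parser. The toy grammar (S -> NP VP, NP -> Det N, VP -> V NP | V) and the lexicon below are made up for illustration; note how the VP procedure first tries 'V NP' and backtracks to plain 'V' if that fails.

```python
# A minimal recursive-descent (top-down) parser for an illustrative toy grammar:
#   S -> NP VP,  NP -> Det N,  VP -> V NP | V
# Each function returns the next input position on success, or None on failure.

LEXICON = {"the": "Det", "a": "Det", "dog": "N", "bone": "N", "ate": "V"}

def parse(tokens):
    def match(cat, i):
        # succeed if the token at position i has lexical category `cat`
        if i is not None and i < len(tokens) and LEXICON.get(tokens[i]) == cat:
            return i + 1
        return None

    def np(i):                      # NP -> Det N
        return match("N", match("Det", i))

    def vp(i):                      # VP -> V NP | V
        j = match("V", i)
        if j is None:
            return None
        k = np(j)                   # try VP -> V NP first...
        return k if k is not None else j   # ...else backtrack to VP -> V

    def s(i):                       # S -> NP VP
        j = np(i)
        return vp(j) if j is not None else None

    return s(0) == len(tokens)      # success means all tokens were consumed

print(parse("the dog ate a bone".split()))  # True
print(parse("dog the ate".split()))         # False
```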


Bottom-up Parsing

In this kind of parsing, the parser starts with the input symbols and tries to construct the parse tree upward until it reaches the start symbol.
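A common bottom-up strategy is shift-reduce parsing, sketched below for the same kind of toy grammar. This greedy version (reduce whenever the top of the stack matches a rule) is only an illustration; real shift-reduce parsers need tables or heuristics to resolve shift/reduce conflicts.

```python
# A minimal shift-reduce (bottom-up) sketch for an illustrative toy grammar.
# The parser shifts word categories onto a stack and reduces the stack top
# whenever it matches a rule's right-hand side, working up toward S.

LEXICON = {"the": "Det", "dog": "N", "ate": "V", "a": "Det", "bone": "N"}
RULES = [(("Det", "N"), "NP"), (("V", "NP"), "VP"), (("NP", "VP"), "S")]

def shift_reduce(tokens):
    stack = []
    for word in tokens:
        stack.append(LEXICON[word])        # shift: push the word's category
        reduced = True
        while reduced:                     # reduce greedily while any rule applies
            reduced = False
            for rhs, lhs in RULES:
                if len(stack) >= len(rhs) and tuple(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]  # pop the matched right-hand side
                    stack.append(lhs)      # push the rule's left-hand side
                    reduced = True
    return stack == ["S"]                  # success: everything reduced to S

print(shift_reduce("the dog ate a bone".split()))  # True
```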

Concept of Derivation

To derive an input string we need a sequence of production rules; a derivation is such a sequence of production rules. During parsing we have to decide which non-terminal to replace, as well as which production rule to use when replacing it.


Types of Derivation

  1. Left-most-Derivation

  2. Rightmost-Derivation

Left-most Derivation

In left-most derivation, the sentential form of the input is scanned and replaced from left to right. The sentential form in this case is called the left-sentential form.

Right-most Derivation

In right-most derivation, the sentential form of the input is scanned and replaced from right to left. The sentential form in this case is called the right-sentential form.
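The two derivation orders can be contrasted on a toy grammar. In the sketch below (illustrative; each non-terminal has a single fixed production, so only the order of replacement differs), the left-most derivation always expands the left-most non-terminal and the right-most derivation the right-most one.

```python
# Illustrative toy grammar with one production per non-terminal:
#   S -> NP VP,  NP -> Rohit,  VP -> reads
GRAMMAR = {"S": ["NP", "VP"], "NP": ["Rohit"], "VP": ["reads"]}

def derive(start="S", leftmost=True):
    """Return the sequence of sentential forms of a derivation."""
    form = [start]
    steps = [" ".join(form)]
    while any(sym in GRAMMAR for sym in form):
        idxs = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        i = idxs[0] if leftmost else idxs[-1]   # pick which non-terminal to expand
        form[i:i + 1] = GRAMMAR[form[i]]        # apply its production
        steps.append(" ".join(form))
    return steps

print(derive(leftmost=True))   # ['S', 'NP VP', 'Rohit VP', 'Rohit reads']
print(derive(leftmost=False))  # ['S', 'NP VP', 'NP reads', 'Rohit reads']
```

The middle steps are the left-sentential and right-sentential forms mentioned above.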

POS tagging

The first level of syntactic analysis is POS (part-of-speech) tagging. A word can be tagged as a noun, verb, adjective, adverb or preposition based on its role in the sentence. Assigning the right tags, such as noun, verb or adjective, is one of the most basic functions in syntactic analysis.

For example, you ask Siri a question: "Hey Siri, where can I get a permit to travel between different countries?". Now, 'permit' may have two possible POS tags, a noun and a verb. In the phrase 'I need a bar permit', the correct tag of 'permit' is noun. But in the phrase "Please permit me to play football.", 'permit' is a verb.

Assigning the right POS tag helps us better understand the intended meaning of a sentence and is hence an important part of syntactic processing. In fact, all other parsing techniques (constituency parsing, dependency parsing, etc.) use part-of-speech tags to parse a sentence.

Although POS tagging helps us in identifying the linguistic role of the word in a phrase, it wouldn’t enable us to understand how these words are related to each other in a sentence. That is why the next level of syntactic analysis is needed.
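The 'permit' disambiguation above can be caricatured with a single hand-written rule: a word following a determiner is probably a noun. This is purely an illustration of what a tagger decides; real taggers (e.g. NLTK's pos_tag) use statistical or neural models over far richer context.

```python
# A toy rule sketch of POS disambiguation for the word 'permit' (illustrative only).

DETERMINERS = {"a", "an", "the"}

def tag_permit(tokens):
    """Tag 'permit' as NOUN after a determiner, else as VERB; other words untagged."""
    tags = []
    for i, word in enumerate(tokens):
        if word == "permit":
            prev = tokens[i - 1] if i > 0 else None
            tags.append("NOUN" if prev in DETERMINERS else "VERB")
        else:
            tags.append(None)  # this sketch only disambiguates 'permit'
    return tags

print(tag_permit("i need a permit".split()))           # [None, None, None, 'NOUN']
print(tag_permit("please permit me to play".split()))  # [None, 'VERB', None, None, None]
```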

[image source: byteiota.com]

Constituency Parsing

To handle the ambiguities and complexities of natural language, we first need to identify and define common grammatical patterns. The first step in getting to know a grammar is to split words into groups, called constituents, according to their grammatical role in the sentence.

Let's get to know constituents in detail with an example. Consider the sentence 'Rohit -> read -> an article on NLP'. Each group of words separated by the '->' forms a constituent. The justification for placing these words in a unit is given by the notion of substitution: a constituent can be replaced with another equivalent constituent while keeping the sentence syntactically valid.

For example, replacing the constituent 'an article on NLP' (a noun phrase) with 'dinner' (another noun phrase) doesn't affect the syntax of the sentence, but the resulting sentence 'Rohit read dinner' is semantically meaningless.

The most common constituents in English are Noun Phrases (NP), Verb Phrases (VP), and Prepositional Phrases (PP). There are various other types of phrases, such as the adverbial phrase or the nominal, though in most scenarios we work with only the above three phrases along with the nominal.
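The substitution test described above can be shown in a few lines. The three-way split of the sentence below is an illustrative assumption (subject NP, verb, object NP), not the output of a parser.

```python
# A small sketch of the substitution test for constituents (illustrative).

sentence = ["Rohit", "read", "an article on NLP"]  # subject, verb, object constituents

def substitute(parts, index, new_constituent):
    """Swap one constituent for another and rebuild the sentence."""
    parts = list(parts)
    parts[index] = new_constituent
    return " ".join(parts)

print(substitute(sentence, 2, "dinner"))
# 'Rohit read dinner' - still valid syntax, even though semantically odd
```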

Finally, free-word-order languages such as Marathi are difficult to parse using constituency parsing. That's because, in such free-word-order languages, the order of constituents may change significantly while maintaining the same meaning. Hence, we require dependency parsing for such languages.

Dependency Parsing

In dependency grammar, constituents (such as noun phrases) do not form the main elements of grammar; instead, dependencies are established between the words themselves.

Let's take the example sentence 'man picked cat'. The dependencies can be made as follows: 'man' is the subject of the sentence (the one who is doing something); 'picked' is the main verb (something that is being done); while 'cat' is the object of 'picked' (to whom something is being done).

So, the idea of dependency parsing is based on the fact that every sentence is about something, and generally involves a subject, a verb and an object.

In general, Subject-Verb-Object (SVO) is the basic word order in English (called 'rigid word order'). Obviously, many sentences are far too complex to fall into this basic SVO structure, although sophisticated dependency parsers are able to handle many of them.
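A dependency parse is usually represented as a set of (head, relation, dependent) triples with the main verb as the root. The sketch below just writes out those triples for an SVO sentence; the relation labels follow the common nsubj/obj convention, and a real dependency parser (e.g. spaCy) would of course derive them from the text itself.

```python
# A sketch of dependency triples for an SVO sentence (illustrative).

def dependencies(subject, verb, obj):
    # each triple is (head, relation, dependent); the verb is the root
    return [(verb, "nsubj", subject), (verb, "obj", obj)]

print(dependencies("man", "picked", "cat"))
# [('picked', 'nsubj', 'man'), ('picked', 'obj', 'cat')]
```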

Dependency parsing is an advanced topic whose study involves a deeper understanding of English grammar and parsing algorithms, so we will not go into much detail here.




Semantic Analysis (Word Sense Disambiguation, Relationship Extraction)

Syntactic and semantic analysis are the two primary building blocks of natural language processing. While syntactic analysis deals with the syntax of a sentence, semantic analysis helps systems draw meaning from that sentence. 'Semantic' is a linguistic term related to meaning or logic.

Understanding relationships between lexical items plays an important role in semantic analysis. Lexical semantics helps machines do exactly that.

  • Hyponyms: specific lexical items of a generic lexical item.
    E.g. car is a hyponym of vehicle.

  • Meronyms: words that denote a constituent part or a member of something.
    E.g. apple is a meronym of apple tree.

  • Polysemy: defined as "the coexistence of many possible meanings for a word or phrase."
    E.g. sound: it has 19 noun meanings, 12 adjective meanings, 4 verb meanings and 2 adverb meanings.

  • Synonyms: words which have the same or nearly the same meaning.
    E.g. chilly is a synonym of cold.

  • Antonyms: the opposite of synonyms; words which have opposing meanings.
    E.g. happy, sad.

  • Homonyms: words which sound the same but have different meanings.
    E.g. orange (fruit and colour).
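For a machine, these lexical relations end up stored as structured data, loosely in the style of a lexical database such as WordNet. The tiny dictionary below is a hand-picked illustration, not a real lexical resource.

```python
# A tiny illustrative store of lexical relations, loosely WordNet-style.

LEXICON = {
    "vehicle": {"hyponyms": {"car", "bus"}},
    "apple tree": {"meronyms": {"apple"}},
    "cold": {"synonyms": {"chilly"}, "antonyms": {"hot"}},
}

def is_hyponym(specific, generic):
    """True if `specific` is listed as a hyponym of `generic`."""
    return specific in LEXICON.get(generic, {}).get("hyponyms", set())

print(is_hyponym("car", "vehicle"))  # True
print(is_hyponym("hot", "vehicle"))  # False
```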

By training sufficiently advanced ML algorithms, this process can be automated; machines can then make predictions based on past observations.

Various subtasks are involved in a semantics-based approach to ML, including word sense disambiguation and relationship extraction.

Word Sense Disambiguation:
The process of identifying the sense of a word according to its context in a sentence. WSD is in essence a classification problem. People have a natural ability to draw the contextual meaning of words.
E.g. Pass me a pen. (pen is a writing device)
Pen a letter for me. (pen means to write)

This goes to show that natural language is very ambiguous and words tend to change their meaning very often. The same word can mean two different things depending on how it's used.
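One classic approach to WSD picks the sense whose dictionary definition overlaps most with the surrounding context, in the spirit of the Lesk algorithm. The "glosses" below are hand-written for the 'pen' example, not taken from a real dictionary.

```python
# A minimal gloss-overlap sketch of word sense disambiguation (Lesk-style).
# Sense glosses are hand-written illustrations.

SENSES = {
    "pen-noun": {"writing", "device", "ink", "paper"},
    "pen-verb": {"write", "compose", "letter", "author"},
}

def disambiguate(context_words):
    # pick the sense whose gloss shares the most words with the context
    overlap = lambda sense: len(SENSES[sense] & set(context_words))
    return max(SENSES, key=overlap)

print(disambiguate(["pass", "me", "a", "writing", "device"]))  # pen-noun
print(disambiguate(["please", "write", "a", "letter"]))        # pen-verb
```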

Relationship Extraction:
Relation extraction is the task of predicting attributes and relations for entities in a sentence. It is a key component in building relation knowledge graphs.
These relations look like "WorksAt", "is the PM of", etc.
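A "WorksAt" relation can be caricatured with a single surface pattern. The sentence and names below are made up, and production systems use trained models rather than one regular expression, but the input/output shape (sentence in, relation triple out) is the same.

```python
# A toy pattern-based relation extractor (illustrative only).
import re

def extract_works_at(sentence):
    # matches surface patterns like '<Person> works at <Org>'
    m = re.search(r"(\w+) works at (\w+)", sentence)
    return (m.group(1), "WorksAt", m.group(2)) if m else None

print(extract_works_at("Rakshit works at Acme"))  # ('Rakshit', 'WorksAt', 'Acme')
print(extract_works_at("no relation here"))       # None
```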

Semantic Classification Models:
These models are used when we want to assign predefined categories to text.
Some examples are topic classification, sentiment analysis, and intent classification.

Topic Classification: classifying text based on the topic of that text. E.g. a customer service team can classify queries as "Login Issue", "Payment Issue", etc.
Intent Classification: drawing the intent of a text. E.g. based on an email, classify it into categories like "Interested" or "Not interested".
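The customer-service example can be sketched with a keyword-count classifier. The categories and keyword sets below are invented for illustration; a real classifier would be a trained ML model, but the interface is the same: text in, predefined category out.

```python
# A toy keyword-based semantic classifier (illustrative only).

TOPICS = {
    "Login Issue": {"login", "password", "signin"},
    "Payment Issue": {"payment", "refund", "invoice"},
}

def classify(text):
    words = set(text.lower().split())
    score = lambda topic: len(TOPICS[topic] & words)  # keyword overlap count
    best = max(TOPICS, key=score)
    return best if score(best) > 0 else "Unknown"     # no keyword hit -> Unknown

print(classify("I forgot my password and cannot login"))  # Login Issue
print(classify("my refund has not arrived"))              # Payment Issue
```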

Semantic Extraction Models:
These models are used when we want to extract specific information from text.
Keyword Extraction: finding relevant words and expressions in a text.
Entity Extraction: identifying named entities in a text, e.g. names of people, places, etc.
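Entity extraction can be caricatured by one naive heuristic for English: mid-sentence capitalised words are often names. This is only a sketch of the task's shape; real named-entity recognition (e.g. spaCy's entity recognizer) uses trained models and handles far more cases.

```python
# A toy named-entity sketch: treat capitalised mid-sentence words as entities
# (illustrative heuristic only; it misses and over-matches many real cases).

def extract_entities(sentence):
    tokens = sentence.split()
    # skip the first token: sentence-initial capitalisation is not evidence
    return [t for t in tokens[1:] if t[0].isupper()]

print(extract_entities("Yesterday Rohit flew from Pune to Delhi"))
# ['Rohit', 'Pune', 'Delhi']
```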

