ML4291 Natural Language Processing Syllabus – Anna University PG Syllabus Regulation 2021

COURSE OBJECTIVES:

• To understand basics of linguistics, probability and statistics
• To study statistical approaches to NLP and understand sequence labeling
• To outline different parsing techniques associated with NLP
• To explore semantics of words and semantic role labeling of sentences
• To understand discourse analysis, question answering and chatbots

UNIT I INTRODUCTION

Natural Language Processing – Components – Basics of Linguistics and Probability and Statistics – Words – Tokenization – Morphology – Finite State Automata
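As an illustration of the finite state automata topic, the following is a minimal sketch of a deterministic automaton written as a transition table. The language it accepts ("b", then two or more "a", then "!") and the names TRANSITIONS, ACCEPT_STATES and accepts are assumptions chosen for this example, not part of the syllabus.

```python
# Transition table for a deterministic finite state automaton (DFA)
# that accepts the toy language: b, then two or more a's, then "!".
# Keys are (current_state, input_symbol); values are the next state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of extra a's
    (3, "!"): 4,
}
ACCEPT_STATES = {4}


def accepts(text: str) -> bool:
    """Return True if the DFA ends in an accepting state after reading text."""
    state = 0
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPT_STATES
```

Calling accepts("baaa!") walks states 0, 1, 2, 3, 3, 4 and accepts, while accepts("ba!") fails because state 2 has no transition on "!".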

UNIT II STATISTICAL NLP AND SEQUENCE LABELING

N-grams and Language models – Smoothing – Text classification – Naïve Bayes classifier – Evaluation – Vector Semantics – TF-IDF – Word2Vec – Evaluating Vector Models – Sequence Labeling – Part of Speech – Part of Speech Tagging – Named Entities – Named Entity Tagging
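The N-gram and smoothing topics can be illustrated with a small bigram model. This is a sketch under simple assumptions: whitespace tokenization, sentence boundary markers "<s>" and "</s>", and add-one (Laplace) smoothing. The function names are hypothetical.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)  # includes the boundary markers
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
```

Because one is added to every bigram count, an unseen bigram receives a small non-zero probability instead of zero, and the probabilities of all vocabulary words after a given context still sum to one.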

UNIT III CONTEXTUAL EMBEDDING

Constituency – Context Free Grammar – Lexicalized Grammars – CKY Parsing – Earley’s algorithm – Evaluating Parsers – Partial Parsing – Dependency Relations – Dependency Parsing – Transition Based – Graph Based
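For the CKY parsing topic, a minimal recognizer sketch is given below. It assumes the grammar is already in Chomsky normal form; the toy grammar, the lexicon and the name cky_recognize are hypothetical and serve only as an illustration.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration).
# Binary rules map a pair of right-hand-side symbols to possible parents.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICON = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}


def cky_recognize(words, start="S"):
    """Return True if the word list can be derived from the start symbol."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that can span words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICON.get(word, ()))
    # Fill longer spans from shorter ones, trying every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n]
```

The sketch only answers whether a parse exists; storing back-pointers in each cell would be needed to recover the parse tree itself.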

UNIT IV COMPUTATIONAL SEMANTICS

Word Senses and WordNet – Word Sense Disambiguation – Semantic Role Labeling – Proposition Bank – FrameNet – Selectional Restrictions – Information Extraction – Template Filling
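Word sense disambiguation can be illustrated with a simplified Lesk-style overlap measure. The sketch below does not use WordNet: the two-sense inventory in SENSES, the stop word list and the sense labels are hand-written assumptions made so that the example is self-contained.

```python
# Small hand-written stop word list (an assumption for this example).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "and",
              "that", "is", "it", "i", "my", "as"}

# Hypothetical sense inventory: sense label -> gloss.
SENSES = {
    "bank": {
        "bank.financial": "a financial institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a body of water such as a river",
    }
}


def content_words(text):
    """Lowercase, split on whitespace and drop stop words."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def simplified_lesk(word, sentence):
    """Pick the sense whose gloss shares the most words with the context."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:  # ties keep the earlier sense
            best_sense, best_overlap = sense, overlap
    return best_sense
```

With exact string matching, "deposit" and "deposits" do not overlap, which shows why stemming or lemmatization is normally applied before computing the overlap.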

UNIT V DISCOURSE ANALYSIS AND SPEECH PROCESSING

Discourse Coherence – Discourse Structure Parsing – Centering and Entity Based Coherence – Question Answering – Factoid Question Answering – Classical QA Models – Chatbots and Dialogue Systems – Frame-based Dialogue Systems – Dialogue-State Architecture
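A frame-based dialogue system fills the slots of a frame from user utterances and asks about whichever slot is still empty. The following is a minimal sketch assuming a hypothetical flight-booking frame; the slot names, city list, patterns and prompts are all invented for illustration.

```python
import re

# Hypothetical closed vocabularies for the slot values.
CITIES = r"(chennai|delhi|mumbai)"
DAYS = r"(monday|tuesday|wednesday|thursday|friday|saturday|sunday)"

# Frame definition: slot name -> (pattern that fills it, prompt to ask for it).
FRAME = {
    "origin": (re.compile(r"\bfrom " + CITIES + r"\b"),
               "Which city are you leaving from?"),
    "destination": (re.compile(r"\bto " + CITIES + r"\b"),
                    "Where are you flying to?"),
    "day": (re.compile(r"\bon " + DAYS + r"\b"),
            "What day would you like to travel?"),
}


def update_frame(frame_state, utterance):
    """Return a copy of the frame with any newly mentioned slots filled."""
    state = dict(frame_state)
    text = utterance.lower()
    for slot, (pattern, _prompt) in FRAME.items():
        match = pattern.search(text)
        if match and slot not in state:
            state[slot] = match.group(1)
    return state


def next_prompt(frame_state):
    """Return the question for the first unfilled slot, or None if complete."""
    for slot, (_pattern, prompt) in FRAME.items():
        if slot not in frame_state:
            return prompt
    return None
```

For example, after "I want to fly from Chennai to Delhi" the origin and destination slots are filled and next_prompt asks for the travel day. A dialogue-state architecture would replace the hand-written patterns with trained slot and intent classifiers.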

TOTAL : 30 PERIODS

SUGGESTED ACTIVITIES:

1. Probability and Statistics for NLP Problems
2. Carry out Morphological Tagging and Part-of-Speech Tagging for a sample text
3. Design a Finite State Automaton for additional Grammatical Categories
4. Problems associated with Vector Space Model
5. Hand-simulate the working of an HMM
6. Examples for different types of word sense disambiguation
7. Give the design of a Chatbot
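A hand simulation of an HMM (activity 5) can be checked against a short Viterbi decoder. The sketch below assumes a non-empty observation sequence and probability tables given as nested dictionaries; the function name and argument layout are choices made for this example.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path and its probability.

    Assumes observations is non-empty and every probability is present
    in the start_p, trans_p and emit_p dictionaries.
    """
    # Each trellis column maps state -> (best path probability, previous state).
    trellis = [{s: (start_p[s] * emit_p[s][observations[0]], None)
                for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Best predecessor for state s at this time step.
            prob, prev = max((trellis[-1][p][0] * trans_p[p][s], p)
                             for p in states)
            column[s] = (prob * emit_p[s][obs], prev)
        trellis.append(column)
    # Pick the best final state, then follow back-pointers to the start.
    last = max(states, key=lambda s: trellis[-1][s][0])
    best_prob = trellis[-1][last][0]
    path = [last]
    for column in reversed(trellis[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best_prob
```

As a usage example with made-up numbers: take states N and V, start probabilities N 0.8 and V 0.2, transitions N to N 0.3, N to V 0.7, V to N 0.6, V to V 0.4, and emissions for "fish" of 0.5 (N) and 0.4 (V) and for "sleep" of 0.1 (N) and 0.5 (V). For the observation sequence fish, sleep the decoder returns the path N, V with probability 0.8 × 0.5 × 0.7 × 0.5 = 0.14, which matches the hand calculation.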

PRACTICAL EXERCISES: 30 PERIODS

1. Download NLTK and its data packages. Use it to print the tokens and the sentences of a document.
2. Define custom stop words and remove them, together with the standard stop words, from a given document using the NLTK or spaCy package.
3. Implement a stemmer and a lemmatizer program.
4. Implement a simple Part-of-Speech Tagger
5. Write a program to calculate the TF-IDF of documents and find the cosine similarity between any two documents.
6. Use nltk to implement a dependency parser.
7. Implement a semantic language processor that uses WordNet for semantic tagging.
8. Project (in pairs): Your project must use NLP concepts and apply them to some data.
a. Your project may be a comparison of several existing systems, or it may propose a new system, in which case you must still compare it to at least one other approach.
b. You are free to use any third-party ideas or code that you wish, as long as they are publicly available.
c. You must properly provide references to any work that is not your own in the write-up.
d. Project proposal: You must turn in a brief project proposal that describes the idea behind your project. You should also briefly describe the software you will need to write and the papers (2-3) you plan to read.
List of Possible Projects
1. Sentiment Analysis of Product Reviews
2. Information extraction from News articles
3. Customer support bot
4. Language identifier
5. Media Monitor
6. Paraphrase Detector
7. Identification of Toxic Comments
8. Spam Mail Identification
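Practical exercise 5 (TF-IDF and cosine similarity) can be sketched in plain Python as follows. The sketch assumes whitespace tokenization, term frequency normalized by document length, and the unsmoothed inverse document frequency log(N / df); library implementations such as scikit-learn use slightly different weighting, so their numbers will differ. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is zero."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two documents that share no terms have similarity 0.0, and a term that occurs in every document gets weight zero because log(N / N) = 0.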

COURSE OUTCOMES:

CO1: Understand basics of linguistics, probability and statistics associated with NLP
CO2: Implement a Part-of-Speech Tagger
CO3: Design and implement a sequence labeling problem for a given domain
CO4: Implement semantic processing tasks and simple document indexing and searching system using the concepts of NLP
CO5: Implement a simple chatbot using dialogue system concepts

TOTAL : 60 PERIODS

REFERENCES

1. Daniel Jurafsky and James H. Martin, “Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition” (Prentice Hall Series in Artificial Intelligence), 2020
2. Jacob Eisenstein, “Natural Language Processing”, MIT Press, 2019
3. Samuel Burns, “Natural Language Processing: A Quick Introduction to NLP with Python and NLTK”, 2019
4. Christopher Manning, Hinrich Schütze, “Foundations of Statistical Natural Language Processing”, MIT Press, 2009
5. Nitin Indurkhya, Fred J. Damerau, “Handbook of Natural Language Processing”, Second edition, Chapman & Hall/CRC: Machine Learning & Pattern Recognition, Hardcover, 2010
6. Deepti Chopra, Nisheeth Joshi, “Mastering Natural Language Processing with Python”, Packt Publishing Limited, 2016
7. Mohamed Zakaria Kurdi, “Natural Language Processing and Computational Linguistics: Speech, Morphology and Syntax (Cognitive Science)”, ISTE Ltd., 2016
8. Atefeh Farzindar, Diana Inkpen, “Natural Language Processing for Social Media (Synthesis Lectures on Human Language Technologies)”, Morgan and Claypool Life Sciences, 2015