ML4291 Natural Language Processing Syllabus – Anna University PG Syllabus Regulation 2021
COURSE OBJECTIVES:
To understand basics of linguistics, probability and statistics
To study statistical approaches to NLP and understand sequence labeling
To outline different parsing techniques associated with NLP
To explore semantics of words and semantic role labeling of sentences
To understand discourse analysis, question answering and chatbots
UNIT I INTRODUCTION
Natural Language Processing – Components – Basics of Linguistics and Probability and Statistics – Words – Tokenization – Morphology – Finite State Automata
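As an illustration of the Finite State Automata topic in this unit, the following is a minimal sketch of a deterministic automaton written as a transition table. The toy language (b, then two or more a's, then "!") and the names TRANSITIONS, ACCEPTING and accepts are assumptions chosen for the example; they are not part of the syllabus.

```python
# Transition table for a deterministic finite-state automaton (DFA)
# that accepts the toy language: "b", two or more "a", then "!".
# States are integers; state 0 is the start state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of extra "a" characters
    (3, "!"): 4,
}
ACCEPTING = {4}


def accepts(text: str) -> bool:
    """Return True if the automaton ends in an accepting state."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING
```

With this table, accepts("baaa!") holds while accepts("ba!") does not; extending the automaton to other categories amounts to adding rows to the table.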
UNIT II STATISTICAL NLP AND SEQUENCE LABELING
N-grams and Language models – Smoothing – Text classification – Naïve Bayes classifier – Evaluation – Vector Semantics – TF-IDF – Word2Vec – Evaluating Vector Models – Sequence Labeling – Part of Speech – Part of Speech Tagging – Named Entities – Named Entity Tagging
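To make the N-grams and Smoothing topics concrete, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing. The function names, the whitespace tokenization, and the choice to count the sentence markers <s> and </s> in the vocabulary are assumptions made for the example; textbooks differ on the last point.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def laplace_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing over the observed vocabulary."""
    vocab_size = len(unigrams)  # includes <s> and </s> (an assumption)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
```

Unseen bigrams receive a small non-zero probability because every count is incremented by one before normalizing.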
UNIT III CONTEXTUAL EMBEDDING
Constituency – Context Free Grammar – Lexicalized Grammars – CKY Parsing – Earley’s algorithm – Evaluating Parsers – Partial Parsing – Dependency Relations – Dependency Parsing – Transition Based – Graph Based
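The CKY Parsing topic can be illustrated with a small recognizer. The sketch below assumes a grammar already in Chomsky Normal Form; the toy rules in BINARY and LEXICON and the function name cky_recognize are hypothetical and serve only as an example. It reports whether a sentence is derivable and does not build parse trees.

```python
from itertools import product

# Toy grammar in Chomsky Normal Form (illustrative only).
# BINARY maps a pair of right-hand-side symbols to possible left-hand sides.
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# LEXICON maps each word to its possible pre-terminal symbols.
LEXICON = {
    "the": {"Det"},
    "a": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}


def cky_recognize(words, start="S"):
    """Return True if the word list can be derived from the start symbol."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the non-terminals that span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICON.get(word, ()))
    for span in range(2, n + 1):          # width of the span
        for i in range(n - span + 1):     # left edge
            j = i + span                  # right edge
            for k in range(i + 1, j):     # split point
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((left, right), set())
    return start in table[0][n]
```

For example, cky_recognize("the dog chased a cat".split()) succeeds, while a bare noun phrase such as "the dog" is rejected because it does not derive S.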
UNIT IV COMPUTATIONAL SEMANTICS
Word Senses and WordNet – Word Sense Disambiguation – Semantic Role Labeling – Proposition Bank – FrameNet – Selectional Restrictions – Information Extraction – Template Filling
UNIT V DISCOURSE ANALYSIS AND SPEECH PROCESSING
Discourse Coherence – Discourse Structure Parsing – Centering and Entity Based Coherence – Question Answering – Factoid Question Answering – Classical QA Models – Chatbots and Dialogue systems – Frame-based Dialogue Systems – Dialogue-State Architecture
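A frame-based dialogue system, as listed in this unit, fills the slots of a frame from user utterances and asks about whichever slot is still empty. The following is a minimal sketch under strong assumptions: the travel domain, the slot names, the regular expressions and the questions are all hypothetical, and real systems use far more robust slot extraction.

```python
import re

# Hypothetical frame for a toy travel domain: slot -> (pattern, question).
# Origin and destination are assumed to be capitalized place names.
SLOTS = {
    "origin": (re.compile(r"\bfrom ([A-Z]\w+)"), "Where are you leaving from?"),
    "destination": (re.compile(r"\bto ([A-Z]\w+)"), "Where are you going?"),
    "day": (re.compile(r"\bon (\w+)", re.I), "Which day do you want to travel?"),
}


def update_frame(frame, utterance):
    """Fill any still-empty slots whose pattern matches the utterance."""
    for slot, (pattern, _question) in SLOTS.items():
        match = pattern.search(utterance)
        if match and slot not in frame:
            frame[slot] = match.group(1)
    return frame


def next_question(frame):
    """Return the question for the first unfilled slot, or None if complete."""
    for slot, (_pattern, question) in SLOTS.items():
        if slot not in frame:
            return question
    return None
```

A dialogue loop would alternate between update_frame on each user turn and next_question to choose the system's reply, stopping once next_question returns None.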
TOTAL : 30 PERIODS
SUGGESTED ACTIVITIES:
1. Probability and Statistics for NLP Problems
2. Carry out Morphological Tagging and Part-of-Speech Tagging for a sample text
3. Design a Finite State Automaton for more Grammatical Categories
4. Problems associated with Vector Space Model
5. Hand-simulate the working of an HMM
6. Examples for different types of word sense disambiguation
7. Give the design of a Chatbot
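For activity 5, a hand simulation of an HMM can be checked against a short Viterbi implementation. The sketch below is one possible version; the two-state tagging model and all of its probabilities are invented for illustration, and the function assumes a non-empty observation sequence.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               the previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        column = {}
        for s in states:
            column[s] = max(
                (best[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
        best.append(column)
    # Backtrace from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1], prob


# Hypothetical toy model: N = noun, V = verb.
STATES = ["N", "V"]
START_P = {"N": 0.8, "V": 0.2}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT_P = {"N": {"fish": 0.8, "sleep": 0.2}, "V": {"fish": 0.5, "sleep": 0.5}}
```

With these numbers, the observation sequence "fish sleep" works out by hand to the path N, V with probability 0.8 × 0.8 × 0.7 × 0.5 = 0.224, which is what viterbi(["fish", "sleep"], STATES, START_P, TRANS_P, EMIT_P) computes.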
PRACTICAL EXERCISES: 30 PERIODS
1. Install nltk and download its data packages. Use it to print the tokens in a document and the sentences from it.
2. Define custom stop words, then remove both the custom and the standard stop words from a given document using the nltk or spaCy package.
3. Implement a stemmer and a lemmatizer program.
4. Implement a simple Part-of-Speech Tagger
5. Write a program to calculate the TF-IDF of documents and find the cosine similarity between any two documents.
6. Use nltk to implement a dependency parser.
7. Implement a semantic language processor that uses WordNet for semantic tagging.
8. Project – (in Pairs) Your project must use NLP concepts and apply them to some data.
a. Your project may be a comparison of several existing systems, or it may propose a new system in which case you still must compare it to at least one other approach.
b. You are free to use any third-party ideas or code that you wish as long as it is publicly available.
c. You must properly provide references to any work that is not your own in the writeup.
d. Project proposal: You must turn in a brief project proposal. Your project proposal should describe the idea behind your project. You should also briefly describe the software you will need to write, and the papers (2-3) you plan to read.
List of Possible Projects
1. Sentiment Analysis of Product Reviews
2. Information extraction from News articles
3. Customer support bot
4. Language identifier
5. Media Monitor
6. Paraphrase Detector
7. Identification of Toxic Comments
8. Spam Mail Identification
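Practical exercise 5 (TF-IDF and cosine similarity) can be approached without external libraries. The following is a minimal sketch, assuming whitespace tokenization, term frequency normalized by document length, and the unsmoothed idf log(N / df); other weighting variants are equally valid, and the function names are chosen for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Note that with this idf a term occurring in every document gets weight zero, so documents that share only such terms have similarity 0.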
COURSE OUTCOMES:
CO1: Understand basics of linguistics, probability and statistics associated with NLP
CO2: Implement a Part-of-Speech Tagger
CO3: Design and implement a sequence labeling problem for a given domain
CO4: Implement semantic processing tasks and simple document indexing and searching system using the concepts of NLP
CO5: Implement a simple chatbot using dialogue system concepts
TOTAL : 60 PERIODS
REFERENCES
1. Daniel Jurafsky and James H. Martin, “Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition” (Prentice Hall Series in Artificial Intelligence), 2020
2. Jacob Eisenstein, “Natural Language Processing”, MIT Press, 2019
3. Samuel Burns, “Natural Language Processing: A Quick Introduction to NLP with Python and NLTK”, 2019
4. Christopher Manning, “Foundations of Statistical Natural Language Processing”, MIT Press, 2009.
5. Nitin Indurkhya, Fred J. Damerau, “Handbook of Natural Language Processing”, Second edition, Chapman & Hall/CRC: Machine Learning & Pattern Recognition, 2010
6. Deepti Chopra, Nisheeth Joshi, “Mastering Natural Language Processing with Python”, Packt Publishing Limited, 2016
7. Mohamed Zakaria Kurdi, “Natural Language Processing and Computational Linguistics: Speech, Morphology and Syntax (Cognitive Science)”, ISTE Ltd., 2016
8. Atefeh Farzindar, Diana Inkpen, “Natural Language Processing for Social Media (Synthesis Lectures on Human Language Technologies)”, Morgan and Claypool Life Sciences, 2015