MC4005 Information Retrieval Techniques Syllabus – Anna University PG Syllabus Regulation 2021

COURSE OBJECTIVES:

• To understand the basics of information retrieval with respect to modeling, query operations and indexing.
• To understand machine learning techniques for text classification and clustering.
• To understand the various applications of information retrieval, with emphasis on multimedia IR and web search.
• To understand the concepts of digital libraries.

UNIT I MOTIVATION

Basic Concepts – Practical Issues – Retrieval Process – Architecture – Boolean Retrieval – Retrieval Evaluation – Open Source IR Systems – History of Web Search – Web Characteristics – The Impact of the Web on IR – IR Versus Web Search – Components of a Search Engine

UNIT II MODELING

Taxonomy and Characterization of IR Models – Boolean Model – Vector Model – Term Weighting – Scoring and Ranking – Language Models – Set Theoretic Models – Probabilistic Models – Algebraic Models – Structured Text Retrieval Models – Models for Browsing

UNIT III INDEXING

Static and Dynamic Inverted Indices – Index Construction and Index Compression. Searching – Sequential Searching and Pattern Matching. Query Operations – Query Languages – Query Processing – Relevance Feedback and Query Expansion – Automatic Local and Global Analysis – Measuring Effectiveness and Efficiency

UNIT IV CLASSIFICATION AND CLUSTERING

Text Classification and Naïve Bayes – Vector Space Classification – Support Vector Machines and Machine Learning on Documents. Flat Clustering – Hierarchical Clustering – Matrix Decompositions and Latent Semantic Indexing – Fusion and Meta-Learning

UNIT V SEARCHING THE WEB AND RETRIEVAL

Searching the Web – Structure of the Web – IR and Web Search – Static and Dynamic Ranking – Web Crawling and Indexing – Link Analysis – XML Retrieval. Multimedia IR: Models and Languages – Indexing and Searching. Parallel and Distributed IR – Digital Libraries

TOTAL: 45 PERIODS

SUGGESTED ACTIVITIES:

1. Compare the features of any three search engines
2. Compare and contrast the IR models
3. List the features of the various IR query languages
4. List the applications of classification and clustering in machine learning
5. Study the web crawler used by any search engine to index sites
(e.g., Google, …)

COURSE OUTCOMES:

Upon completion of this course, the students should be able to:
CO1: Build an information retrieval system using the available tools.
CO2: Identify and design the various components of an information retrieval system.
CO3: Model an information retrieval system.
CO4: Apply machine learning techniques to text classification and clustering for efficient information retrieval.
CO5: Design an efficient search engine and analyze the structure of web content.

REFERENCES

1. Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack, “Information Retrieval: Implementing and Evaluating Search Engines”, The MIT Press, Cambridge, Massachusetts and London, England, First Edition, 2010.
2. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, “Introduction to Information Retrieval”, Cambridge University Press, Online Edition, 2009.
3. David A. Grossman, Ophir Frieder, “Information Retrieval: Algorithms and Heuristics”, Springer, 2nd Edition, 2004.
4. Bruce Croft, Donald Metzler, Trevor Strohman, “Search Engines: Information Retrieval in Practice”, Pearson, 2009.
5. Ricardo Baeza-Yates, Berthier Ribeiro-Neto, “Modern Information Retrieval: The Concepts and Technology behind Search”, ACM Press Books, Second Edition, 2011.
6. Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack, “Information Retrieval: Implementing and Evaluating Search Engines”, The MIT Press, Illustrated Edition, 2016.