MC4020 Data Mining and Data Warehousing Techniques Syllabus – Anna University PG Syllabus Regulation 2021

COURSE OBJECTIVES:

• To characterize the kinds of patterns that can be discovered by association rule mining.
• To implement classification techniques on large datasets.
• To analyse various clustering techniques in real-world applications.
• To become familiar with the concepts of data warehouse architecture and implementation.

UNIT I DATA MINING & DATA PREPROCESSING

Data Mining – Concepts – DBMS versus Data Mining – Kinds of Data – Applications – Issues and Challenges – Need for Data Preprocessing – Data Cleaning – Data Integration and Transformation – Data Reduction – Data Discretization and Concept Hierarchy Generation.
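Two of the preprocessing steps listed in Unit I, data transformation by min-max normalization and data discretization by equal-width binning, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the function names `min_max_normalize` and `equal_width_bins` and the sample ages are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def equal_width_bins(values, k):
    """Discretize values into bin indices 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = []
    for v in values:
        if width == 0:
            bins.append(0)
            continue
        idx = int((v - lo) / width)
        # The maximum value lies exactly on the upper edge; keep it in the last bin.
        bins.append(min(idx, k - 1))
    return bins


# Hypothetical sample attribute (ages), used only to exercise the functions.
ages = [13, 15, 16, 19, 20, 21, 25, 30, 33, 35, 40, 45, 52, 70]
print(min_max_normalize(ages))
print(equal_width_bins(ages, 3))
```

Equal-width binning is the simplest discretization scheme; equal-frequency binning or concept-hierarchy-based discretization would replace only `equal_width_bins`.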

UNIT II ASSOCIATION RULE MINING AND CLASSIFICATION

Introduction to Association Rules – Association Rule Mining – Mining Frequent Itemsets with and without Candidate Generation – Classification versus Prediction – Data Preparation for Classification and Prediction
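Mining frequent itemsets with candidate generation is the Apriori approach: frequent (k-1)-itemsets are joined into k-itemset candidates, any candidate with an infrequent (k-1)-subset is pruned, and one database scan counts the survivors. The following is a minimal sketch, assuming each transaction is a list of item labels and `min_support` is an absolute count; the function names and the nine toy transactions are illustrative. Rule confidence is computed as support(X ∪ Y) / support(X).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset whose
    support count is at least min_support (an absolute count)."""
    transactions = [frozenset(t) for t in transactions]
    # Frequent 1-itemsets: count each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                # Join step: two frequent (k-1)-itemsets whose union has size k.
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # One scan of the database counts the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Generate rules X -> Y from each frequent itemset with confidence >= min_confidence."""
    out = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so frequent[lhs] exists.
                conf = support / frequent[lhs]
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), conf))
    return out


# Toy transaction database (illustrative only).
TRANSACTIONS = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
freq = apriori(TRANSACTIONS, min_support=2)
for itemset, n in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
for lhs, rhs, conf in association_rules(freq, min_confidence=0.7):
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

With `min_support=2`, only {I1, I2, I3} and {I1, I2, I5} survive as frequent 3-itemsets, and no 4-itemset candidate passes the prune step. Mining without candidate generation (FP-growth) replaces the join-and-scan loop with an FP-tree and is not shown here.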

UNIT III CLASSIFICATION AND PREDICTION TECHNIQUES

Classification by Decision Tree – Bayesian Classification – Rule Based Classification – Bayesian Belief Networks – Classification by Backpropagation – Support Vector Machines – K-Nearest Neighbor Algorithm – Linear Regression, Nonlinear Regression
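Of the Unit III techniques, the K-Nearest Neighbor classifier is the easiest to show end to end: a query point takes the majority class among its k closest training points. A minimal sketch, assuming numeric feature tuples and Euclidean distance; `knn_predict` and the six training points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance with its label; keep the k closest.
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tied vote, most_common keeps first-encountered order, i.e. the class
    # whose member appears nearest in the sorted neighbour list wins.
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training data with classes "A" and "B".
X = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.3), (4.5, 1.5), (4.7, 1.4), (4.4, 1.3)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, (1.6, 0.4), k=3))
print(knn_predict(X, y, (4.6, 1.4), k=3))
```

KNN does no training beyond storing the data, so the cost moves to prediction time; attributes on very different scales should be normalized first, since they otherwise dominate the distance.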

UNIT IV CLUSTERING TECHNIQUES

Cluster Analysis – Partitioning Methods: k-Means and k-Medoids – Hierarchical Methods: Agglomerative and Divisive – Model-Based Clustering Methods: Fuzzy Clusters and the Expectation-Maximization Algorithm
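The k-means partitioning method alternates two steps until no point changes cluster: assign each point to its nearest centroid, then move each centroid to the mean of its members. A minimal sketch of this (Lloyd's) iteration, assuming points are numeric tuples; for reproducibility the initial centroids are simply the first k points, whereas random or k-means++ seeding is more usual. The function name and data points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd-style k-means; returns (centroids, cluster index for each point).
    Assumption: initial centroids are the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignment


# Hypothetical 2-D points forming two well-separated groups.
pts = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (1.0, 0.5), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(pts, k=2)
print(centroids)
print(labels)
```

k-medoids differs only in the update step: the representative of each cluster must be one of its actual points (the one minimizing total distance to the others), which makes it less sensitive to outliers.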

UNIT V DATA WAREHOUSE

Need for Data Warehouse – Database versus Data Warehouse – Multidimensional Data Model – Schemas for Multidimensional Databases – OLAP operations – OLAP versus OLTP – Data Warehouse Architecture – Extraction, Transformation and Loading (ETL)
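In the multidimensional data model, a fact table holds dimension values plus a numeric measure, and OLAP operations are aggregations over it. The following is an illustrative sketch only, assuming a single in-memory fact table: roll-up aggregates away dimensions (dropping `quarter` climbs the time hierarchy from quarter to year) and slice fixes one dimension to a single value. The fact rows, city names, and the functions `roll_up` and `slice_` are hypothetical; a real warehouse would typically express these as SQL `GROUP BY` queries over a star schema.

```python
from collections import defaultdict

# Hypothetical toy fact table: (year, quarter, city, item, units_sold).
DIMS = ("year", "quarter", "city", "item")
FACTS = [
    (2023, "Q1", "Chennai", "laptop", 120),
    (2023, "Q2", "Chennai", "laptop", 80),
    (2023, "Q1", "Madurai", "laptop", 40),
    (2023, "Q1", "Chennai", "phone", 300),
    (2024, "Q1", "Chennai", "laptop", 150),
    (2024, "Q2", "Madurai", "phone", 90),
]


def roll_up(facts, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(facts, dim, value):
    """Slice: keep only the rows where one dimension equals `value`."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]


print(roll_up(FACTS, ("year", "city")))                      # quarter and item rolled up
print(roll_up(slice_(FACTS, "city", "Chennai"), ("year",)))  # slice, then roll up
```

Drill-down is the inverse of roll-up (keeping more dimensions), and dice generalizes slice to a condition on several dimensions at once.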

SUGGESTED ACTIVITIES:

1. Perform attribute ranking for a dataset (e.g., the contact-lenses dataset, https://archive.ics.uci.edu/ml/datasets/lenses) using any two attribute ranking methods.
2. Identify the association rules in the above dataset using the Apriori algorithm.
3. Implement the K-Nearest Neighbor algorithm to classify a dataset (e.g., the Iris dataset, https://archive.ics.uci.edu/ml/datasets/Iris).
4. Demonstrate the K-means clustering process on the above dataset.
5. Describe the steps in building a data warehouse using open-source tools (e.g., the Pentaho Data Integration tool).
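For activity 1, two common attribute ranking measures are information gain and gain ratio (information gain divided by the attribute's split information, as in C4.5). A minimal sketch, assuming rows are dictionaries of categorical values; the four rows are made up and only loosely named after contact-lenses attributes, they are not the actual UCI data. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of categorical values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, attr, target):
    """Class entropy minus the weighted class entropy after splitting on attr."""
    split = {}
    for r in rows:
        split.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(part) / len(rows) * entropy(part) for part in split.values())
    return entropy([r[target] for r in rows]) - remainder


def gain_ratio(rows, attr, target):
    """Information gain normalised by the split information of attr."""
    split_info = entropy([r[attr] for r in rows])
    return info_gain(rows, attr, target) / split_info if split_info > 0 else 0.0


def rank(rows, attrs, target, score):
    """Order attributes from most to least useful under the given score."""
    return sorted(attrs, key=lambda a: score(rows, a, target), reverse=True)


# Made-up rows (not the real contact-lenses dataset).
rows = [
    {"tear_rate": "reduced", "astigmatic": "no", "lens": "none"},
    {"tear_rate": "reduced", "astigmatic": "yes", "lens": "none"},
    {"tear_rate": "normal", "astigmatic": "no", "lens": "soft"},
    {"tear_rate": "normal", "astigmatic": "yes", "lens": "hard"},
]
for score in (info_gain, gain_ratio):
    print(score.__name__, rank(rows, ["astigmatic", "tear_rate"], "lens", score))
```

Gain ratio penalizes attributes with many distinct values, which plain information gain tends to favour; comparing the two rankings is one way to satisfy the "any two methods" requirement.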

TOTAL: 45 PERIODS

COURSE OUTCOMES:

On completion of the course, the students will be able to:
CO1: Identify data mining techniques for building intelligent models.
CO2: Illustrate association mining techniques on transactional databases.
CO3: Apply classification and clustering techniques in real-world applications.
CO4: Evaluate various mining techniques on complex data objects.
CO5: Design, create and maintain data warehouses.

REFERENCES

1. Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques”, Third Edition, Elsevier, 2012.
2. K. P. Soman, Shyam Diwakar, V. Ajay, “Insight into Data Mining: Theory and Practice”, Eastern Economy Edition, Prentice Hall of India, 2009.
3. Alex Berson, Stephen Smith, “Data Warehousing, Data Mining, & OLAP”, Tata McGraw-Hill, 2008.
4. David L. Olson, Dursun Delen, “Advanced Data Mining Techniques”, Springer-Verlag Berlin Heidelberg, 2008.
5. G. K. Gupta, “Introduction to Data Mining with Case Studies”, Third Edition, Eastern Economy Edition, Prentice Hall of India, 2014.