BD4007 R Language for Mining Syllabus – Anna University PG Syllabus Regulation 2021

COURSE OBJECTIVES

To study the major data mining problems as different types of computational tasks (prediction, classification, clustering, etc.) and the algorithms appropriate for addressing these tasks.
To learn how to analyze data through statistical and graphical summarization and through supervised and unsupervised learning algorithms.
To systematically evaluate data mining algorithms and understand how to choose algorithms for different analysis tasks.

UNIT I INTRODUCTION TO DATA MINING

Introduction, Mining Association Rules in Large Databases, Mining Frequent Patterns – Basic Concepts – Efficient and Scalable Frequent Itemset Mining Methods, Apriori Algorithm, FP-Growth Algorithm, Associations – Mining Various Kinds of Association Rules.

UNIT II PREDICTIVE MODELING AND CLUSTERING

Classification and Prediction: Issues – Classification by Decision Tree Induction – Bayesian Classification – Other Classification Methods – Prediction – Cluster Analysis – Basics of Cluster Analysis – Types of Data in Cluster Analysis – Categorization of Major Clustering Methods – Partitioning Methods – Hierarchical Methods.

UNIT III MINING DATA STREAMS

Introduction to Stream Concepts – Stream Data Model and Architecture – Stream Computing – Sampling Data in a Stream – Filtering Streams – Counting Distinct Elements in a Stream – Estimating Moments – Counting Ones in a Window – Decaying Window – Real-Time Analytics Platform (RTAP) Applications.

UNIT IV DATA ANALYTIC METHODS USING R

Introduction to R – Analyzing and Exploring Data with R – Statistics for Model Building and Evaluation. Analytical Approaches, Introduction to Various Analytical Tools, Installing R, Handling Basic Expressions in R, Variables in R, Working with Vectors, Storing and Calculating Values in R, Creating and Using Objects, Interacting with Users, Handling Data in R Workspace.

UNIT V FUNCTIONS AND PACKAGES IN R

Executing Scripts, Reading Datasets and Exporting Data from R, Manipulating and Processing Data in R, Working with Functions and Packages in R, Performing Graphical Analysis in R, Techniques Used for Visual Data Representation, Types of Data Visualization.

TOTAL: 45 PERIODS

COURSE OUTCOMES:

Upon completion of the course, the student should be able to:
CO1: Demonstrate accurate and efficient use of classification, using the R system for the computations.
CO2: Demonstrate the related data mining techniques using R.
CO3: Demonstrate capacity for mathematical reasoning through analyzing, proving and explaining concepts from the theory that underpins classification and related data mining methods.
CO4: Apply problem-solving using classification and related data mining techniques to diverse situations in business, biology, engineering and other sciences.
CO5: Analyze data visualizations.

REFERENCES:

1. Carlo Vercellis, Business Intelligence: Data Mining and Optimization for Decision Making, Wiley.
2. Han J., Kamber M. and Pei J., Data Mining: Concepts and Techniques, 3rd Edition, Morgan Kaufmann Publishers, 2011.
3. Yu Wang, Hui Xiong, Shlomo Argamon, XiangYang Li and JianZhong Li (eds.), Big Data Computing and Communications, Springer.
4. Andrea Cirillo, R Data Mining: Implement data mining techniques through practical use cases and real world datasets, 1st Edition, Packt Publishing, 2017.
5. Luis Torgo, Data Mining with R: Learning with Case Studies, Second Edition, Chapman and Hall/CRC, 2020.

WEB REFERENCES:

1. https://onlinecourses-archive.nptel.ac.in/noc18-mg11/announcements
2. https://swayam.gov.in/nd1_noc19_ma33/preview
3. www.datacamp.com/R-Tutorial

ONLINE RESOURCES:

1. https://www.youtube.com/watch?v=BB2O4VCu5j8
2. https://www.tutorialspoint.com/r/index.htm
3. http://www.rdatamining.com/