DS4015 Big Data Analytics Syllabus – Anna University PG Syllabus Regulation 2021

COURSE OBJECTIVES:

- To understand the basics of big data analytics
- To understand search methods and visualization
- To learn how to mine data streams
- To learn big data frameworks such as MapReduce and Hadoop
- To gain knowledge of the R language

UNIT I INTRODUCTION TO BIG DATA

Introduction to Big Data Platform – Challenges of Conventional Systems – Intelligent Data Analysis – Nature of Data – Analytic Processes and Tools – Analysis vs. Reporting – Modern Data Analytic Tools – Statistical Concepts: Sampling Distributions – Re-Sampling – Statistical Inference – Prediction Error.

UNIT II SEARCH METHODS AND VISUALIZATION

Search by Simulated Annealing – Stochastic, Adaptive Search by Evolution – Evolution Strategies – Genetic Algorithm – Genetic Programming – Visualization – Classification of Visual Data Analysis Techniques – Data Types – Visualization Techniques – Interaction Techniques – Specific Visual Data Analysis Techniques.

UNIT III MINING DATA STREAMS

Introduction to Stream Concepts – Stream Data Model and Architecture – Stream Computing – Sampling Data in a Stream – Filtering Streams – Counting Distinct Elements in a Stream – Estimating Moments – Counting Ones in a Window – Decaying Window – Real-Time Analytics Platform (RTAP) Applications – Case Studies – Real-Time Sentiment Analysis, Stock Market Predictions.

UNIT IV FRAMEWORKS

MapReduce – Hadoop, Hive, MapR – Sharding – NoSQL Databases – S3 – Hadoop Distributed File System (HDFS) – Case Study: Preventing Private Information Inference Attacks on Social Networks – Grand Challenge: Applying Regulatory Science and Big Data to Improve Medical Device Innovation.

UNIT V R LANGUAGE

Overview; Programming structures: Control statements – Operators – Functions – Environment and scope issues – Recursion – Replacement functions; R data structures: Vectors – Matrices and arrays – Lists – Data frames – Classes; Input/output; String manipulations.

COURSE OUTCOMES:

CO1: Understand the basics of big data analytics.
CO2: Use the Hadoop and MapReduce frameworks.
CO3: Identify areas where big data analytics can improve business outcomes.
CO4: Gain knowledge of the R language.
CO5: Contextually integrate and correlate large amounts of information to gain faster insights.

REFERENCES:

1. Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer, 2007.
2. Anand Rajaraman and Jeffrey David Ullman, Mining of Massive Datasets, Cambridge University Press, 3rd edition, 2020.
3. Norman Matloff, The Art of R Programming: A Tour of Statistical Software Design, No Starch Press, USA, 2011.
4. Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, John Wiley & Sons, 2012.
5. Glenn J. Myatt, Making Sense of Data, John Wiley & Sons, 2007.