DS4015 Big Data Analytics Syllabus – Anna University PG Syllabus Regulation 2021
COURSE OBJECTIVES:
To understand the basics of big data analytics
To understand the search methods and visualization
To learn mining data streams
To learn big data frameworks
To gain knowledge of the R language
UNIT I INTRODUCTION TO BIG DATA
Introduction to Big Data Platform – Challenges of Conventional Systems – Intelligent Data Analysis – Nature of Data – Analytic Processes and Tools – Analysis vs. Reporting – Modern Data Analytic Tools – Statistical Concepts: Sampling Distributions – Re-Sampling – Statistical Inference – Prediction Error.
UNIT II SEARCH METHODS AND VISUALIZATION
Search by Simulated Annealing – Stochastic, Adaptive Search by Evolution – Evolution Strategies – Genetic Algorithm – Genetic Programming – Visualization – Classification of Visual Data Analysis Techniques – Data Types – Visualization Techniques – Interaction Techniques – Specific Visual Data Analysis Techniques.
UNIT III MINING DATA STREAMS
Introduction to Stream Concepts – Stream Data Model and Architecture – Stream Computing – Sampling Data in a Stream – Filtering Streams – Counting Distinct Elements in a Stream – Estimating Moments – Counting Ones in a Window – Decaying Window – Real-Time Analytics Platform (RTAP) Applications – Case Studies – Real-Time Sentiment Analysis, Stock Market Predictions.
UNIT IV FRAMEWORKS
MapReduce – Hadoop, Hive, MapR – Sharding – NoSQL Databases – S3 – Hadoop Distributed File System (HDFS) – Case Study: Preventing Private Information Inference Attacks on Social Networks – Grand Challenge: Applying Regulatory Science and Big Data to Improve Medical Device Innovation.
UNIT V R LANGUAGE
Overview, Programming structures: Control statements – Operators – Functions – Environment and scope issues – Recursion – Replacement functions, R data structures: Vectors – Matrices and arrays – Lists – Data frames – Classes, Input/output, String manipulations.
COURSE OUTCOMES:
CO1: Understand the basics of big data analytics.
CO2: Ability to use Hadoop and the MapReduce framework.
CO3: Ability to identify areas where big data analytics can improve business outcomes.
CO4: Gain knowledge of the R language.
CO5: Contextually integrate and correlate large amounts of information to gain faster insights.
REFERENCES:
1. Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer, 2007.
2. Anand Rajaraman and Jeffrey David Ullman, Mining of Massive Datasets, Cambridge University Press, 3rd edition, 2020.
3. Norman Matloff, The Art of R Programming: A Tour of Statistical Software Design, No Starch Press, USA, 2011.
4. Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, John Wiley & Sons, 2012.
5. Glenn J. Myatt, Making Sense of Data, John Wiley & Sons, 2007.