MC4025 Big Data Analytics Syllabus:

MC4025 Big Data Analytics Syllabus – Anna University PG Syllabus Regulation 2021

COURSE OBJECTIVES:

• To understand the fundamentals of Big Data and Hadoop
• To learn about file system configuration in Hadoop
• To learn the MapReduce concept in Hadoop for executing tasks
• To learn queue processing and stream processing of data
• To learn about Hadoop frameworks

UNIT I INTRODUCTION TO BIG DATA AND HADOOP

Types of Digital Data – Introduction to Big Data – Challenges of Conventional Systems – Web Data – Evolution of Analytic Scalability – Analytic Processes and Tools – Analysis vs Reporting – History of Hadoop – Apache Hadoop – Analyzing Data with Hadoop – Hadoop Streaming
Lab Components:
• Set up and install Hadoop

UNIT II HDFS & HADOOP I/O

Hadoop Distributed File System: The Design of HDFS – HDFS Concepts – The Command-Line Interface – Hadoop File Systems – Data Flow – Parallel Copying with distcp – Hadoop Archives
Hadoop I/O: Data Integrity – Compression – Serialization
Lab Components:
• Implement the HDFS command reference: listing the contents of a directory, displaying and printing disk usage, moving files and directories, copying files and directories
• Implement the following file management tasks in Hadoop: writing a file into HDFS, reading data from HDFS, retrieving files, deleting files

UNIT III MAPREDUCE

Analyzing the Data with Hadoop – Hadoop Pipes – MapReduce Types – Input Formats – Output Formats – MapReduce Features – How MapReduce Works – Anatomy of a MapReduce Job Run – Failures – Job Scheduling – Shuffle and Sort – Task Execution
Lab Components:
• Run a basic word count MapReduce program to understand the MapReduce paradigm
• Implement a matrix-vector multiplication MapReduce program
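
As a rough illustration of the two lab exercises above, the following is a minimal in-memory sketch of the map, shuffle, and reduce phases in plain Python. It does not use Hadoop. The driver `run_mapreduce` and the mapper and reducer names are hypothetical, the matrix is assumed to be stored as (row, column, value) triples, and the vector is assumed to be small enough to be shared with every mapper.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process driver: map, shuffle by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map phase
            groups[key].append(value)       # shuffle: group values by key
    # reduce phase; keys are visited in sorted order, mirroring the sort step
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


# --- Word count ---
def wc_mapper(line):
    # Emit (word, 1) for every word in the input line.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    # Add up all values emitted for one key.
    return sum(values)


# --- Matrix-vector multiplication ---
VECTOR = [1, 2, 3]  # assumption: the whole vector is available to each mapper


def mv_mapper(entry):
    # Each matrix entry contributes value * v[col] to the result for its row.
    row, col, value = entry
    yield row, value * VECTOR[col]


word_counts = run_mapreduce(["big data big", "data hadoop"], wc_mapper, sum_reducer)
matrix = [(0, 0, 1), (0, 1, 2), (0, 2, 3), (1, 0, 4), (1, 1, 5), (1, 2, 6)]
product = run_mapreduce(matrix, mv_mapper, sum_reducer)
print(word_counts)
print(product)
```

Both exercises reuse the same summing reducer; only the mapper changes, which is the point of the paradigm. In an actual Hadoop job the mapper and reducer would run as separate distributed tasks rather than inside one process.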

UNIT IV QUEUEING AND STREAM PROCESSING SYSTEMS

Queueing: Queueing systems, introduction to Kafka, producers and consumers, brokers, types of queues – single-consumer and multi-consumer queue servers.
Streaming systems: Stream processing – queues and workers – micro-batch stream processing – introduction to the Kafka stream processing API
Lab Components:
• Implement a single-consumer queue in Kafka
• Implement video streaming with a producer and consumer in Kafka

UNIT V HADOOP FRAMEWORKS

Pig: Introduction to Pig, Execution Modes of Pig, Comparison of Pig with Databases, Grunt, Pig Latin, User Defined Functions, Data Processing Operators.
Hive: Hive Shell, Hive Services, Hive Metastore, Comparison with Traditional Databases, HiveQL, Tables, Querying Data
Lab Components:
• Install and run Pig, then write Pig Latin scripts to sort, group, and join your data
• Write Pig Latin scripts to find the TF-IDF values for a book dataset (a corpus of eBooks is available at Project Gutenberg)
• Install and run Hive, then use Hive to create, alter, and drop databases and tables
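
For reference when checking the TF-IDF lab output, the calculation itself can be sketched in a few lines of Python. This is not the Pig Latin solution, only a minimal sketch of one common TF-IDF variant, assumed here as tf = term count / document length and idf = ln(number of documents / number of documents containing the term). The function name `tf_idf` and the sample texts are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    Assumed variant: tf = term count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


books = ["whale sea whale", "sea ship"]  # made-up stand-ins for book texts
scores = tf_idf(books)
print(scores)
```

A term that occurs in every document, such as "sea" here, gets an idf of zero under this variant, so its score is zero regardless of how often it appears.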

COURSE OUTCOMES:

CO1: Able to apply Hadoop to analyze large volumes of data
CO2: Able to access, store, and perform operations on data as files and directories
CO3: Able to implement the MapReduce concept in analyzing Big Data
CO4: Able to implement event streaming using the Kafka API
CO5: Able to access large volumes of data with Hadoop frameworks

TOTAL: 75 PERIODS

REFERENCES

1. Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, John Wiley & Sons, 2012.
2. Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer, 2007.
3. Tom White, Hadoop: The Definitive Guide, O'Reilly, 2009.
4. Paul Zikopoulos, Dirk DeRoos, Krishnan Parasuraman, Thomas Deutsch, James Giles, David Corrigan, Harness the Power of Big Data: The IBM Big Data Platform, Tata McGraw Hill Publications, 2012.
5. Gwen Shapira, Neha Narkhede, Todd Palino, Kafka: The Definitive Guide – Real-Time Data and Stream Processing at Scale.