MC4025 Big Data Analytics Syllabus – Anna University PG Syllabus Regulation 2021

COURSE OBJECTIVES:

• To understand the fundamentals of Big Data and Hadoop
• To learn about file system configuration in Hadoop
• To learn the MapReduce concept of Hadoop for executing tasks
• To learn about queueing and stream processing of data
• To learn about Hadoop frameworks

UNIT I INTRODUCTION TO BIG DATA AND HADOOP

Types of Digital Data – Introduction to Big Data – Challenges of Conventional Systems – Web Data – Evolution of Analytic Scalability – Analytic Processes and Tools – Analysis vs Reporting – History of Hadoop – Apache Hadoop – Analyzing Data with Hadoop – Hadoop Streaming
Lab Components:
• Set up and install Hadoop (a Hadoop Streaming word-count sketch follows)
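Unit I closes with Hadoop Streaming, which lets any executable that reads standard input and writes standard output serve as the mapper or reducer. Below is a minimal word-count sketch in Python; the file names mapper.py and reducer.py are illustrative choices rather than anything prescribed by the syllabus.

```python
#!/usr/bin/env python3
# mapper.py - read lines from stdin and emit one "word<TAB>1" pair per token
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py - input arrives sorted by key, so each word's counts can be summed in one pass
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
        continue
    if current_word is not None:
        print(f"{current_word}\t{current_count}")
    current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

The pair would typically be submitted through the hadoop-streaming JAR that ships with the Hadoop installation, passing both scripts with -files and naming them via -mapper and -reducer along with -input and -output paths; the exact JAR location depends on the installation.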

UNIT II HDFS & HADOOP I/O

Hadoop Distributed File System: The Design of HDFS – HDFS Concepts – The Command-Line Interface – Hadoop File Systems – Data Flow – Parallel Copying with distcp – Hadoop Archives. Hadoop I/O: Data Integrity – Compression – Serialization
Lab Components:
• Implement the HDFS command reference: listing the contents of a directory, displaying and printing disk usage, moving files and directories, copying files and directories
• Implement the following file management tasks in Hadoop: writing a file into HDFS, reading data from HDFS, retrieving files, deleting files (see the sketch below)
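A minimal sketch of the file-management tasks above, driven from Python by shelling out to the standard `hdfs dfs` command-line interface. It assumes the hdfs binary is on the PATH and that the local file and HDFS paths used here are purely illustrative.

```python
import subprocess

def hdfs(*args):
    """Run an `hdfs dfs` subcommand and return its stdout as text."""
    result = subprocess.run(["hdfs", "dfs", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout

print(hdfs("-ls", "/user/hadoop"))                      # list directory contents
print(hdfs("-du", "-h", "/user/hadoop"))                # display disk usage
hdfs("-put", "localfile.txt", "/user/hadoop/data.txt")  # write a file into HDFS
print(hdfs("-cat", "/user/hadoop/data.txt"))            # read data from HDFS
hdfs("-get", "/user/hadoop/data.txt", "retrieved.txt")  # retrieve a file to the local disk
hdfs("-cp", "/user/hadoop/data.txt", "/user/hadoop/copy.txt")   # copy within HDFS
hdfs("-mv", "/user/hadoop/copy.txt", "/user/hadoop/moved.txt")  # move/rename
hdfs("-rm", "/user/hadoop/moved.txt")                   # delete a file
```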

UNIT III MAPREDUCE

Analyzing the Data with Hadoop – Hadoop Pipes – MapReduce Types – Input Formats – Output Formats – MapReduce Features – How MapReduce Works – Anatomy of a MapReduce Job Run – Failures – Job Scheduling – Shuffle and Sort – Task Execution
Lab Components:
• Run a basic word count MapReduce program to understand the MapReduce paradigm.
• Implement a matrix-vector multiplication MapReduce program (see the sketch below).
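A minimal sketch of the matrix-vector multiplication job written for Hadoop Streaming in Python. It assumes the matrix is stored one entry per line as "row col value" and that the vector is small enough to ship as a side file named vector.txt (one "index value" pair per line) alongside the job; both layouts are illustrative assumptions, not part of the syllabus. The mapper emits the partial product keyed by row, and the reducer sums the partial products for each row.

```python
#!/usr/bin/env python3
# mv_mapper.py - emit the partial product a[i][j] * v[j] keyed by row index i
import sys

# vector.txt is an assumed side file shipped with the job (e.g. via -files)
vector = {}
with open("vector.txt") as f:
    for line in f:
        j, v = line.split()
        vector[int(j)] = float(v)

for line in sys.stdin:
    i, j, a = line.split()
    print(f"{i}\t{float(a) * vector[int(j)]}")
```

```python
#!/usr/bin/env python3
# mv_reducer.py - sum the partial products per row to obtain each component of A*v
import sys

current_row, total = None, 0.0
for line in sys.stdin:
    row, partial = line.rstrip("\n").split("\t")
    if row == current_row:
        total += float(partial)
        continue
    if current_row is not None:
        print(f"{current_row}\t{total}")
    current_row, total = row, float(partial)
if current_row is not None:
    print(f"{current_row}\t{total}")
```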

UNIT IV QUEUEING AND STREAM PROCESSING SYSTEMS

Queueing: Queueing systems, introduction to Kafka, producers and consumers, brokers, types of queues – single-consumer and multi-consumer queue servers.
Streaming systems: Stream processing – queues and workers – micro-batch stream processing – introduction to the Kafka stream processing API
Lab Components:
• Implement a single-consumer queue in Kafka (see the sketch below)
• Implement video streaming with a producer and consumer in Kafka
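A minimal single-consumer queue sketch using the third-party kafka-python client; the client library, broker address (localhost:9092), and topic name (lab-queue) are assumptions made for illustration, since the syllabus does not prescribe them.

```python
from kafka import KafkaProducer, KafkaConsumer

# producer: enqueue a few messages onto the topic
producer = KafkaProducer(bootstrap_servers="localhost:9092")
for i in range(5):
    producer.send("lab-queue", f"message {i}".encode("utf-8"))
producer.flush()

# single consumer: one group member reads every partition of the topic
consumer = KafkaConsumer(
    "lab-queue",
    bootstrap_servers="localhost:9092",
    group_id="lab-consumer-group",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,   # stop iterating once no new messages arrive
)
for record in consumer:
    print(record.partition, record.offset, record.value.decode("utf-8"))
```

Starting a second consumer with the same group_id turns this into a multi-consumer queue in which the topic's partitions are shared between group members; giving each consumer a distinct group_id instead delivers every message to every consumer.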

UNIT V HADOOP FRAMEWORKS

Pig: Introduction to Pig, Execution Modes of Pig, Comparison of Pig with Databases, Grunt, Pig Latin, User Defined Functions, Data Processing Operators.
Hive: Hive Shell, Hive Services, Hive Metastore, Comparison with Traditional Databases, HiveQL, Tables, Querying Data
Lab Components:
• Install and run Pig, then write Pig Latin scripts to sort, group, and join your data (a combined Pig and Hive sketch follows this list).
• Write Pig Latin scripts for finding the TF-IDF value for a book dataset (a corpus of eBooks is available at Project Gutenberg).
• Install and run Hive, then use Hive to create, alter, and drop databases and tables.
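A minimal sketch of the Pig and Hive lab tasks driven from Python: a Pig Latin script that loads, sorts, and groups a small tab-separated file, and a HiveQL script that creates, alters, and drops a database and table. It assumes the pig and hive command-line tools are on the PATH and that the sample file sales.txt (columns: name, amount) is illustrative; a join would follow the same pattern with a second relation.

```python
import subprocess

PIG_SCRIPT = """
sales   = LOAD 'sales.txt' USING PigStorage('\\t') AS (name:chararray, amount:int);
sorted  = ORDER sales BY amount DESC;
grouped = GROUP sales BY name;
totals  = FOREACH grouped GENERATE group AS name, SUM(sales.amount) AS total;
DUMP sorted;
DUMP totals;
"""

HIVE_SCRIPT = """
CREATE DATABASE IF NOT EXISTS lab_db;
USE lab_db;
CREATE TABLE IF NOT EXISTS sales (name STRING, amount INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t';
ALTER TABLE sales ADD COLUMNS (region STRING);
SELECT name, SUM(amount) FROM sales GROUP BY name;
DROP TABLE sales;
DROP DATABASE lab_db;
"""

with open("lab.pig", "w") as f:
    f.write(PIG_SCRIPT)
with open("lab.hql", "w") as f:
    f.write(HIVE_SCRIPT)

subprocess.run(["pig", "-x", "local", "lab.pig"], check=True)  # run Pig in local mode
subprocess.run(["hive", "-f", "lab.hql"], check=True)          # run the HiveQL script
```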

COURSE OUTCOMES:

CO1: Able to apply Hadoop for analyzing large volumes of data
CO2: Able to access, store, and perform operations on data as files and directories
CO3: Able to implement the MapReduce concept in analyzing Big Data
CO4: Able to implement event streaming using the Kafka API
CO5: Able to process large volumes of data with Hadoop frameworks

TOTAL: 75 PERIODS

REFERENCES

1. Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, John Wiley & Sons, 2012.
2. Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer, 2007.
3. Tom White, Hadoop: The Definitive Guide, O'Reilly, 2009.
4. Paul Zikopoulos, Dirk DeRoos, Krishnan Parasuraman, Thomas Deutsch, James Giles, David Corrigan, Harness the Power of Big Data: The IBM Big Data Platform, Tata McGraw Hill Publications, 2012.
5. Gwen Shapira, Neha Narkhede, Todd Palino, Kafka: The Definitive Guide – Real-Time Data and Stream Processing at Scale, O'Reilly, 2017.