MC4025 Big Data Analytics Syllabus – Anna University PG Syllabus Regulation 2021
COURSE OBJECTIVES:
To understand the fundamentals of Big Data and Hadoop
To learn about file system configuration in Hadoop
To learn the MapReduce concept of Hadoop in executing tasks
To learn queue processing and stream processing of data
To learn about Hadoop frameworks
UNIT I INTRODUCTION TO BIG DATA AND HADOOP
Types of Digital Data – Introduction to Big Data – Challenges of Conventional Systems – Web Data – Evolution of Analytic Scalability – Analytic Processes and Tools – Analysis vs Reporting – History of Hadoop – Apache Hadoop – Analyzing Data with Hadoop – Hadoop Streaming
Lab Components:
Perform setting up and Installing Hadoop
UNIT II HDFS & HADOOP I/O
Hadoop Distributed File System: The Design of HDFS – HDFS Concepts – The Command-Line Interface – Hadoop File Systems – Data Flow – Parallel Copying with distcp – Hadoop Archives. Hadoop I/O: Data Integrity – Compression – Serialization
Lab Components:
Implement HDFS Command Reference
Listing contents of a directory, displaying and printing disk usage, moving files and directories, copying files and directories
Implement the following file management tasks in Hadoop: writing a file into HDFS, reading data from HDFS, retrieving files, deleting files
UNIT III MAPREDUCE
Analyzing the Data with Hadoop – Hadoop Pipes – MapReduce Types – Input Formats – Output Formats – MapReduce Features – How MapReduce Works – Anatomy of a MapReduce Job Run – Failures – Job Scheduling – Shuffle and Sort – Task Execution
Lab Components:
Run a basic word count MapReduce program to understand the MapReduce paradigm.
Implement a matrix-vector multiplication MapReduce program
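The word count lab above can be sketched as a pair of map and reduce functions in the style of a Hadoop Streaming job. This is a minimal local simulation for study, not a full Hadoop job; the names `mapper` and `reducer` are illustrative, and in a real Streaming job each function would read from stdin and write tab-separated pairs to stdout.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: group pairs by word (Hadoop's shuffle/sort is simulated
    here with sorted()) and sum the counts for each word."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    sample = ["big data big hadoop", "data big"]
    for word, total in reducer(mapper(sample)):
        print(f"{word}\t{total}")
```

The matrix-vector multiplication lab follows the same pattern: the mapper emits (row index, element x vector entry) pairs and the reducer sums the partial products per row.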
UNIT IV QUEUEING AND STREAM PROCESSING SYSTEMS
Queueing: Queueing systems, introduction to Kafka, producers and consumers, brokers, types of queues – single-consumer and multi-consumer queue servers.
Streaming systems: Stream processing – queues and workers – micro-batch stream processing – introduction to the Kafka Streams processing API
Lab Components:
Implement a single-consumer queue in Kafka
Implement video streaming with a producer and consumer in Kafka
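The single-consumer lab above requires a running Kafka broker; as a broker-free sketch of the same producer/consumer pattern, Python's standard `queue` module can stand in for a topic (the `SENTINEL` end-of-stream marker is an assumption of this sketch, not a Kafka concept):

```python
import queue
import threading

SENTINEL = None  # marks end of stream in this sketch

def producer(q, messages):
    """Publish each message to the queue (stands in for a Kafka topic)."""
    for msg in messages:
        q.put(msg)
    q.put(SENTINEL)

def consumer(q, out):
    """Single consumer: drain messages in order until the sentinel arrives."""
    while True:
        msg = q.get()
        if msg is SENTINEL:
            break
        out.append(msg)

if __name__ == "__main__":
    q = queue.Queue()
    received = []
    t = threading.Thread(target=consumer, args=(q, received))
    t.start()
    producer(q, ["event-1", "event-2", "event-3"])
    t.join()
    print(received)
```

Against a real broker, the same roles are played by a Kafka producer client publishing to a topic and a consumer client polling it; the ordering guarantee within a single partition mirrors the FIFO behaviour of the queue here.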
UNIT V HADOOP FRAMEWORKS
Pig: Introduction to Pig, Execution Modes of Pig, Comparison of Pig with Databases, Grunt, Pig Latin, User-Defined Functions, Data Processing Operators.
Hive: Hive Shell, Hive Services, Hive Metastore, Comparison with Traditional Databases, HiveQL, Tables, Querying Data
Lab Components:
Install and run Pig, then write Pig Latin scripts to sort, group, and join your data.
Write Pig Latin scripts for finding the TF-IDF value for a book dataset (a corpus of eBooks is available at Project Gutenberg)
Install and run Hive, then use Hive to create, alter, and drop databases and tables
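The TF-IDF lab above is meant to be written in Pig Latin; as a plain-Python sketch of the computation itself (the function name `tf_idf` and the tiny corpus are illustrative), using the common definitions tf = term count / document length and idf = log(N / document frequency):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF scores for a corpus.

    docs: list of documents, each a list of terms.
    Returns one {term: score} dict per document, where
    score = (count / len(doc)) * log(N / number of docs containing term).
    """
    n = len(docs)
    df = Counter()                     # document frequency per term
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return scores

if __name__ == "__main__":
    corpus = [["big", "data"], ["big", "hadoop"], ["hive", "pig"]]
    print(tf_idf(corpus))
```

The Pig Latin version distributes the same three steps: GROUP by term to get document frequencies, GROUP by (document, term) for term frequencies, then JOIN the two relations to compute the product.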
COURSE OUTCOMES:
CO1: Able to apply Hadoop for analyzing big volumes of data
CO2: Able to access, store, and perform operations on data as files and directories
CO3: Able to implement MapReduce concepts in analyzing Big Data
CO4: Able to implement event streaming using the Kafka API
CO5: Able to process large volumes of data with Hadoop frameworks
TOTAL: 75 PERIODS
REFERENCES
1. Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, John Wiley & Sons, 2012.
2. Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer, 2007.
3. Tom White, Hadoop: The Definitive Guide, O'Reilly, 2009.
4. Paul Zikopoulos, Dirk DeRoos, Krishnan Parasuraman, Thomas Deutsch, James Giles, David Corrigan, Harness the Power of Big Data: The IBM Big Data Platform, Tata McGraw Hill Publications, 2012.
5. Gwen Shapira, Neha Narkhede, Todd Palino, Kafka: The Definitive Guide – Real-Time Data and Stream Processing at Scale, O'Reilly.