BN4102 Data Management and Data Engineering Syllabus – Anna University PG Syllabus Regulation 2021

OBJECTIVES:

➢ To understand the basics of Data Management.
➢ To understand Big Data and the technologies behind it.
➢ To introduce the concepts of Cloud Computing and key Data Mining algorithms.

UNIT – I INTRODUCTION TO DATA MANAGEMENT

Database System Concepts – Database Architecture – Data Model – Data Warehouse – Data Marts – Data Lake – Batch, Stream, and Micro-batch Processing – Concepts of ETL – SQL – The CAP Theorem – NoSQL Databases

UNIT – II BIG DATA AND TECHNOLOGIES

What is Big Data? – Big Data Technologies Based on MapReduce and Hadoop – Hadoop Distributed File System (HDFS) – YARN – Case Study: Preventing Private Information Inference Attacks on Social Networks – Grand Challenge: Applying Regulatory Science and Big Data to Improve Innovation.

UNIT – III CLOUD COMPUTING

Cloud Computing – Overview of Cloud Platforms – Detailed study of AWS Ecosystem – AWS Analytics Services – AWS Data Movement Services – AWS Predictive Analytics & Machine Learning Services – Amazon Redshift – Amazon EMR – Amazon MSK – Amazon Kinesis – AWS Serverless – AWS Lambda

UNIT – IV DATA ENGINEERING AND GOVERNANCE

Key Data Mining Algorithms – Data Governance Tools – Data Stewardship, Data Quality, Master Data Management (MDM) – Data Security – Statistical Database Security – Flow Control – Encryption and Public Key Infrastructures.

UNIT – V R LANGUAGE

Overview – Programming structures: Control statements, Operators, Functions, Environment and scope issues, Recursion, Replacement functions – R data structures: Vectors, Matrices and arrays, Lists, Data frames, Classes – Input/output – String manipulations.

TOTAL: 45 PERIODS

COURSE OUTCOMES:

➢ Appreciate the significance of Database Management Systems and understand the computational software and techniques for handling big data in business applications.

REFERENCES:

1. Michael Berthold, David J. Hand, “Intelligent Data Analysis”, Springer, Second Edition, 2007.
2. Anand Rajaraman and Jeffrey David Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2014.
3. Mark Gardener, Beginning R: The Statistical Programming Language, John Wiley & Sons, 2012.
4. Hadley Wickham and Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, O’Reilly Media, 2016.
5. Kenneth C. Laudon and Jane P. Laudon, Management Information Systems: Managing the Digital Firm, 15th Edition, 2018.
6. Panneerselvam. R, Database Management Systems, 3rd Edition, PHI Learning, 2018.
7. Norman Matloff, The Art of R Programming: A Tour of Statistical Software Design, No Starch Press, USA, 2011.
8. Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, John Wiley & Sons, 2012.
9. Glenn J. Myatt, Making Sense of Data, John Wiley & Sons, 2007.