BD4006 Data Intensive Computing Syllabus – Anna University PG Syllabus Regulation 2021
COURSE OBJECTIVES:
To understand the basics of various database systems, including databases for Big Data.
To learn the architecture of data intensive computing.
To learn parallel processing for data intensive computing.
To learn about security in data intensive computing systems.
To learn about applications that involve data intensive computing.
UNIT I INTRODUCTION
Introduction to Distributed Systems – Databases vs. File Systems – Distributed File Systems (HDFS) – Distributed Machine Learning Systems – Data Parallelism – Characteristics – Hadoop – Execution Engines – MapReduce – Distributed Storage System for Structured Data – NoSQL Databases – Cassandra, MongoDB – Developing a Distributed Application
UNIT II ARCHITECTURES AND SYSTEMS
High-Performance Network Architectures for Data Intensive Computing – Architecting Data Intensive Software Systems – ECL/HPCC: A Unified Approach to Big Data – Scalable Storage for Data Intensive Computing – Computation and Storage of Scientific Data Sets in the Cloud – Stream Data Model – Architecture for Data Stream Management – Stream Queries – Sampling Data in a Stream – Filtering Streams
UNIT III TECHNOLOGIES AND TECHNIQUES
Load Balancing Techniques for Data Intensive Computing – Resource Management for Data Intensive Clouds – SALT – Parallel Processing, Multiprocessors and Virtualization in Data Intensive Computing – Challenges in Data Intensive Analysis and Visualization – Large-Scale Data Analytics Using Ensemble Clustering – Ensemble Feature Ranking Methods for Data Intensive Computing Applications – Record Linkage Methodology and Applications – Semantic Wrapper
UNIT IV SECURITY
Security in Data Intensive Computing Systems – Data Security and Privacy in Data-Intensive Supercomputing Clusters – Information Security in Large-Scale Distributed Systems – Privacy and Security Requirements of Data Intensive Applications in Clouds
UNIT V APPLICATIONS AND FUTURE TRENDS
Cloud and Grid Computing for Data Intensive Applications – Scientific Applications – Bioinformatics – Large Science Discoveries – Climate Change – Environment – Energy – Commercial Applications – Future Trends in Data Intensive Computing
TOTAL: 45 PERIODS
COURSE OUTCOMES:
Upon completion of the course, the students will be able to
CO1: Design applications that involve data intensive computing.
CO2: Suggest appropriate architecture for data intensive computing systems.
CO3: Decide on appropriate techniques, such as MapReduce and MongoDB, for different applications.
CO4: Identify parallel processing techniques for data intensive computing.
CO5: Decide on the various security techniques that are necessary for data intensive applications.
REFERENCES:
1. Tom White, “Hadoop: The Definitive Guide”, O’Reilly Media, 4th Edition, 2015.
2. Hector Garcia-Molina, Jeffrey D. Ullman and Jennifer Widom, “Database Systems: The Complete Book”, Pearson, 2013.
3. Borko Furht and Armando Escalante, “Handbook of Data Intensive Computing”, Springer, 2011.
WEB REFERENCES:
1. https://en.wikipedia.org/wiki/Data-intensive_computing
2. https://www.computer.org/csdl/magazine/co/2008/04/mco2008040030/13rRUNvgyZ8
ONLINE RESOURCES:
1. https://www.slideshare.net/huda2018/dataintensive-technologies-for-cloudcomputing