BD4006 Data Intensive Computing Syllabus:

BD4006 Data Intensive Computing Syllabus – Anna University PG Syllabus Regulation 2021

COURSE OBJECTIVES:

To understand the basics of various database systems, including databases for big data.
To learn the architecture of data intensive computing.
To learn parallel processing for data intensive computing.
To learn security in data intensive computing systems.
To learn the applications that involve data intensive computing.

UNIT I INTRODUCTION

Introduction to Distributed Systems – Databases vs. File Systems – Distributed File Systems (HDFS) – Distributed Machine-Learning System – Data Parallelism – Characteristics – Hadoop – Execution Engines – MapReduce – Distributed Storage System for Structured Data – NoSQL Databases – Cassandra, MongoDB – Developing a Distributed Application

UNIT II ARCHITECTURES AND SYSTEMS

High Performance Network Architectures for Data Intensive Computing – Architecting Data Intensive Software Systems – ECL/HPCC: A Unified Approach to Big Data – Scalable Storage for Data Intensive Computing – Computation and Storage of Scientific Data Sets in Cloud – Stream Data Model – Architecture for Data Stream Management – Stream Queries – Sampling Data in a Stream – Filtering Streams

UNIT III TECHNOLOGIES AND TECHNIQUES

Load Balancing Techniques for Data Intensive Computing – Resource Management for Data Intensive Clouds – SALT – Parallel Processing, Multiprocessors and Virtualization in Data Intensive Computing – Challenges in Data Intensive Analysis and Visualization – Large-Scale Data Analytics Using Ensemble Clustering – Ensemble Feature Ranking Methods for Data Intensive Computing Applications – Record Linkage Methodology and Applications – Semantic Wrapper

UNIT IV SECURITY

Security in Data Intensive Computing Systems – Data Security and Privacy in Data-Intensive Supercomputing Clusters – Information Security in Large Scale Distributed Systems – Privacy and Security Requirements of Data Intensive Applications in Clouds

UNIT V APPLICATIONS AND FUTURE TRENDS

Cloud and Grid Computing for Data Intensive Applications – Scientific Applications – Bioinformatics – Large Science Discoveries – Climate Change – Environment – Energy – Commercial Applications – Future Trends in Data Intensive Computing

TOTAL : 45 PERIODS

COURSE OUTCOMES:

Upon completion of the course, the students will be able to:
CO1: Design applications that involve data intensive computing.
CO2: Suggest appropriate architecture for data intensive computing systems.
CO3: Decide on appropriate techniques, such as MapReduce and MongoDB, for different applications.
CO4: Identify parallel processing techniques for data intensive computing.
CO5: Decide on the various security techniques that are necessary for data intensive applications.

REFERENCES:

1. Tom White, “Hadoop: The Definitive Guide”, O’Reilly Media, 4th Edition, 2015.
2. Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom, “Database Systems: The Complete Book”, Pearson, 2013.
3. Borko Furht and Armando Escalante, “Handbook of Data Intensive Computing”, Springer, 2011.

WEB REFERENCES:

1. https://en.wikipedia.org/wiki/Data-intensive_computing
2. https://www.computer.org/csdl/magazine/co/2008/04/mco2008040030/13rRUNvgyZ8

ONLINE RESOURCES:

1. https://www.slideshare.net/huda2018/dataintensive-technologies-for-cloudcomputing