BN4211 Data Science Tools – II (Laboratory) Syllabus – Anna University PG Syllabus Regulation 2021
COURSE DESCRIPTION:
This course introduces students to the fundamentals of business analytics with a focus on using Python for data analysis. Students will learn how to collect, clean, analyze, and visualize data to make informed business decisions. Topics include data manipulation, descriptive and inferential statistics, predictive modeling, and data visualization techniques using Python libraries such as Pandas, NumPy, Matplotlib, and Scikit-learn.
OBJECTIVES:
➢ To understand the role of business analytics in decision-making processes.
➢ To learn how to collect, clean, and manipulate data using Python.
➢ To apply descriptive and inferential statistical techniques to analyze data.
➢ To build predictive models using machine learning algorithms.
➢ To create data visualizations that communicate insights effectively using Python libraries.
UNIT – I
Introduction to Business Analytics and Python: overview of business analytics and its applications. Introduction to Python for data analysis. Setting up the Python environment (Anaconda, Jupyter Notebooks). Data Manipulation with Pandas: introduction to the Pandas library for data manipulation. Working with Series and DataFrames. Data cleaning and preprocessing techniques.
UNIT – II
Descriptive Statistics with NumPy: introduction to the NumPy library for numerical computing. Calculating descriptive statistics (mean, median, variance, etc.). Exploring data distributions. Data Visualization with Matplotlib: introduction to the Matplotlib library for data visualization. Creating line plots, scatter plots, histograms, and bar charts. Customizing plot aesthetics and adding annotations.
UNIT – III
Inferential Statistics: hypothesis testing with Python (t-tests, chi-square tests). Confidence intervals and hypothesis testing for proportions. Introduction to ANOVA for comparing means across groups. Predictive Modeling with Scikit-learn: introduction to machine learning with Scikit-learn. Building and evaluating predictive models (linear regression, logistic regression). Model selection and hyperparameter tuning.
UNIT – IV
Advanced Predictive Modeling: introduction to decision trees and ensemble methods (Random Forest, Gradient Boosting). Evaluating model performance (cross-validation, ROC curves, AUC). Introduction to feature engineering and selection.
UNIT – V
Time Series Analysis: introduction to time series data. Exploratory data analysis for time series. Building time series forecasting models (ARIMA, Exponential Smoothing). Real-world case studies applying business analytics techniques with Python.
LIST OF EXERCISES:
Data Manipulation with Pandas:
Load a dataset into a Pandas DataFrame and inspect its structure.
Perform basic data manipulation tasks such as selecting columns, filtering rows, and sorting data.
Handle missing values by imputing or removing them from the dataset.
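The three Pandas exercises above could be sketched as follows. This is a minimal illustration, not a prescribed solution: the small sales table, its column names, and the choice of median imputation are all assumptions made for the example. In the lab, the data would normally come from a file via pd.read_csv.

```python
import io

import pandas as pd

# Hypothetical sales data, embedded as text so the example is self-contained.
csv_text = """region,product,units,price
North,A,10,2.5
South,B,,3.0
East,A,7,
West,C,12,4.0
North,B,5,3.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Inspect the structure: dimensions, column types, and missing-value counts.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# Select columns, filter rows, and sort.
subset = df[["region", "units"]]
large_orders = df[df["units"] >= 7]          # rows with missing units are excluded
sorted_df = df.sort_values("units", ascending=False)

# Handle missing values in two alternative ways:
# impute each numeric column with its median, or drop incomplete rows.
imputed = df.fillna({"units": df["units"].median(), "price": df["price"].median()})
dropped = df.dropna()

print(imputed)
print(dropped)
```

Whether to impute or drop depends on how much data is missing and why; the median is used here only because it is robust to outliers.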
Descriptive Statistics with NumPy:
Calculate descriptive statistics (mean, median, mode, variance, standard deviation) for numerical variables using NumPy.
Explore data distributions and visualize them using histograms or density plots.
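A minimal sketch of the descriptive statistics exercises, assuming a small invented sample of daily order counts. Note that NumPy has no dedicated mode function, so the mode is derived here from np.unique; scipy.stats.mode is an alternative.

```python
import numpy as np

# Hypothetical sample of daily order counts.
data = np.array([12, 15, 12, 18, 20, 15, 12, 22, 30, 15, 12])

mean = np.mean(data)
median = np.median(data)
variance = np.var(data, ddof=1)   # ddof=1 gives the sample variance
std_dev = np.std(data, ddof=1)    # sample standard deviation

# Mode: the unique value with the highest count.
values, counts = np.unique(data, return_counts=True)
mode = values[np.argmax(counts)]

# np.histogram returns the bin counts and bin edges that a histogram plot would draw.
hist_counts, bin_edges = np.histogram(data, bins=5)

print("mean:", mean, "median:", median, "mode:", mode)
print("variance:", variance, "std dev:", std_dev)
print("histogram counts:", hist_counts)
print("bin edges:", bin_edges)
```

Passing the same array to matplotlib.pyplot.hist would draw the corresponding histogram.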
Data Visualization with Matplotlib:
Create basic line plots, scatter plots, and bar charts to visualize relationships between variables.
Customize plot aesthetics such as colors, labels, and titles.
Generate subplots and combine multiple plots into a single figure.
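The visualization exercises above might look like the following sketch. The monthly revenue and advertising figures are invented for illustration, and the "Agg" backend line is only needed when running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical monthly figures.
months = np.arange(1, 13)
revenue = np.array([10, 12, 13, 15, 14, 18, 21, 20, 19, 23, 25, 28])
ad_spend = np.array([2, 2.5, 2.4, 3, 2.8, 3.5, 4, 3.9, 3.7, 4.4, 4.8, 5.2])
quarterly = revenue.reshape(4, 3).sum(axis=1)  # total revenue per quarter

# One figure combining three subplots side by side.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Line plot: revenue over time.
axes[0].plot(months, revenue, color="tab:blue", marker="o")
axes[0].set_title("Revenue by month")
axes[0].set_xlabel("Month")
axes[0].set_ylabel("Revenue")

# Scatter plot: relationship between two variables.
axes[1].scatter(ad_spend, revenue, color="tab:orange")
axes[1].set_title("Revenue vs. ad spend")
axes[1].set_xlabel("Ad spend")
axes[1].set_ylabel("Revenue")

# Bar chart: comparison across categories.
axes[2].bar(["Q1", "Q2", "Q3", "Q4"], quarterly, color="tab:green")
axes[2].set_title("Revenue by quarter")
axes[2].set_ylabel("Revenue")

fig.tight_layout()
# In a notebook, plt.show() displays the figure; fig.savefig(...) writes it to a file.
```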
Inferential Statistics with SciPy:
Conduct hypothesis testing (t-tests, chi-square tests) to make inferences about population parameters.
Calculate confidence intervals to estimate the range of plausible values for a population parameter.
Perform correlation analysis to explore relationships between variables.
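One possible way to carry out these inferential exercises with scipy.stats is sketched below. The two customer groups, the 2x2 contingency table, and the correlated variables are all simulated, so the numbers carry no real-world meaning; the choice of Welch's t-test (rather than the equal-variance version) is also an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical order values for two customer groups.
group_a = rng.normal(loc=50, scale=5, size=40)
group_b = rng.normal(loc=53, scale=5, size=40)

# Welch's two-sample t-test (does not assume equal variances).
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Chi-square test of independence on a hypothetical 2x2 contingency table.
table = np.array([[30, 10], [20, 20]])
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

# 95% confidence interval for the mean of group_a, based on the t distribution.
mean_a = group_a.mean()
sem_a = stats.sem(group_a)
ci_low, ci_high = stats.t.interval(0.95, len(group_a) - 1, loc=mean_a, scale=sem_a)

# Pearson correlation between two simulated, linearly related variables.
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)
r, r_p = stats.pearsonr(x, y)

print("t-test:", t_stat, t_p)
print("chi-square:", chi2, chi_p, "dof:", dof)
print("95% CI for mean of group A:", ci_low, ci_high)
print("Pearson r:", r, "p-value:", r_p)
```

A small p-value suggests the data are unlikely under the null hypothesis; it does not measure the size or business relevance of the effect.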
Predictive Modeling with Scikit-learn:
Split the dataset into training and testing sets for model evaluation.
Build and evaluate predictive models using linear regression, logistic regression, and decision trees.
Apply cross-validation techniques to assess model performance and generalization.
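A minimal sketch of the predictive modeling exercises, assuming synthetic data from scikit-learn's generators in place of a real business dataset. The split ratio, tree depth, and number of folds are illustrative choices, not recommended settings.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# --- Regression: synthetic data, train/test split, linear regression ---
X_reg, y_reg = make_regression(
    n_samples=200, n_features=4, n_informative=4, noise=5.0, random_state=0
)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_reg, y_reg, test_size=0.25, random_state=0
)
lin = LinearRegression().fit(Xr_train, yr_train)
r2 = r2_score(yr_test, lin.predict(Xr_test))

# --- Classification: logistic regression and a decision tree ---
X_clf, y_clf = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_clf, y_clf, test_size=0.25, random_state=0, stratify=y_clf
)
log_reg = LogisticRegression(max_iter=1000).fit(Xc_train, yc_train)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xc_train, yc_train)
log_acc = accuracy_score(yc_test, log_reg.predict(Xc_test))
tree_acc = accuracy_score(yc_test, tree.predict(Xc_test))

# --- 5-fold cross-validation: accuracy on each held-out fold ---
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X_clf, y_clf, cv=5)

print("Linear regression R^2 on test set:", r2)
print("Logistic regression accuracy:", log_acc)
print("Decision tree accuracy:", tree_acc)
print("Cross-validation accuracy per fold:", cv_scores, "mean:", cv_scores.mean())
```

The test-set score comes from a single split, while the cross-validation mean averages over five splits and is usually the more stable estimate of generalization.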
Feature Engineering and Selection:
Create new features by transforming existing variables (e.g., polynomial features, logarithmic transformations).
Select relevant features using techniques such as correlation analysis, feature importance, or recursive feature elimination.
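The feature engineering and selection exercises could be approached as in the sketch below. The customer table (income, age, an irrelevant "noise" column, and a "spend" target) is simulated purely for illustration. Features are standardized before recursive feature elimination because RFE with a linear model ranks features by coefficient size, which depends on scale.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Hypothetical customer data; "noise" is unrelated to the target by construction.
df = pd.DataFrame({
    "income": rng.lognormal(mean=10, sigma=0.5, size=200),
    "age": rng.integers(18, 70, size=200),
    "noise": rng.normal(size=200),
})
df["spend"] = 0.01 * df["income"] + 5 * df["age"] + rng.normal(scale=10, size=200)

# Logarithmic transformation of a right-skewed variable.
df["log_income"] = np.log1p(df["income"])

# Degree-2 polynomial features: age, income, age^2, age*income, income^2.
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_values = poly.fit_transform(df[["age", "income"]])

features = ["income", "age", "noise", "log_income"]

# Selection by absolute correlation with the target.
correlations = df[features].corrwith(df["spend"]).abs().sort_values(ascending=False)

# Selection by tree-based feature importance.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(df[features], df["spend"])
importances = pd.Series(forest.feature_importances_, index=features)

# Recursive feature elimination on standardized features, keeping two.
X_scaled = StandardScaler().fit_transform(df[features])
rfe = RFE(LinearRegression(), n_features_to_select=2).fit(X_scaled, df["spend"])
selected = [name for name, keep in zip(features, rfe.support_) if keep]

print(correlations)
print(importances.sort_values(ascending=False))
print("RFE selected:", selected)
```

The three methods can disagree, especially when features are correlated with one another (here income and log_income), so comparing them is part of the exercise.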
Time Series Analysis:
Convert a dataset into a time series format and visualize temporal patterns.
Apply time series decomposition to separate trend, seasonality, and noise components.
Build and evaluate time series forecasting models (e.g., ARIMA, Exponential Smoothing).
Clustering Analysis with Scikit-learn:
Explore unsupervised learning techniques such as K-means clustering to identify natural groupings in the data.
Visualize clustering results using scatter plots or heat maps.
Evaluate clustering performance using metrics such as silhouette score or Davies-Bouldin index.
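The clustering exercises might be sketched as follows, assuming synthetic two-dimensional data from make_blobs and a fixed choice of three clusters. In practice the number of clusters is not known in advance and is often chosen by comparing these metrics across several values of k.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a Jupyter notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data with three groups; the true labels are discarded (unsupervised setting).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# K-means with k=3 (an assumption for this example).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Silhouette: higher is better (range -1 to 1). Davies-Bouldin: lower is better (>= 0).
sil = silhouette_score(X_scaled, labels)
dbi = davies_bouldin_score(X_scaled, labels)
print("Silhouette score:", sil)
print("Davies-Bouldin index:", dbi)

# Scatter plot of the points colored by cluster, with centroids marked.
fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(X_scaled[:, 0], X_scaled[:, 1], c=labels, cmap="viridis", s=15)
ax.scatter(
    kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
    color="red", marker="x", s=100, label="Centroids",
)
ax.set_title("K-means clustering (k=3)")
ax.set_xlabel("Feature 1 (standardized)")
ax.set_ylabel("Feature 2 (standardized)")
ax.legend()
```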
TOTAL: 60 PERIODS
COURSE OUTCOMES:
➢ Basic understanding of the Python programming language and fundamental statistical concepts.
REFERENCES:
1. Allen B. Downey, “Think Python: How to Think like a Computer Scientist”, 2nd Edition, O’Reilly Publishers, 2016.
2. Karl Beecher, “Computational Thinking: A Beginner’s Guide to Problem Solving and Programming”, 1st Edition, BCS Learning & Development Limited, 2017.
3. Paul Deitel and Harvey Deitel, “Python for Programmers”, 1st Edition, Pearson Education, 2021.
4. G. Venkatesh and Madhavan Mukund, “Computational Thinking: A Primer for Programmers and Data Scientists”, 1st Edition, Notion Press, 2021.
5. John V. Guttag, “Introduction to Computation and Programming Using Python: With Applications to Computational Modeling and Understanding Data”, 3rd Edition, MIT Press, 2021.
6. Eric Matthes, “Python Crash Course: A Hands-On, Project-Based Introduction to Programming”, 2nd Edition, No Starch Press, 2019.