BN4311 R Programming (Laboratory) Syllabus – Anna University PG Syllabus Regulation 2021

OBJECTIVE:

➢ The course trains students in the R programming language and its business applications. It provides hands-on training with R tools and packages for advanced data analytics, using real and simulated datasets to solve complex analytics problems, including data visualization and machine learning.

UNIT – I INTRODUCTION TO R & R ENVIRONMENT AND EXPLORATORY DATA ANALYSIS

Overview of R Language, Installation of R and RStudio, Scripts, Data Types in R, Data Structures in R, Loading Packages, Operators and Functions in R, Data Extraction and Wrangling, Exporting Data from R, Pre-processing of Data, Exploratory Data Analysis.

UNIT – II DATA VISUALIZATION FOR INSIGHTS USING R

Perceptual mapping through advanced R packages: ggplot2, lattice, highcharter, RColorBrewer, plotly, etc. Charts, Graphs, and Maps.

UNIT – III INFERENTIAL STATISTICS

Testing assumptions, Parametric and non-parametric tests, Correlation, Regression: Linear & Logistic, Dimensionality Reduction techniques: EFA & PCA, Multidimensional Scaling, ANOVA, Time Series Analysis: Stationarity, AR, MA, ARMA and ARIMA, Forecasting.

UNIT – IV CLUSTER ANALYSIS AND CLASSIFICATION

Introduction to Cluster Analysis, Clustering models and Analysis, Hierarchical Clustering, Non-Hierarchical Clustering, K-means Clustering, C-means Clustering, KNN Classification, Decision Trees and Random Forests.

UNIT – V DATA MINING AND MACHINE LEARNING USING R

Text Mining, Text Mining Algorithms, Sentiment Analysis, Supervised and Unsupervised Machine Learning Algorithms, R packages for Machine Learning: caret, e1071, xgboost, randomForest, data.table.

Practical Exercises:

The learners are required to:
1. Conduct an exploratory study on real data.
2. Apply R to a data set and present the results through data visualisation.
3. Evaluate the survey results of a pilot study related to primary data.
4. Build a decision tree on primary data and analyze the results.
5. Collect a stock market data set and apply data mining tools.

List of exercises suitable for a Business Analytics course using R programming:

DATA IMPORT AND CLEANING:
Import a dataset from a CSV file into R using read.csv() or other appropriate functions.
Identify and handle missing values, outliers, and duplicates in the dataset.
Convert data types and ensure consistency in variable naming and formatting.
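
The import-and-clean steps above can be sketched in base R as follows. To stay self-contained, the example first writes a small made-up data frame to a temporary CSV; the column names, the median imputation, and the 1.5 × IQR outlier rule are illustrative assumptions rather than requirements of the syllabus.

```r
# Hypothetical raw data, written to a temporary CSV so the example is self-contained
raw <- data.frame(
  "Customer ID" = c(1, 2, 2, 3, 4, 5, 6),
  "Amount"      = c(120, 85, 85, NA, 95, 110, 5000),
  "Region"      = c("north", "South", "South", "north", "East", "east", "South"),
  check.names = FALSE
)
path <- tempfile(fileext = ".csv")
write.csv(raw, path, row.names = FALSE)

# Import the CSV (read.csv turns "Customer ID" into "Customer.ID")
sales <- read.csv(path, stringsAsFactors = FALSE)

# Consistent variable names: lower case, underscores instead of dots
names(sales) <- tolower(gsub(".", "_", names(sales), fixed = TRUE))

# Duplicates: drop exact repeated rows
sales <- sales[!duplicated(sales), ]

# Missing values: count them, then impute with the median (one option among several)
n_missing <- sum(is.na(sales$amount))
sales$amount[is.na(sales$amount)] <- median(sales$amount, na.rm = TRUE)

# Outliers: flag values more than 1.5 * IQR beyond the quartiles
q <- quantile(sales$amount, c(0.25, 0.75))
fence <- 1.5 * diff(q)
sales$outlier <- sales$amount < q[1] - fence | sales$amount > q[2] + fence

# Data types and formatting: integer ID, region as a factor with uniform capitalisation
sales$customer_id <- as.integer(sales$customer_id)
region <- tolower(sales$region)
sales$region <- factor(paste0(toupper(substring(region, 1, 1)), substring(region, 2)))

str(sales)
```

Whether to impute, drop, or keep flagged rows depends on the business question, so the flag column leaves that decision to a later step.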

DATA MANIPULATION WITH DPLYR:
Use dplyr functions (filter(), select(), mutate(), arrange(), group_by(), summarize()) to manipulate and summarize data.
Chain multiple dplyr functions together using the pipe operator (%>%).
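
A short sketch of such a pipeline, assuming the dplyr package is installed and using R's built-in `mtcars` data as a stand-in for a business dataset. The derived column `hp_per_wt` and the `hp > 100` filter are arbitrary choices for illustration.

```r
library(dplyr)

# Car names are stored as row names in mtcars, so copy them into a column first
cars <- mtcars
cars$model <- rownames(mtcars)

# filter -> select -> mutate -> arrange, chained with the pipe
top_cars <- cars %>%
  filter(hp > 100) %>%                     # keep rows meeting a condition
  select(model, mpg, cyl, hp, wt) %>%      # keep only the columns needed
  mutate(hp_per_wt = hp / wt) %>%          # add a derived column
  arrange(desc(hp_per_wt))                 # sort, highest ratio first

# group_by -> summarize: one row per cylinder count
summary_by_cyl <- top_cars %>%
  group_by(cyl) %>%
  summarize(n = n(), mean_mpg = mean(mpg), .groups = "drop")

head(top_cars, 3)
summary_by_cyl
```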

DESCRIPTIVE STATISTICS:
Calculate summary statistics (mean, median, standard deviation, etc.) for numerical variables.
Generate frequency tables and bar charts for categorical variables, and histograms for numerical variables.
Explore relationships between variables using correlation analysis.
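
As an illustration, the following base R sketch runs these steps on the built-in `mtcars` data. Treating `cyl` as the categorical variable and `mpg`, `hp`, and `wt` as the numerical ones is an assumption made for the example.

```r
# Summary statistics for a numerical variable
mpg_stats <- c(
  mean   = mean(mtcars$mpg),
  median = median(mtcars$mpg),
  sd     = sd(mtcars$mpg)
)
summary(mtcars$mpg)   # five-number summary plus the mean

# Frequency table (counts and shares) for a categorical variable
cyl_counts <- table(mtcars$cyl)
cyl_share  <- prop.table(cyl_counts)

# Bar chart of the categorical counts, histogram of a numerical variable
barplot(cyl_counts, main = "Cars by number of cylinders", xlab = "Cylinders", ylab = "Count")
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# Pairwise Pearson correlations between numerical variables
cor_mat <- cor(mtcars[, c("mpg", "hp", "wt")])
round(cor_mat, 2)
```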

INFERENTIAL STATISTICS:
Conduct hypothesis testing (t-tests, chi-square tests, ANOVA) to make inferences about population parameters.
Calculate confidence intervals for population means and proportions.
Perform regression analysis to examine relationships between variables.
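
These tests are all available in base R. The sketch below applies them to the built-in `mtcars` data; the choice of variables for each test is purely illustrative.

```r
# Two-sample t-test: does mean mpg differ between automatic (am = 0) and manual (am = 1) cars?
tt <- t.test(mpg ~ am, data = mtcars)   # Welch's unequal-variance test by default
tt$p.value

# Chi-square test of independence between two categorical variables.
# With only 32 cars R warns that expected counts are small; fisher.test() is an alternative.
chi <- chisq.test(table(mtcars$cyl, mtcars$am))

# One-way ANOVA: does mean mpg differ across cylinder groups?
aov_fit <- aov(mpg ~ factor(cyl), data = mtcars)
summary(aov_fit)

# 95% confidence intervals for a population mean and a population proportion
mean_ci <- t.test(mtcars$mpg)$conf.int
prop_ci <- prop.test(sum(mtcars$am == 1), nrow(mtcars))$conf.int

# Linear regression of mpg on weight and horsepower
lm_fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(lm_fit)
```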

PREDICTIVE MODELING WITH CARET:
Split the dataset into training and testing sets for model evaluation.
Build predictive models using machine learning algorithms (e.g., linear regression, logistic regression, decision trees, random forests) with the caret package.
Evaluate classification models using metrics such as accuracy, precision, recall, and ROC curves, and regression models using error measures such as RMSE.
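
A minimal sketch of this workflow, assuming the caret and rpart packages are installed. The built-in `iris` data, the 80/20 split, the decision-tree method, and 5-fold cross-validation are illustrative choices, not prescribed by the syllabus.

```r
library(caret)

set.seed(42)  # make the random split and resampling reproducible

# Stratified 80/20 split into training and testing sets
idx       <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[idx, ]
test_set  <- iris[-idx, ]

# Decision tree (method = "rpart") tuned with 5-fold cross-validation
fit <- train(
  Species ~ .,
  data      = train_set,
  method    = "rpart",
  trControl = trainControl(method = "cv", number = 5)
)

# Predict on the held-out data and compare with the true labels
pred <- predict(fit, newdata = test_set)
cm   <- confusionMatrix(pred, test_set$Species)

cm$overall["Accuracy"]
cm$byClass[, c("Precision", "Recall")]
```

Changing `method` (for example to `"rf"` for a random forest, which needs the randomForest package) swaps the algorithm while the rest of the workflow stays the same.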

TIME SERIES ANALYSIS:
Import time series data into R and convert it into a time series object.
Explore temporal patterns and trends using time series plots and decomposition techniques.
Build time series forecasting models (e.g., ARIMA, exponential smoothing) and assess forecast accuracy.
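
The following base R sketch walks through these steps on the built-in `AirPassengers` series. The 12-month holdout and the seasonal ARIMA order (0,1,1)(0,1,1)[12] on the log scale are conventional choices for this dataset, assumed here rather than tuned.

```r
# AirPassengers is already a ts object; rebuild it from a plain vector
# to show the conversion step for data imported from a file
values <- as.numeric(AirPassengers)
air    <- ts(values, start = c(1949, 1), frequency = 12)

# Explore the trend and seasonality
plot(air, main = "Monthly airline passengers", ylab = "Passengers (thousands)")
parts <- decompose(air, type = "multiplicative")
plot(parts)

# Hold out the final 12 months for evaluation
train <- window(air, end = c(1959, 12))
test  <- window(air, start = c(1960, 1))

# Seasonal ARIMA on the log scale, forecasts transformed back with exp()
fit <- arima(log(train), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
arima_fc <- exp(predict(fit, n.ahead = 12)$pred)

# Exponential smoothing alternative (Holt-Winters with multiplicative seasonality)
hw    <- HoltWinters(train, seasonal = "multiplicative")
hw_fc <- predict(hw, n.ahead = 12)

# Forecast accuracy: mean absolute percentage error on the holdout
mape <- function(actual, forecast) {
  mean(abs((as.numeric(actual) - as.numeric(forecast)) / as.numeric(actual))) * 100
}
arima_mape <- mape(test, arima_fc)
hw_mape    <- mape(test, hw_fc)
c(arima = arima_mape, holt_winters = hw_mape)
```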

DATA VISUALIZATION WITH GGPLOT2:
Create various types of plots (scatter plots, line plots, bar plots, box plots) using ggplot2.
Customize plot aesthetics (titles, labels, colors, themes) and add annotations.
Generate faceted plots and combine multiple plots into a single visualization.
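
A brief sketch of these plot types, assuming the ggplot2 package is installed and again using `mtcars` as placeholder data. The titles, labels, and annotation position are arbitrary.

```r
library(ggplot2)

# Scatter plot with colour mapped to a grouping variable and custom labels and theme
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  labs(
    title  = "Fuel economy vs weight",
    x      = "Weight (1000 lbs)",
    y      = "Miles per gallon",
    colour = "Cylinders"
  ) +
  theme_minimal()

# Box plot of a numerical variable by category, with a text annotation
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  annotate("text", x = 3, y = 30, label = "8-cylinder cars use the most fuel") +
  labs(x = "Cylinders", y = "Miles per gallon")

# Bar plot of counts per category
p_bar <- ggplot(mtcars, aes(x = factor(gear))) +
  geom_bar() +
  labs(x = "Gears", y = "Number of cars")

# Faceted version of the scatter plot: one panel per number of gears
p_facet <- p_scatter + facet_wrap(~ gear)

# Plots are drawn when printed, e.g. print(p_facet) or just p_facet at the console
```

Faceting splits one plot into panels. Arranging several different plots side by side usually relies on an add-on package such as patchwork or gridExtra.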

INTERACTIVE DATA VISUALIZATION WITH SHINY:
Develop interactive web applications for data visualization using the Shiny package.
Create reactive components (input controls, output plots) and define server logic to update visualizations dynamically.
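
A minimal Shiny app along these lines might look as follows, assuming the shiny package is installed. The input IDs (`var`, `bins`), the output ID (`hist`), and the use of `mtcars` are hypothetical names chosen for the example.

```r
library(shiny)

# UI: two input controls and one plot output
ui <- fluidPage(
  titlePanel("Histogram explorer"),
  selectInput("var", "Variable:", choices = c("mpg", "hp", "wt")),
  sliderInput("bins", "Approximate number of bins:", min = 5, max = 30, value = 10),
  plotOutput("hist")
)

# Server: renderPlot() re-runs whenever input$var or input$bins changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(mtcars[[input$var]], breaks = input$bins,
         main = paste("Distribution of", input$var), xlab = input$var)
  })
}

# Build the app object; runApp(app) launches it in a browser
app <- shinyApp(ui = ui, server = server)
```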

TEXT MINING AND SENTIMENT ANALYSIS:
Preprocess text data by tokenizing, stemming, and removing stop words.
Perform sentiment analysis to assess the sentiment polarity of textual content.
Visualize sentiment scores using word clouds, bar plots, or sentiment heat maps.
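
The idea can be sketched in base R with a tiny hand-made lexicon. The review texts, stop-word list, and positive and negative word lists below are invented for illustration. Stemming is left out here; in practice it is commonly done with a package function such as `SnowballC::wordStem()`.

```r
# Hypothetical customer reviews
reviews <- c(
  "The delivery was fast and the product is great",
  "Terrible support, the app is slow and buggy",
  "Good price but poor packaging"
)

# Tiny illustrative word lists (real analyses use published lexicons)
stop_words <- c("the", "was", "and", "is", "but", "a", "an")
positive   <- c("fast", "great", "good")
negative   <- c("terrible", "slow", "buggy", "poor")

# Tokenize: lower-case, strip punctuation, split on spaces, drop stop words
tokenize <- function(text) {
  words <- strsplit(gsub("[^a-z ]", " ", tolower(text)), " +")[[1]]
  words[nzchar(words) & !words %in% stop_words]
}
tokens <- lapply(reviews, tokenize)

# Sentiment score: positive word count minus negative word count
score <- sapply(tokens, function(w) sum(w %in% positive) - sum(w %in% negative))
polarity <- ifelse(score > 0, "positive", ifelse(score < 0, "negative", "neutral"))

data.frame(review = seq_along(reviews), score = score, polarity = polarity)

# Bar plot of sentiment scores per review
barplot(score, names.arg = paste("Review", seq_along(reviews)), ylab = "Sentiment score")
```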

CUSTOMER SEGMENTATION AND MARKET BASKET ANALYSIS:
Use clustering algorithms (e.g., K-means clustering) to segment customers based on demographic or behavioral attributes.
Perform market basket analysis to identify frequently co-occurring products and association rules.
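
Both tasks can be sketched in base R. The customer data below are simulated and the baskets are made up, so all column names, group sizes, and items are assumptions. The market basket part only computes support and confidence for one rule by hand; full rule mining is usually done with a dedicated package such as arules.

```r
## Customer segmentation with K-means
set.seed(123)
customers <- data.frame(
  age          = c(rnorm(30, 25, 3),   rnorm(30, 45, 4),   rnorm(30, 65, 3)),
  annual_spend = c(rnorm(30, 200, 30), rnorm(30, 800, 60), rnorm(30, 400, 40))
)

# Scale first so that both variables contribute equally to the distances
km <- kmeans(scale(customers), centers = 3, nstart = 25)
customers$segment <- factor(km$cluster)

# Profile each segment by its average age and spend
aggregate(cbind(age, annual_spend) ~ segment, data = customers, FUN = mean)

## Market basket analysis by hand
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("milk", "eggs"),
  c("bread", "butter"),
  c("bread", "milk", "eggs")
)
items <- sort(unique(unlist(baskets)))

# Incidence matrix: one row per basket, one column per item (1 = item present)
incidence <- t(sapply(baskets, function(b) as.integer(items %in% b)))
colnames(incidence) <- items

# Co-occurrence counts: diagonal = item frequency, off-diagonal = pair frequency
co_occur <- crossprod(incidence)

# Rule {bread} -> {milk}
support    <- co_occur["bread", "milk"] / length(baskets)          # share of baskets with both
confidence <- co_occur["bread", "milk"] / co_occur["bread", "bread"]  # share of bread baskets with milk
c(support = support, confidence = confidence)
```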

TOTAL : 60 PERIODS

COURSE OUTCOMES:

After completion of the course, learners will be able to:
➢ Learn the R programming language and data wrangling in R.
➢ Visualize business data using R for key insights.
➢ Analyze statistical models and estimate future prospects for business.
➢ Leverage data mining techniques using R to solve real-life problems.
➢ Apply machine learning techniques to solve business analytics problems.

REFERENCES:

1. Gardener, M. (2012). Beginning R: the statistical programming language. John Wiley & Sons.
2. Wickham, H., & Grolemund, G. (2016). R for data science: import, tidy, transform, visualize, and model data. O’Reilly Media, Inc.
3. Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. Great Britain: Sage Publications, Ltd.
4. Kumar, M. (2022). Business Analytics using R. Excellence Brings Success.
5. Cornillon, P. A., Guyader, A., Husson, F., Jegou, N., Josse, J., Kloareg, M., … & Rouvière, L. (2012). R for Statistics. CRC Press.
6. Pimpler, E. (2017). Data Visualization and Exploration with R: A practical guide to using R, RStudio, and Tidyverse for data visualization, exploration, and data science applications. Amazon Asia Pacific Holdings Private Limited.
7. Dalgaard, P. Introductory Statistics with R (1st ed., paperback). Springer-Verlag New York, Inc. ISBN 0-387-95475-9.