CP4252 Machine Learning Syllabus – Anna University PG Syllabus Regulation 2021

COURSE OBJECTIVES:

 To understand the concepts and mathematical foundations of machine learning and types of problems tackled by machine learning
 To explore the different supervised learning techniques including ensemble methods
 To learn different aspects of unsupervised learning and reinforcement learning
 To learn the role of probabilistic methods for machine learning
 To understand the basic concepts of neural networks and deep learning

UNIT I INTRODUCTION AND MATHEMATICAL FOUNDATIONS

What is Machine Learning? Need – History – Definitions – Applications – Advantages, Disadvantages & Challenges – Types of Machine Learning Problems – Mathematical Foundations – Linear Algebra & Analytical Geometry – Probability and Statistics – Bayesian Conditional Probability – Vector Calculus & Optimization – Decision Theory – Information Theory

UNIT II SUPERVISED LEARNING

Introduction – Discriminative and Generative Models – Linear Regression – Least Squares – Underfitting / Overfitting – Cross-Validation – Lasso Regression – Classification – Logistic Regression – Gradient Linear Models – Support Vector Machines – Kernel Methods – Instance-based Methods – K-Nearest Neighbours – Tree-based Methods – Decision Trees – ID3 – CART – Ensemble Methods – Random Forest – Evaluation of Classification Algorithms
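
As a minimal sketch of the Least Squares topic listed above, the following fits a single-feature linear regression in closed form. The function name `fit_least_squares` and the toy data are assumptions made for illustration; they are not prescribed by the syllabus.

```python
def fit_least_squares(xs, ys):
    """Return (slope, intercept) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_least_squares(xs, ys)
print(slope, intercept)
```

With several features, the same idea is usually expressed through the normal equations or a library routine rather than these scalar sums.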

UNIT III UNSUPERVISED LEARNING AND REINFORCEMENT LEARNING

Introduction – Clustering Algorithms – K-Means – Hierarchical Clustering – Cluster Validity – Dimensionality Reduction – Principal Component Analysis – Recommendation Systems – EM Algorithm. Reinforcement Learning – Elements – Model-based Learning – Temporal Difference Learning
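
The K-Means topic above can be sketched as alternating assignment and update steps (Lloyd's algorithm). This is a minimal illustration on 2-D points with hand-picked starting centroids; the function name `kmeans`, the data, and the initial centroids are all assumptions, and a practical implementation would typically use random or k-means++ initialisation.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Two well-separated toy groups (made up for illustration).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels)
```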

UNIT IV PROBABILISTIC METHODS FOR LEARNING

Introduction – Naïve Bayes Algorithm – Maximum Likelihood – Maximum A Posteriori – Bayesian Belief Networks – Probabilistic Modelling of Problems – Inference in Bayesian Belief Networks – Probability Density Estimation – Sequence Models – Markov Models – Hidden Markov Models
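
To illustrate the difference between maximum likelihood and maximum a posteriori (MAP) estimation, the sketch below estimates the bias of a coin. It assumes a Bernoulli likelihood and a Beta(a, b) prior with a, b > 1; the function names, the prior, and the counts are hypothetical choices for illustration only.

```python
def mle_bernoulli(heads, n):
    """Maximum likelihood estimate of P(heads): the observed frequency."""
    return heads / n


def map_bernoulli(heads, n, a, b):
    """MAP estimate under a Beta(a, b) prior: the mode of the
    Beta(heads + a, n - heads + b) posterior (valid when a, b > 1)."""
    return (heads + a - 1) / (n + a + b - 2)


# 7 heads in 10 tosses, with a mild prior centred on a fair coin.
theta_mle = mle_bernoulli(7, 10)
theta_map = map_bernoulli(7, 10, a=2, b=2)
print(theta_mle, theta_map)
```

The prior pulls the MAP estimate towards 0.5; as the number of tosses grows, the two estimates converge.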

UNIT V NEURAL NETWORKS AND DEEP LEARNING

Neural Networks – Biological Motivation – Perceptron – Multi-layer Perceptron – Feed Forward Network – Back Propagation – Activation and Loss Functions – Limitations of Machine Learning – Deep Learning – Convolutional Neural Networks – Recurrent Neural Networks – Use Cases

TOTAL: 45 PERIODS

SUGGESTED ACTIVITIES:

1. Give an example from daily life for each type of machine learning problem.
2. Study at least 3 tools available for machine learning and discuss the pros and cons of each.
3. Take an example of a classification problem. Draw different decision trees for the example and explain the pros and cons of each decision variable at each level of the tree.
4. Outline 10 machine learning applications in healthcare.
5. Give 5 examples where sequential models are suitable.
6. Give at least 5 recent applications of CNNs.

PRACTICAL EXERCISES: 30 PERIODS

1. Implement linear regression on a real dataset (https://www.kaggle.com/harrywang/housing). Experiment with different features in building a model. Tune the model’s hyperparameters.
2. Implement a binary classification model, that is, one that answers a binary question such as “Are houses in this neighborhood above a certain price?” (use the data from exercise 1). Modify the classification threshold and determine how that modification influences the model. Experiment with different classification metrics to determine your model’s effectiveness.
3. Classification with Nearest Neighbours. In this question, you will use scikit-learn’s KNN classifier to classify real vs. fake news headlines. The aim of this question is for you to read the scikit-learn API and get comfortable with training/validation splits. Use the California Housing Dataset.
4. In this exercise, you’ll experiment with validation sets and test sets using the dataset. Split a training set into a smaller training set and a validation set. Analyze the differences between training set and validation set results. Test the trained model with a test set to determine whether your trained model is overfitting. Detect and fix a common training problem.
5. Implement the k-means algorithm using the https://archive.ics.uci.edu/ml/datasets/Codon+usage dataset.
6. Implement the Naïve Bayes Classifier using the https://archive.ics.uci.edu/ml/datasets/Gait+Classification dataset.
7. Project (in pairs): Your project must implement one or more machine learning algorithms and apply them to some data.
a. Your project may be a comparison of several existing algorithms, or it may propose a new algorithm in which case you still must compare it to at least one other approach.
b. You can either pick a project of your own design, or you can choose from the set of predefined projects.
c. You are free to use any third-party ideas or code that you wish as long as it is publicly available.
d. You must properly provide references to any work that is not your own in the write-up.
e. Project proposal: You must turn in a brief project proposal that describes the idea behind your project. You should also briefly describe the software you will need to write and the papers (2-3) you plan to read.
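
Exercises 2 and 4 above involve splitting data into training, validation, and test sets and observing how the classification threshold changes the metrics. The following is a minimal standard-library sketch of both ideas. The helper names `split` and `precision_recall`, the split fractions, and the toy scores are assumptions for illustration; in practice a library such as scikit-learn would normally be used.

```python
import random


def split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of rows and return (train, val, test) lists."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


def precision_recall(scores, labels, threshold):
    """Predict positive when score >= threshold; return (precision, recall)."""
    preds = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


train, val, test = split(list(range(10)))

# Toy model scores and true labels (made up for illustration).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
strict = precision_recall(scores, labels, threshold=0.5)
lenient = precision_recall(scores, labels, threshold=0.3)
print(strict, lenient)
```

Lowering the threshold here trades precision for recall, which is the kind of effect exercise 2 asks you to examine.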

List of Projects (datasets available)

1. Sentiment Analysis of Product Reviews
2. Stock Prediction
3. Sales Forecasting
4. Music Recommendation
5. Handwritten Digit Classification
6. Fake News Detection
7. Sports Prediction
8. Object Detection
9. Disease Prediction

COURSE OUTCOMES:

Upon completion of the course, students will be able to:
CO1: Understand and outline problems for each type of machine learning
CO2: Design a Decision tree and Random forest for an application
CO3: Implement Probabilistic Discriminative and Generative algorithms for an application and analyze the results.
CO4: Use a tool to implement typical Clustering algorithms for different types of applications.
CO5: Design and implement an HMM for a Sequence Model type of application and identify applications suitable for different types of Machine Learning with suitable justification.

TOTAL: 75 PERIODS

REFERENCES

1. Stephen Marsland, “Machine Learning: An Algorithmic Perspective”, Chapman & Hall/CRC, 2nd Edition, 2014.
2. Kevin Murphy, “Machine Learning: A Probabilistic Perspective”, MIT Press, 2012.
3. Ethem Alpaydin, “Introduction to Machine Learning”, Third Edition, Adaptive Computation and Machine Learning Series, MIT Press, 2014.
4. Tom M Mitchell, “Machine Learning”, McGraw Hill Education, 2013.
5. Peter Flach, “Machine Learning: The Art and Science of Algorithms that Make Sense of Data”, First Edition, Cambridge University Press, 2012.
6. Shai Shalev-Shwartz and Shai Ben-David, “Understanding Machine Learning: From Theory to Algorithms”, Cambridge University Press, 2015.
7. Christopher Bishop, “Pattern Recognition and Machine Learning”, Springer, 2007.
8. Hal Daumé III, “A Course in Machine Learning”, 2017 (freely available online).
9. Trevor Hastie, Robert Tibshirani, Jerome Friedman, “The Elements of Statistical Learning”, Springer, 2009 (freely available online).
10. Aurélien Géron, “Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems”, 2nd Edition, O’Reilly, 2017.