CS182: Introduction to Machine Learning

Course Descriptions

Machine learning (ML) is the science of making computer artifacts improve their performance without requiring humans to program their behavior explicitly. Machine learning has achieved success in a wide variety of challenging applications, ranging from computational molecular biology to computer vision to social web analysis.

CS182 Introduction to Machine Learning is an undergraduate-level introductory course in machine learning. It is suitable both for students who want to apply principled machine learning techniques competently in their application-oriented research areas and for students who are pursuing, or plan to pursue, research in machine learning or in related areas that focus on model and algorithm development.

(Based on the course materials by Ethem Alpaydin.)

Announcements

  1. Piazza: https://piazza.com/shanghaitech.edu.cn/fall2024/cs182

  2. Gradescope: See the homework assignments.

Prerequisites

  1. Compulsory: Linear Algebra, Calculus, Probability and Statistics, Programming.

  2. Recommended Postrequisites: Matrix Analysis and Computations, Convex Optimization, Machine Learning.

Textbooks and Optional References

Textbooks

  1. Ethem Alpaydin, Introduction to Machine Learning (4th Edition), The MIT Press, 2020.

References

  1. Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification (2nd Edition), Wiley, 2000.

  2. Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

  3. Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press, 2014.

  4. Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar, Foundations of Machine Learning (2nd Edition), The MIT Press, 2018.

Schedule (Subject to Change)

Topics

  1. Topic 0: Overview

  2. Topic T1: ML Introduction

  3. Topic T2: Mathematical Foundations of ML (Linear Algebra, Probability and Statistics, Optimization Theory, etc.)

  4. Topic P1: “scikit-learn” Introduction and Data Preprocessing

  5. Topic T3: Bayesian Decision Theory

  6. Topic T4: Parameter Estimation for Generative Models

  7. Topic T5: Linear Discrimination Models

  8. Topic P2: Classification and Regression I

  9. Topic T6: Multilayer Perceptrons

  10. Topic T7: Support Vector Machines

  11. Topic P3: Classification and Regression II

  12. Topic T8: Dimensionality Reduction

  13. Topic T9: Clustering and Mixture Models

  14. Topic P4: Dimensionality Reduction and Clustering

  15. Topic T10: Nonparametric Methods

  16. Topic T11: Ensemble Learning

  17. Topic P5: Nonparametric Methods and Ensemble Learning

  18. Topic T13: Model Assessment and Selection

  19. Topic P6: Model Assessment and Selection

  20. Topic T14: Review

Note: All course materials are available on Piazza.com.

Assessment

30% assignments, 40% final exam, 30% final project.

Academic Integrity Policy

Group study and collaboration on problem sets are encouraged, as working together is a great way to understand new material. Students are free to discuss the homework problems with anyone under the following conditions:

  • Students must write up their own solutions. Plagiarism is never allowed. Similar answers, MATLAB code, etc., found in homework submissions will lead to an investigation for suspected plagiarism.

  • Students must list the names of their collaborators (i.e., anyone with whom the assignment was discussed).

  • Students may not use old solution sets from other classes unless the instructor grants special permission.

Students are encouraged to read the ShanghaiTech Policy on Academic Integrity.