In this course we will look at core algorithmic aspects of machine learning. We will cover several classical tools in machine learning, but greater emphasis will be placed on recent advances in developing efficient algorithms with provable guarantees for learning tasks. This is a seminar course: after the initial lectures by me, students will be expected to present a relevant recent paper in class.

Prerequisites: An undergraduate course in algorithms (the equivalent of CS180) will be assumed. Students are expected to be mathematically mature and able to follow and write formal proofs. Familiarity with probability will be helpful.

Course work: We will have three assignments, one in-class presentation, and one final project. The final project can be a cohesive literature survey of a specific topic, a research project, or an experimental project investigating different algorithms on a specific learning problem; it can even take the form of participating in a machine learning competition. The relative weights of these items will depend on enrollment.

Resources: There is no required course text. The following links may be useful:
Sanjeev Arora's course.
Elad Hazan's course.
Ankur Moitra's course.
Draft of Foundations of Data Science by Hopcroft and Kannan.
Links to appropriate papers or other online material (typically other lecture notes) will be provided for each lecture.

The following is a tentative list of topics to be covered.
Learning theory: what and how? (2 lectures)
  - How to model learning?
  - The PAC model
  - Towards tractable learning models
Linearity: the Swiss-army knife (3 lectures)
  - Best-fit subspaces, low-rank approximations, and the Singular Value Decomposition (SVD)
  - Applications of SVD
Multiplicative weights (2 lectures)
  - Online optimization and regret
  - Applications of multiplicative weights
Optimization: the workhorse of learning (3 lectures)
  - Learning as optimization
  - Gradient descent
  - Stochastic gradient descent
The power of convex relaxations (2 lectures)
  - Compressed sensing: formulation and analysis
  - Convexification: matrix completion, sparse PCA
Algorithms for massive data problems (2 lectures)
  - HyperLogLog
  - Dimension reduction
Neural networks (2 lectures)
  - Constant-depth circuits, backpropagation, and limitations
  - The reemergence of neural nets
Student presentations (4 lectures)
  - Most topics from the course should be eligible