CORDS SFU Operations Research Seminar: Sharan Vaswani
Topic
Exploiting Problem Structure for Efficient Optimization in Machine Learning
Speakers
Sharan Vaswani
Details
Stochastic gradient descent (SGD) is the standard optimization method for training machine learning (ML) models. SGD requires a step-size that depends on unknown problem-dependent quantities, and the choice of this step-size heavily influences the algorithm's practical performance. By exploiting the interpolation property satisfied by over-parameterized ML models, we design a stochastic line-search procedure that can automatically set the SGD step-size. The resulting algorithm exhibits improved theoretical and empirical convergence, without requiring knowledge of any problem-dependent constants.

Next, we consider efficient optimization for imitation learning (IL) and reinforcement learning (RL). These settings involve optimizing functions whose gradients are expensive to compute. We propose an optimization framework that uses the expensive gradient computation to construct surrogate functions that can then be minimized efficiently. This allows for multiple model updates, thus amortizing the cost of the gradient computation. The resulting majorization-minimization algorithm is equipped with strong theoretical guarantees and exhibits fast convergence on standard IL problems.
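To make the first part of the abstract concrete, here is a minimal sketch of SGD with a stochastic Armijo backtracking line-search on a synthetic interpolating least-squares problem. The problem setup, batch size, and the constants c, beta, and eta_max are illustrative assumptions, not the speaker's exact algorithm or code.

# A minimal sketch, assuming a synthetic over-parameterized least-squares problem
# where interpolation holds; the batch size and the constants c, beta, eta_max
# are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200                          # d > n, so a zero-loss solution exists
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)          # noiseless labels -> interpolation holds

def batch_loss_grad(w, idx):
    # Mini-batch squared loss and its gradient.
    r = X[idx] @ w - y[idx]
    return 0.5 * np.mean(r ** 2), X[idx].T @ r / len(idx)

def sgd_armijo(w, steps=500, batch=10, c=0.5, beta=0.7, eta_max=10.0):
    eta = eta_max
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        loss, grad = batch_loss_grad(w, idx)
        gnorm2 = grad @ grad
        if gnorm2 <= 1e-12:             # mini-batch already fit: nothing to do
            continue
        eta = min(eta / beta, eta_max)  # optimistically grow the step-size back
        # Backtrack until the Armijo condition holds on the *same* mini-batch.
        while batch_loss_grad(w - eta * grad, idx)[0] > loss - c * eta * gnorm2:
            eta *= beta
        w = w - eta * grad
    return w

w = sgd_armijo(np.zeros(d))
print("final training loss:", 0.5 * np.mean((X @ w - y) ** 2))

Backtracking on the same mini-batch that produced the gradient keeps the step-size selection cheap, and under interpolation the accepted step-sizes need not shrink over time.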
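The surrogate / majorization-minimization idea from the second part can be sketched in a similar spirit: each outer step spends one "expensive" gradient to build a cheap surrogate, which is then minimized with many inexpensive inner updates. The stand-in objective, the split into expensive and cheap parts, the smoothness constant, and the step counts below are assumptions for illustration only, not the exact surrogate construction used in the talk.

# A minimal sketch, assuming a stand-in objective f(theta) = f_exp(theta) + f_cheap(theta)
# in which the gradient of f_exp is treated as "expensive" (e.g. requiring rollouts)
# and f_cheap is a ridge regularizer; constants and step counts are assumptions.
import numpy as np

rng = np.random.default_rng(1)
d = 20
A = rng.standard_normal((d, d))
Q = A.T @ A / d + np.eye(d)                  # Hessian of the "expensive" quadratic part
b = rng.standard_normal(d)
lam = 0.1
L_exp = np.linalg.eigvalsh(Q).max()          # smoothness constant of f_exp (assumed known)

def expensive_grad(theta):
    # Placeholder for the costly gradient computation.
    return Q @ theta - b

def cheap_grad(theta):
    # Cheap part that is kept exact inside the surrogate.
    return lam * theta

def mm_update(theta, inner_steps=25):
    g_exp = expensive_grad(theta)            # one expensive gradient per outer step
    anchor = theta.copy()
    # Surrogate: linearize f_exp around `anchor`, add (L_exp/2)||theta - anchor||^2
    # so the surrogate majorizes f, and keep f_cheap exact.  Its gradient needs no
    # expensive calls, so many inner updates amortize the single expensive gradient.
    inner_lr = 1.0 / (L_exp + lam)
    for _ in range(inner_steps):
        s_grad = g_exp + L_exp * (theta - anchor) + cheap_grad(theta)
        theta = theta - inner_lr * s_grad
    return theta

theta = np.zeros(d)
for _ in range(40):
    theta = mm_update(theta)
print("gradient norm at the final iterate:",
      np.linalg.norm(expensive_grad(theta) + cheap_grad(theta)))

Keeping the cheap term exact is what makes the inner minimization non-trivial and worth running for several steps, which is how the cost of each expensive gradient is amortized over multiple model updates.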