PIMS-UAlberta Statistics Colloquium: Peter Binev
Topic
Learning a Function and Optimal Recovery
Speakers
Peter Binev
Details
We consider the problem of learning an unknown function f from given data about f. The learning problem is to produce an approximation fˆ to f that predicts the values of f away from the data. There are numerous settings for this learning problem depending on: (i) the model class assumption, i.e., what additional information we have about f; (ii) how we measure the accuracy with which fˆ predicts f; (iii) what is known about the data and data sites; (iv) whether and how the data observations are polluted by noise. In the presence of a model class assumption, we use the notion of the Chebyshev radius of a set to give a mathematical description of the best possible performance, i.e., the smallest possible error of recovery. Under standard model class assumptions, we show that a near-optimal fˆ can be found by solving a certain discrete over-parameterized optimization problem with a penalty term. Here, near-optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization, which is commonly used in modern machine learning.

The main results of this work prove that over-parameterized learning with an appropriate loss function gives a near-optimal approximation fˆ of the function f from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near-optimal recovery of f. An extension of these results to the case where the data is polluted by additive deterministic noise is also given.

This is collaborative research with Andrea Bonito, Ronald DeVore, and Guergana Petrova from Texas A&M. It was supported in part by NSF grant DMS 2038080.
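For orientation, here is a minimal LaTeX sketch of the standard optimal-recovery quantities behind the abstract; the symbols K, X, m, F_over, lambda, and pen are illustrative notation introduced here for the sketch, not taken from the talk. Given data y_i = f(x_i), i = 1, ..., m, and the model class assumption f in K, the sketch records the set of data-consistent functions, the Chebyshev radius that quantifies the smallest possible recovery error, and a schematic penalized over-parameterized loss of the kind the abstract refers to.

% Minimal sketch (assumed notation; requires amsmath for \operatorname*).
% Model-consistent set: all functions in the class K that match the observed data.
\[
  K_w = \{\, g \in K \;:\; g(x_i) = y_i, \ i = 1, \dots, m \,\}.
\]
% Chebyshev radius of K_w in the norm of X: the smallest possible worst-case recovery error.
\[
  R(K_w)_X = \inf_{\hat f \in X} \ \sup_{g \in K_w} \, \lVert g - \hat f \rVert_X .
\]
% Schematic penalized, over-parameterized formulation: a data-fitting loss plus a scaled
% penalty term; F_over and pen are placeholders, not the talk's exact loss.
\[
  \hat f \in \operatorname*{arg\,min}_{g \in \mathcal{F}_{\mathrm{over}}} \
  \sum_{i=1}^{m} \bigl( g(x_i) - y_i \bigr)^2 + \lambda \, \mathrm{pen}(g).
\]

In the sense of the abstract, such an fˆ is near-optimal when its error is at most a fixed constant times R(K_w)_X.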