Doug Wiens

Model Robust Scenarios for Active Learning

What we in Statistics call experimental design is very much like what those in Machine Learning call active learning. In both cases, the idea is that predictor variables are chosen in some optimal manner, and at these values a response variable is observed. In design, the regressors are determined by a design measure, obtained by the designer according to some optimality principle such as minimum mean squared error of the predicted values. In 'passive learning' these regressors are randomly sampled from 'the environment'; in active learning they are randomly sampled from a subpopulation according to a probability density derived by the designer. So a major difference between active learning and experimental design lies in the random, rather than deterministic, sampling of the regressors from the learning density or design measure. When the parametric model being fitted is exactly correct, the corresponding loss functions are asymptotically equivalent, and the methods of experimental design apply, with only minor modifications, to active learning. When, however, this model is in doubt, some significant differences between robust design and robust learning emerge, and with them interesting new optimality problems.
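The asymptotic equivalence in the correctly specified case can be illustrated with a small hypothetical example. It is not taken from this work. Assume a straight-line model E[y | x] = θ0 + θ1 x on [-1, 1] with homoscedastic errors. Take as loss the integrated mean squared prediction error over uniform x, scaled by n/σ², which equals trace(M⁻¹A) with M the moment matrix of the regressors. Also assume a learning density q(x) = |x|. The sketch compares points placed deterministically at the quantiles of q (a design) with points drawn at random from q (active learning). Both losses approach the same limit, 5/3. The model-robust case, where the differences arise, is not addressed here.

```python
import math
import random

# Hypothetical illustration (assumptions, not the author's method):
# straight-line model E[y | x] = theta0 + theta1 * x on [-1, 1],
# homoscedastic errors, loss = n * IMSE / sigma^2 = trace(M^{-1} A), where
# M = (1/n) sum z z^T with z = (1, x), and A = E[z z^T] for x ~ Uniform[-1, 1],
# i.e. A = diag(1, 1/3).


def loss_from_points(xs):
    """Return trace(M^{-1} A) for the moment matrix built from points xs."""
    n = len(xs)
    m1 = sum(xs) / n                   # (1/n) sum x_i
    m2 = sum(x * x for x in xs) / n    # (1/n) sum x_i^2
    det = m2 - m1 * m1                 # determinant of M = [[1, m1], [m1, m2]]
    # M^{-1} = [[m2, -m1], [-m1, 1]] / det, and A = diag(1, 1/3), so the trace
    # of M^{-1} A is (m2 * 1 + 1 * 1/3) / det.
    return (m2 + 1.0 / 3.0) / det


def quantile(p):
    """Inverse CDF of the assumed learning density q(x) = |x| on [-1, 1]."""
    if p < 0.5:
        return -math.sqrt(1.0 - 2.0 * p)
    return math.sqrt(2.0 * p - 1.0)


def design_points(n):
    """Deterministic 'design': points at the quantiles (i - 1/2)/n of q."""
    return [quantile((i - 0.5) / n) for i in range(1, n + 1)]


def learning_points(n, rng):
    """Active learning: x_i drawn at random from q by inverse-CDF sampling."""
    return [quantile(rng.random()) for _ in range(n)]


# Under q: E[x] = 0 and E[x^2] = 1/2, so M -> diag(1, 1/2) and the limiting
# loss is trace(diag(1, 2) @ diag(1, 1/3)) = 1 + 2/3 = 5/3.
LIMIT = 5.0 / 3.0

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (20, 2000, 200000):
        det_loss = loss_from_points(design_points(n))
        rnd_loss = loss_from_points(learning_points(n, rng))
        print(f"n={n:>6}  design={det_loss:.4f}  learning={rnd_loss:.4f}  limit={LIMIT:.4f}")
```

The deterministic loss is essentially exact for every even n, while the random loss fluctuates around 5/3 and settles down only as n grows. This mirrors, in a toy setting, the contrast between deterministic and random sampling of the regressors.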