Understanding the Effects of Predictor Variables in Black Box Supervised Learning Models
For many supervised learning applications, understanding and visualizing the effects of the predictor variables on the predicted response is of paramount importance. A shortcoming of black box supervised learning models (e.g., complex trees, neural networks, boosted trees, random forests, nearest neighbors, local kernel-weighted methods, and support vector regression) in this regard is their lack of interpretability or transparency. Partial dependence (PD) plots, which are the most popular general approach for visualizing the effects of the predictors with black box supervised learning models, can produce erroneous results if the predictors are strongly correlated, because they require extrapolation of the response at predictor values that are far outside the multivariate envelope of the training data. Functional ANOVA for correlated inputs can avoid this extrapolation, but it involves prohibitive computational expense and a subjective choice of the additive surrogate model to fit to the supervised learning model. We present a new visualization approach that we term accumulated local effects (ALE) plots, which have a number of advantages over existing methods. First, ALE plots do not require unreliable extrapolation with correlated predictors. Second, they are orders of magnitude less computationally expensive than PD plots, and many orders of magnitude less expensive than functional ANOVA. Third, they yield convenient variable importance/sensitivity measures that possess a number of desirable properties for quantifying the impact of each predictor.
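As a rough illustration of the contrast drawn above, the following Python sketch computes a PD estimate and a first-order ALE estimate for one predictor. It is a minimal sketch under stated assumptions, not the speaker's implementation: the function names `partial_dependence` and `ale_first_order`, the quantile binning, the count-weighted centering, and the toy linear model are all hypothetical choices made for this example, and the predictor is assumed to be numeric and non-constant.

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """PD estimate: average prediction with column j forced to each grid value."""
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v  # with correlated predictors this can leave the data envelope
        values.append(predict(X_mod).mean())
    return np.array(values)

def ale_first_order(predict, X, j, n_bins=10):
    """First-order ALE estimate for column j on quantile bins (illustrative only)."""
    x = X[:, j]
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    n_intervals = len(edges) - 1
    # Bin index of each row; the maximum value is clipped into the last bin.
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_intervals - 1)
    # Move each row only to the two edges of its own bin, so x_j stays local.
    X_lo, X_hi = X.copy(), X.copy()
    X_lo[:, j] = edges[idx]
    X_hi[:, j] = edges[idx + 1]
    diffs = predict(X_hi) - predict(X_lo)
    counts = np.bincount(idx, minlength=n_intervals)
    sums = np.bincount(idx, weights=diffs, minlength=n_intervals)
    local = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    # Accumulate the averaged local effects, then center (assumed centering rule).
    ale = np.concatenate([[0.0], np.cumsum(local)])
    midpoints = 0.5 * (ale[:-1] + ale[1:])
    ale = ale - np.sum(midpoints * counts) / counts.sum()
    return edges, ale

# Toy data: two strongly correlated predictors and a known additive model.
rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 1.0, 500)
X = np.column_stack([x0, x0 + rng.normal(0.0, 0.05, 500)])
predict = lambda A: 2.0 * A[:, 0] + 3.0 * A[:, 1]

edges, ale = ale_first_order(predict, X, j=0)
pd_vals = partial_dependence(predict, X, 0, edges)
```

In this sketch the PD estimate calls `predict` on all n rows once per grid value, while the ALE estimate calls it on 2n rows in total and only ever moves a row to the edges of the bin it already falls in. For the additive toy model both curves have slope 2 in the first predictor; the two estimates would be expected to differ once the model contains interactions and the predictors are correlated.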
Bio: Dan Apley is a Professor of Industrial Engineering & Management Sciences at Northwestern University. His research and teaching interests are at the interface of engineering modeling, statistical analysis, and predictive analytics, with particular emphasis on improving operations of complex manufacturing and other enterprise systems. His work has been supported by numerous industries and government agencies. He received the NSF CAREER award in 2001, the IIE Transactions Best Paper Award in 2003, and the Technometrics Wilcoxon Prize in 2008. He was formerly Editor-in-Chief of the Journal of Quality Technology and is currently Editor-in-Chief of Technometrics. He has also served as Chair of the Quality, Statistics & Reliability Section of INFORMS and Director of the Manufacturing and Design Engineering Program at Northwestern.