1174-Derek Qiu

An Applied Analysis of High Dimensional Logistic Regression

We investigate common regularization approaches for fitting logistic regression models with binary response variables in the high-dimensional setting. We provide a literature review of generalized linear models; of regularization approaches, including the lasso, ridge, elastic net, and relaxed lasso; and of recent post-selection methods, proposed by Lockhart et al. and Buhlmann et al., for obtaining $p$-values of coefficient estimates. We consider varying $n$, $p$ conditions and assess model performance on several evaluation metrics, such as sparsity, accuracy, and algorithmic time efficiency. Through a simulation study, we find that Buhlmann et al.'s multi sample-splitting method performed poorly when the selected covariates were highly correlated. When the penalty parameter $\lambda$ was chosen by cross-validation, the elastic net performed comparably to the lasso, but it did not achieve the level of sparsity suggested by Zou and Hastie.
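
For reference, the lasso, ridge, and elastic net fits compared here can be written as a single penalized likelihood problem. The following is a sketch under an assumed parameterization, not necessarily the one used in the study: the mixing parameter $\alpha$, the intercept $\beta_0$, and the $1/n$ scaling are notational assumptions, with $y_i \in \{0, 1\}$ the binary response and $x_i \in \mathbb{R}^p$ the covariates of observation $i$.

$$
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\, \beta} \left\{ -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \left( \beta_0 + x_i^\top \beta \right) - \log\left( 1 + e^{\beta_0 + x_i^\top \beta} \right) \right] + \lambda \left[ \alpha \lVert \beta \rVert_1 + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right] \right\}
$$

Under this form, $\alpha = 1$ gives the lasso, $\alpha = 0$ gives ridge, and $0 < \alpha < 1$ gives the elastic net. The $\ell_1$ term is what can set coefficients exactly to zero, so sparsity comparisons between the lasso and the elastic net depend on both $\alpha$ and the cross-validated choice of $\lambda$.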