1134-Zhenhua Lin

Some Perspectives of Smooth and Locally Sparse Estimators

In this thesis we develop new techniques for computing estimators that are smooth and, at the same time, locally sparse (i.e., zero on some sub-regions). We apply them to functional principal components (FPCs) in functional principal component analysis (FPCA) and to coefficient functions in functional linear regression (FLR). Like sparse models in ordinary data analysis, locally sparse estimators in functional data analysis have lower variability and are easier to interpret.

In the first part of the thesis, we develop smooth and locally sparse estimators of FPCs. The sub-regions on which an FPC has significant magnitude can be interpreted as the sub-regions where the sample curves have major variations. The non-null sub-regions of our estimated FPCs coincide with these sub-regions. This makes the estimated FPCs easier to interpret, because their non-null sub-regions directly indicate where the sample curves vary most. We design an efficient algorithm based on projection deflation to compute the estimators. Under mild conditions, our estimators are strongly consistent and asymptotically normal. Simulation studies also show that the FPCs estimated by our method explain a similar amount of variation in the sample curves as FPCs estimated by other methods.
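Projection deflation, mentioned above, can be illustrated with a minimal sketch. It makes several assumptions. The curves are taken to be observed on a common grid, so they form a matrix. Each component is the plain leading eigenvector of the current covariance matrix. The smooth and locally sparse estimation step of the thesis is not reproduced here. The function name `fpcs_by_projection_deflation` is hypothetical. The sketch shows only the deflation mechanics: after a component v is found, the covariance C is replaced by (I - v v^T) C (I - v v^T), so that the next component is sought orthogonally to v.

```python
import numpy as np


def fpcs_by_projection_deflation(curves, n_components):
    """Hypothetical sketch: principal components of discretized curves via projection deflation.

    curves: array of shape (n_samples, n_grid); each row is one curve on a common grid.
    Uses ordinary (non-sparse) eigenvectors; the thesis's locally sparse step is NOT included.
    """
    centered = curves - curves.mean(axis=0)
    cov = centered.T @ centered / (curves.shape[0] - 1)
    n_grid = cov.shape[0]
    components = []
    for _ in range(n_components):
        # Leading eigenvector of the current (already deflated) covariance matrix.
        _, eigvecs = np.linalg.eigh(cov)
        v = eigvecs[:, -1]
        components.append(v)
        # Projection deflation: project the direction v out of both sides of cov.
        proj = np.eye(n_grid) - np.outer(v, v)
        cov = proj @ cov @ proj
    return np.array(components)


# Usage example with synthetic "curves" on a 30-point grid.
rng = np.random.default_rng(0)
scales = np.r_[5.0, 3.0, 2.0, np.full(27, 0.5)]
X = rng.normal(size=(200, 30)) * scales
V = fpcs_by_projection_deflation(X, 3)
```

Projection deflation is used here instead of the simpler Hotelling deflation (C - λ v v^T) because it keeps the deflated matrix positive semidefinite even when v is only an approximate eigenvector. That is the situation with penalized or sparse components. This rationale is a general observation about deflation schemes, not a claim about the thesis's specific derivation.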

In the second part of the thesis, we develop a new regularization technique called “functional SCAD” (fSCAD), a functional generalization of the well-known SCAD (smoothly clipped absolute deviation) regularization. We then apply it to derive a smooth and locally sparse estimator of the coefficient function in FLR. The fSCAD penalty enables us to identify the null sub-regions of the coefficient function without overshrinking its non-zero values, while the smoothness of the estimator is controlled by a roughness penalty. We also develop an efficient algorithm that computes the estimator in practice via a B-spline expansion. An asymptotic analysis shows that our estimator enjoys the oracle property, i.e., it performs as well as if the true null sub-regions of the coefficient function were known in advance. Simulation studies show that our estimator has superior numerical performance.
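The following sketch shows why a SCAD-type penalty avoids overshrinking. It uses the standard scalar SCAD penalty of Fan and Li with the conventional default a = 3.7. The penalty is linear near zero, which promotes exact zeros. It is constant beyond aλ, so large values are not penalized further. As an assumption for illustration only, the functional version is approximated here by a trapezoidal rule for the integral of p_λ(|β(t)|) over a grid. This is not necessarily the exact definition or discretization of fSCAD used in the thesis. Both function names are hypothetical.

```python
import numpy as np


def scad_penalty(theta, lam, a=3.7):
    """Standard scalar SCAD penalty p_lam(|theta|) (Fan and Li), applied elementwise."""
    theta = np.abs(np.asarray(theta, dtype=float))
    linear = lam * theta                                          # |theta| <= lam
    quadratic = -(theta**2 - 2 * a * lam * theta + lam**2) / (2 * (a - 1))  # lam < |theta| <= a*lam
    constant = (a + 1) * lam**2 / 2                               # |theta| > a*lam: no extra shrinkage
    return np.where(theta <= lam, linear,
                    np.where(theta <= a * lam, quadratic, constant))


def functional_scad_trapezoid(beta_values, grid, lam, a=3.7):
    """Hypothetical discretization: trapezoidal approximation of the integral of p_lam(|beta(t)|) dt."""
    vals = scad_penalty(beta_values, lam, a)
    dt = np.diff(np.asarray(grid, dtype=float))
    return float(np.sum(dt * (vals[1:] + vals[:-1]) / 2))


# Usage example: a coefficient function that is zero on [0, 0.5] and large on (0.5, 1].
grid = np.linspace(0.0, 1.0, 101)
beta = np.where(grid > 0.5, 10.0, 0.0)
pen = functional_scad_trapezoid(beta, grid, lam=1.0)
```

On the null sub-region the integrand is exactly zero. On the sub-region where β is large, the integrand stays at the constant (a + 1)λ²/2 no matter how large β grows. An L1-type penalty would instead keep growing with |β| and pull large values toward zero.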