Spline Models for the Analysis of Recurrent Event Panel Data

Jason Nielsen successfully defended his Ph.D. thesis entitled "Spline Models for the Analysis of Recurrent Event Panel Data" on 28 May 2007.

There has been substantial interest in longitudinal studies, particularly for monitoring changes in the profiles of specific sub-sectors of society. Examples include Statistics Canada's Canadian National Longitudinal Study of Children and Youth (Statistics Canada, 1996), the Women's Health Australia national longitudinal study (Women's Health Australia, 2005) and the U.S. National Longitudinal Surveys on Labor Statistics (National Longitudinal Surveys Handbook, 2005). In these and many other longitudinal studies, so-called panel data are collected, with information gathered on what happens between specific follow-up times. When interest focuses on multiple or recurrent episodes of an event, recurrent event panel data arise, in which only the number of recurrences between consecutive follow-up times is recorded. Such designs are typical in clinical studies where exact event times cannot be recorded, for example, when examinations are invasive or events occur too frequently to be timed individually, as in the study of chronic diseases such as epilepsy (Thall and Vail, 1990) or of certain tumor incidences in cancer patients (Abu-Libdeh et al., 1990).
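To make the data structure concrete, here is a small Python sketch showing how a subject's exact recurrence times, which a panel study never observes, collapse into the interval counts that are actually recorded. The function name `panel_counts` is hypothetical, and observation is assumed to start at time 0.

```python
from bisect import bisect_right


def panel_counts(event_times, follow_up_times):
    """Reduce exact event times to the counts recorded in panel data.

    Returns the number of events in each interval (t_{j-1}, t_j], where
    follow_up_times = [t_1, ..., t_m] is increasing and t_0 = 0 is assumed
    to be the start of observation. Events after t_m are never observed.
    """
    times = sorted(event_times)
    counts, prev = [], 0.0
    for t in follow_up_times:
        # Number of events with prev < time <= t.
        counts.append(bisect_right(times, t) - bisect_right(times, prev))
        prev = t
    return counts


print(panel_counts([0.5, 1.2, 1.9, 3.7], [1.0, 2.0, 4.0]))  # [1, 2, 1]
```

Only the list `[1, 2, 1]` and the follow-up times are available to the analyst; the within-interval timing of the four events is lost.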

This thesis discusses semiparametric methods for the analysis of recurrent event panel data and offers a comprehensive framework for such analysis that requires only minimal distributional assumptions. The basic model assumes that the counts for each subject are generated by a mixed nonhomogeneous Poisson process (NHPP), in which frailties account for the heterogeneity common to this type of data. The intensity function generating the counting process is assumed to be smooth and is modeled with splines. Covariate effects are also represented as splines, which permits them to change over time. The development includes several common special limiting cases, for example, a constant intensity or fixed covariate effects.

The thesis also considers discrete mixtures of these mixed NHPP models, accommodating clusters of hidden sub-populations that generate counts with differing intensity functions. Several recent applications suggest a need to accommodate such unobservable sub-populations. For example, in the motivating application used throughout the thesis, moth matings in the summer appear to be generated by the emergence of at least two types of moths in the spring: those that overwinter in the pupal stage and emerge earlier in the spring, and those that overwinter in the egg stage. The finite mixture approach accommodates this type of behavior. The thesis concludes with a discussion of several areas for further investigation in this important field of study.
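The data-generating model described above can be illustrated with a small simulation. This is a minimal sketch of the model only, not the estimation method of the thesis, and several concrete choices are assumptions rather than details taken from the thesis: a log-linear intensity u_i * exp(s(t) + beta(t) * x_i) for subject i, a gamma frailty u_i with mean 1 and variance phi, a cubic truncated-power spline basis for s(t), and trapezoid-rule integration. All function names and parameter values are hypothetical. Conditional on u_i, the count in each follow-up interval (t_{j-1}, t_j] is Poisson with mean u_i times the integrated intensity over that interval, and counts in disjoint intervals are independent.

```python
import math
import random


def cubic_spline(t, coefs, knots):
    """Cubic spline in a truncated-power basis (a hypothetical choice):
    s(t) = c0 + c1*t + c2*t^2 + c3*t^3 + sum_k d_k * (t - kappa_k)_+^3,
    with coefs = [c0, c1, c2, c3, d_1, ..., d_K] and knots = [kappa_1, ..., kappa_K].
    """
    poly = sum(c * t ** p for p, c in enumerate(coefs[:4]))
    trunc = sum(d * max(t - k, 0.0) ** 3 for d, k in zip(coefs[4:], knots))
    return poly + trunc


def cumulative_intensity(a, b, log_baseline, beta, x, n=200):
    """Trapezoid-rule approximation of the integral over (a, b] of
    exp(log_baseline(t) + beta(t) * x), i.e. the frailty-free expected count."""
    def rate(t):
        return math.exp(log_baseline(t) + beta(t) * x)

    h = (b - a) / n
    inner = sum(rate(a + j * h) for j in range(1, n))
    return h * (0.5 * (rate(a) + rate(b)) + inner)


def poisson(mu, rng):
    """Poisson(mu) draw: number of unit-rate exponential arrivals in [0, mu]."""
    k, s = 0, rng.expovariate(1.0)
    while s <= mu:
        k += 1
        s += rng.expovariate(1.0)
    return k


def simulate_subject(follow_ups, x, log_baseline, beta, phi, rng):
    """Panel counts for one subject, with follow-up starting at t = 0.
    Gamma frailty u has mean 1 and variance phi; given u, counts in
    disjoint intervals are independent Poisson (the NHPP property)."""
    u = rng.gammavariate(1.0 / phi, phi)
    counts, prev = [], 0.0
    for t in follow_ups:
        mu = u * cumulative_intensity(prev, t, log_baseline, beta, x)
        counts.append(poisson(mu, rng))
        prev = t
    return counts


def simulate_mixture(follow_ups, x, components, weights, beta, phi, rng):
    """Finite mixture: draw the subject's latent sub-population, then its counts."""
    log_baseline = rng.choices(components, weights=weights)[0]
    return simulate_subject(follow_ups, x, log_baseline, beta, phi, rng)


# Illustrative, made-up parameters: an "early" component whose log-intensity
# peaks at t = 1 and a "late" component that keeps rising over (0, 4].
def early(t):
    return cubic_spline(t, [0.0, 1.0, -0.5, 0.0, 0.3], [2.0])


def late(t):
    return cubic_spline(t, [-1.0, 0.2, 0.3, -0.05], [])


def beta(t):
    return 0.1 * t  # time-varying covariate effect (linear, a degenerate spline)


rng = random.Random(2007)
counts = simulate_mixture([1.0, 2.0, 3.0, 4.0], 1.0, [early, late], [0.4, 0.6], beta, 0.5, rng)
```

The special cases mentioned above correspond to simple restrictions in this sketch: a constant `log_baseline` gives a constant intensity, a constant `beta` gives a fixed covariate effect, and a single component with weight 1 removes the finite mixture.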