Dongdong Li

Statistical Inference Using Large Administrative Data on Multiple Event Times, with Application to Cancer Survivorship Research

Motivated by the breast cancer survivorship research program at the BC Cancer Agency, this thesis develops statistical approaches to analyzing right-censored multivariate event time data.

We begin with estimation of the joint survivor function of multiple event times when the observations are subject to informative censoring due to a terminating event. We model the potential dependence between the multiple event times and the time to the terminating event through Archimedean copulas. This can account for the informative censoring and, at the same time, allows us to adapt the commonly used two-step procedure for estimating the joint distribution of the multiple event times under a copula model. We propose an easy-to-implement pseudo-likelihood based estimation procedure under the model, which is less computationally intensive than its maximum likelihood (MLE) counterpart.
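As a minimal illustration of the copula formulation, the sketch below evaluates a joint survivor probability from marginal survivor probabilities. It assumes a Clayton copula, which is only one member of the Archimedean family; the abstract does not name a specific family, and the function name, the parameter `theta`, and the input values are hypothetical.

```python
def clayton_joint_survival(marginals, theta):
    """Joint survivor probability under a Clayton (Archimedean) copula.

    marginals: marginal survivor probabilities S_j(t_j), each in (0, 1].
    theta: dependence parameter, theta > 0 (larger means stronger dependence).
    Both the Clayton family and these names are assumptions for illustration.
    """
    if theta <= 0:
        raise ValueError("theta must be positive in this sketch")
    if any(u <= 0 or u > 1 for u in marginals):
        raise ValueError("marginal survivor probabilities must lie in (0, 1]")
    d = len(marginals)
    # Clayton copula: C(u_1, ..., u_d) = (sum_j u_j^(-theta) - d + 1)^(-1/theta)
    total = sum(u ** (-theta) for u in marginals)
    return (total - d + 1.0) ** (-1.0 / theta)


# Hypothetical marginal survivor probabilities for two event times.
joint = clayton_joint_survival([0.5, 0.5], theta=1.0)
```

In a two-step procedure of the kind mentioned above, the marginal survivor probabilities would first be estimated and then plugged into the copula to estimate the dependence parameter; here they are simply supplied as inputs.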

We then propose a more flexible approach to handling informative censoring, with particular attention to observations on a bivariate event time potentially censored by a terminating event. We formulate the correlation of the bivariate event time with the censoring time by embedding the bivariate event time distribution in a bivariate copula model. This retains the convenience of inference under the conventional copula model. At the same time, the proposed model is more flexible, and thus potentially more appropriate in many practical situations, than modeling the event times and the associated censoring time jointly by a single multivariate copula. Adapting the commonly used two-stage estimation procedure under a copula model, we develop an easy-to-implement estimator for the joint survivor function of the two event times. A by-product of the proposed approaches is an estimator for the marginal distribution of a single event time with semicompeting-risks data.

Further, we extend the approach to regression settings to explore covariate effects in either parametric or nonparametric forms. In particular, adjusting for some covariates, we compare two populations based on an event time with observations subject to informative censoring.

We conduct both asymptotic and simulation studies to examine the consistency, efficiency, and robustness of the proposed approaches. Data from the breast cancer program that motivated this research are used to illustrate the methodological developments throughout the thesis.

Keywords: Copula model; Efficiency and robustness; Informative censoring; Marginal distribution; Multivariate event times; Pseudo-likelihood estimation; Variance estimation