A Comparison of Two Logistic Regression Approaches for Case-Control Data with Missing Haplotypes

Mercedeh Ghadessi successfully defended her M.Sc. project entitled "A Comparison of Two Logistic Regression Approaches for Case-Control Data with Missing Haplotypes" on 2 August 2005.

What is the best method to detect associations between genetic risk factors and diseases?

In cohort studies, subjects are followed prospectively through time to determine their disease status after measuring their risk factors at the beginning of the study. These studies can be impractical for rare diseases, however, due to the limited number of subjects who develop the disease. Case-control studies are a cost-effective alternative in which subjects are selected according to disease status and their risk factors are determined retrospectively. When risk factors are fully observed for all subjects in a case-control study, disease associations with these risk factors may be correctly inferred by naively applying statistical methods developed for cohort data.
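The last point can be illustrated with a small simulation. The sketch below is not the project's simulation study: it assumes a single, fully observed binary risk factor, and the function name `simulate_log_odds_ratio` and every parameter value are hypothetical choices made for illustration. Disease status is generated from a prospective logistic model, a case-control sample is drawn, and the log odds ratio is estimated from the sample.

```python
import math
import random


def simulate_log_odds_ratio(n_population=200_000, beta0=-4.0,
                            beta1=math.log(2.0), exposure_freq=0.3, seed=1):
    """Estimate the log odds ratio from a simulated case-control sample.

    All default values are illustrative assumptions, not taken from the project.
    """
    rng = random.Random(seed)

    # Simulate a population in which disease follows a prospective
    # logistic model: logit P(disease | x) = beta0 + beta1 * x.
    cases, controls = [], []
    for _ in range(n_population):
        x = 1 if rng.random() < exposure_freq else 0
        p_disease = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        (cases if rng.random() < p_disease else controls).append(x)

    # Case-control sampling: keep every case and an equal number of controls.
    sampled_controls = rng.sample(controls, len(cases))

    # Counts for the 2x2 table of exposure by disease status.
    a = sum(cases)                     # exposed cases
    b = len(cases) - a                 # unexposed cases
    c = sum(sampled_controls)          # exposed controls
    d = len(sampled_controls) - c      # unexposed controls

    # With one binary covariate, the logistic-regression slope estimate
    # equals the log of the sample odds ratio.
    return math.log((a * d) / (b * c))


if __name__ == "__main__":
    print("estimated log odds ratio:", simulate_log_odds_ratio())
    print("true log odds ratio:     ", math.log(2.0))
```

Under these assumptions the estimate should land close to the true value of `beta1`, even though the subjects were selected by disease status rather than followed prospectively.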

We investigate the statistical properties of prospective maximum-likelihood (PML), a statistical method developed for cohort data, when it is applied to data from case-control studies in which genetic risk factors known as haplotypes are ambiguous or partially observed in some subjects. Haplotypes are combinations of genetic variants along a stretch of DNA, and they cannot be easily determined experimentally. We motivate applying PML to case-control data and use simulation to compare the statistical properties of PML to those of another method, based on estimating equations (EE), that was developed specifically for case-control studies.
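To make haplotype ambiguity concrete, here is a minimal sketch. It assumes the ambiguity comes from unphased genotypes at biallelic markers, which is a common situation but a data format the announcement does not specify. The function name `compatible_haplotype_pairs` and the 0/1 allele coding are hypothetical. Given the number of copies of one allele (0, 1 or 2) at each marker, the function lists every unordered pair of haplotypes consistent with that genotype.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of allele counts (0, 1 or 2), one per biallelic marker.
    Each haplotype is returned as a tuple of 0/1 alleles.
    """
    per_site = []
    for count in genotype:
        if count == 0:
            per_site.append([(0, 0)])            # homozygous: phase is known
        elif count == 2:
            per_site.append([(1, 1)])            # homozygous: phase is known
        elif count == 1:
            per_site.append([(0, 1), (1, 0)])    # heterozygous: two phasings
        else:
            raise ValueError("allele counts must be 0, 1 or 2")

    pairs = set()
    for combo in product(*per_site):
        hap1 = tuple(first for first, _ in combo)
        hap2 = tuple(second for _, second in combo)
        # Sort so that (hap1, hap2) and (hap2, hap1) count as the same pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


if __name__ == "__main__":
    # Heterozygous at one marker only: a single compatible pair.
    print(compatible_haplotype_pairs((1, 0)))
    # Heterozygous at both markers: two compatible pairs, so the
    # subject's haplotypes are ambiguous.
    print(compatible_haplotype_pairs((1, 1)))
```

A subject who is heterozygous at k markers has 2^(k-1) compatible pairs under this coding, which is why ambiguity grows quickly with the number of markers considered.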

The project concluded that PML performed well in the simulation configurations considered. By contrast, EE gave anticonservative inference (i.e. risk estimates biased upwards and measures of precision biased downwards) when there was marked haplotype ambiguity. This has implications for studies that use EE, since some haplotypes detected with this method may be "false positives" rather than truly associated with the disease.

This type of interdisciplinary work is a hallmark of our program in Applied Statistics at Simon Fraser University. For more information, please contact Mercedeh Ghadessi (mghadess@sfu.ca) or her supervisors Brad McNeney (mcneney@stat.sfu.ca) and Jinko Graham (jgraham@stat.sfu.ca), Department of Statistics and Actuarial Science.

2 August 2005.