Complexity in Simple Cross-Sectional Data with Binary Disease Outcome
Cross-sectionally sampled data with a binary disease outcome are commonly collected and analyzed in observational studies to understand how covariates correlate with disease occurrence. This talk addresses two questions: (1) Which risk can be identified in a commonly adopted model, such as the logistic model? (2) Are there problems in interpreting the identifiable risk? Because the progression of a disease typically involves both disease status and duration, the talk considers how the binary disease outcome is connected to disease progression through the birth-illness-death process. In general, we conclude that the distribution of the cross-sectional binary outcome can be very different from the population risk distribution. The cross-sectional risk probability is determined jointly by the population risk probability and the ratio of the duration of the diseased state to the duration of the disease-free state. Using the logistic model as an illustrative example, we examine the bias arising from cross-sectional data and argue that it can almost never be avoided. We present an approach that treats the binary outcome as a specific type of current status data and offers a compromise model based on an age-specific risk probability (ARP), though the interpretation of the ARP itself could also be questioned. An analysis of Alzheimer's disease data is presented to illustrate the ARP approach and the complexity of the data.
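
The dependence of the cross-sectional probability on both risk and duration can be illustrated with a minimal sketch. It is not the derivation used in the talk: it assumes a simplified stationary population (constant birth rate, no covariates, no age structure), in which the odds of finding a sampled individual in the diseased state equal the population risk multiplied by the ratio of mean diseased duration to mean disease-free duration. The function name and all parameter names and values are hypothetical.

```python
def cross_sectional_prevalence(risk, mean_diseased, mean_disease_free):
    """Probability that a cross-sectionally sampled individual is diseased.

    Assumes a simplified stationary population (an assumption of this
    sketch, not of the talk):
      risk              -- probability of ever developing the disease
      mean_diseased     -- mean time spent diseased, among those who develop it
      mean_disease_free -- mean time spent disease-free, over all individuals
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must lie in [0, 1]")
    if mean_diseased < 0 or mean_disease_free <= 0:
        raise ValueError("durations must be non-negative, disease-free time positive")
    # Cross-sectional odds = population risk x (diseased / disease-free duration).
    odds = risk * mean_diseased / mean_disease_free
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Two hypothetical groups with the same population risk but
    # different disease durations yield different cross-sectional probabilities.
    for duration in (2.0, 10.0):
        prob = cross_sectional_prevalence(0.2, duration, 70.0)
        print(f"diseased duration {duration:>4}: cross-sectional probability {prob:.4f}")
```

Under these assumptions, two groups with identical population risk can show different cross-sectional probabilities purely because their disease durations differ, which is the kind of discrepancy a model fitted to the binary outcome would absorb into its covariate effects.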