Kyle Vincent

Strategies for Estimating the Size and Distribution of Hard-to-Reach Populations with Adaptive Sampling

This thesis develops new methods for estimating the size and distribution of hard-to-reach populations under an adaptive sampling design. Hard-to-reach populations, such as injection drug users, are usually not covered by a sampling frame. Hence, the sampler may wish to exploit the social links among members of the population to adaptively sample individuals for the study. We develop three novel inference procedures, based on various adaptive sampling designs, for estimating the population unknowns, together with a computational method for approximating the resulting Rao-Blackwellized estimates.

The first project introduces a complex graph model that uses observed covariate information to account for the erratic clustering behavior commonly seen in hard-to-reach populations. Inference for the population size and the model parameters is carried out with a novel Bayesian data augmentation routine.
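The complex graph model is not specified in this abstract, so the sketch below illustrates Bayesian data augmentation on a much simpler, swapped-in model: the classical closed-population capture-recapture model M0, with a constant capture probability p over K occasions. The n observed capture counts are padded with M - n all-zero pseudo-individuals, each carrying a latent inclusion indicator z, and the population size is recovered as N = sum(z). The function names, the uniform Beta(1, 1) priors and the augmentation bound M are assumptions for illustration, not the thesis routine.

```python
import numpy as np

def simulate_captures(N, p, K, rng):
    """Simulate K capture occasions for N individuals (hypothetical helper).
    Returns capture counts for the individuals caught at least once."""
    counts = rng.binomial(K, p, size=N)
    return counts[counts > 0]

def gibbs_m0_augmented(y_obs, K, M, n_iter, rng):
    """Gibbs sampler for model M0 with data augmentation up to M individuals.
    Assumes uniform priors on psi (inclusion probability) and p (capture probability)."""
    n = len(y_obs)
    if M < n:
        raise ValueError("augmentation bound M must be at least the number observed")
    # Observed rows keep their counts; augmented rows were never captured (y = 0).
    y = np.concatenate([y_obs, np.zeros(M - n, dtype=int)])
    # Observed individuals are certainly in the population (z = 1).
    z = np.concatenate([np.ones(n, dtype=int), rng.binomial(1, 0.5, size=M - n)])
    psi = 0.5
    N_draws = np.empty(n_iter, dtype=int)
    for t in range(n_iter):
        # p | y, z ~ Beta(1 + captures, 1 + misses), over included individuals only.
        caught = int(np.sum(y * z))
        p = rng.beta(1 + caught, 1 + K * z.sum() - caught)
        # z_i | psi, p for never-captured rows: prior psi times likelihood (1 - p)^K.
        num = psi * (1 - p) ** K
        z[n:] = rng.binomial(1, num / (num + 1 - psi), size=M - n)
        # psi | z ~ Beta(1 + included, 1 + excluded).
        psi = rng.beta(1 + z.sum(), 1 + M - z.sum())
        N_draws[t] = z.sum()
    return N_draws

rng = np.random.default_rng(1)
y_obs = simulate_captures(N=100, p=0.3, K=5, rng=rng)
draws = gibbs_m0_augmented(y_obs, K=5, M=300, n_iter=3000, rng=rng)
N_hat = draws[500:].mean()  # posterior mean of N after burn-in
```

The augmentation bound M only needs to be comfortably larger than any plausible N; if the posterior for N piles up near M, the bound should be raised.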

The second project develops a new design-based approach for multi-sample studies. Preliminary estimates of the population unknowns are based on the initial random selections made for each sample. The adaptively selected members of each sample are then incorporated into inference through Rao-Blackwellization of the preliminary estimator over the sample reorderings that are consistent with a sufficient statistic.
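The core Rao-Blackwellization step can be sketched generically: an order-dependent preliminary estimator is averaged over every reordering of the sample that is consistent with the sufficient statistic. The sketch below assumes that all consistent reorderings are equally likely given the sufficient statistic (true in the simple random sampling toy used here; adaptive designs may require weights), and the consistency check is a hypothetical user-supplied predicate rather than the thesis's specific rule.

```python
from itertools import permutations
from statistics import mean

def rao_blackwellize(sample, prelim, is_consistent=lambda order: True):
    """Average a preliminary, order-dependent estimator over reorderings of the sample.
    Assumes each consistent reordering is equally likely given the sufficient statistic."""
    values = [prelim(order) for order in permutations(sample) if is_consistent(order)]
    if not values:
        raise ValueError("no reordering is consistent with the observed data")
    return mean(values)

def first_unit(order):
    """Toy preliminary estimator: the value of the first unit drawn.
    Under simple random sampling it is unbiased for the population mean."""
    return order[0]

rb_all = rao_blackwellize([2, 4, 6], first_unit)
rb_restricted = rao_blackwellize([2, 4, 6], first_unit, lambda order: order[0] != 6)
```

With every ordering allowed, the Rao-Blackwellized first-unit estimator is the sample mean, 4. Full enumeration grows factorially with sample size, which is what motivates approximating this average by resampling.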

The third project extends the design-based approach to inference introduced by Frank and Snijders (1994), in which inference is based on the links originating from members selected in a Bernoulli sample. We propose new estimators of the population size that use one wave of sampling after the initial sample is obtained. We also introduce a Rao-Blackwellization procedure, similar to that of the second project, for obtaining improved estimates.
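The sampling design itself (a Bernoulli initial sample followed by one wave along observed links) can be sketched on a toy random graph. The new one-wave estimators are not given in this abstract, so the sketch computes only the standard baseline that uses the initial sample alone: under Bernoulli sampling with inclusion probability p0, the Horvitz-Thompson estimator of the population size is |S0| / p0. The graph generator and function names are hypothetical.

```python
import random

def random_graph(N, p_link, seed):
    """Hypothetical toy population: an Erdos-Renyi graph stored as adjacency sets."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(N)}
    for i in range(N):
        for j in range(i + 1, N):
            if rng.random() < p_link:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def bernoulli_one_wave(adj, p0, seed):
    """Select each node independently with probability p0, then add one wave:
    every node linked to the initial sample that is not already in it."""
    rng = random.Random(seed)
    s0 = {v for v in adj if rng.random() < p0}
    wave1 = set().union(*(adj[v] for v in s0)) - s0
    return s0, wave1

adj = random_graph(200, 0.02, seed=3)
s0, wave1 = bernoulli_one_wave(adj, 0.2, seed=4)
n_hat = len(s0) / 0.2  # baseline Horvitz-Thompson estimate, ignores wave-1 information
```

The wave-1 nodes and the links observed from S0 are the extra information that link-based estimators, such as those of Frank and Snijders (1994), exploit beyond this baseline.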

The fourth project offers new methods for approximating the Rao-Blackwellized estimates obtained with a design-based approach to inference. We introduce a method termed improved importance sampling, which uses a sufficient statistic to reduce the (re)sample space and therefore yields more efficient estimates.
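Because enumerating all reorderings is infeasible for realistic samples, the Rao-Blackwellized average is typically approximated by resampling. The sketch below shows a plain self-normalized importance sampler, not the thesis's improved method: the target is uniform over consistent reorderings, the proposal is uniform over all reorderings, so each draw's weight is proportional to an indicator of consistency. Draws with zero weight are wasted, which is the kind of inefficiency that shrinking the resample space with a sufficient statistic is meant to remove. The function name and the consistency predicate are assumptions.

```python
import random

def is_rao_blackwell(sample, prelim, is_consistent, n_draws, seed=0):
    """Self-normalized importance sampling estimate of a Rao-Blackwellized estimator.
    Target: uniform over consistent reorderings. Proposal: uniform random shuffles.
    The importance weight is therefore proportional to 1{consistent}."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n_draws):
        order = list(sample)
        rng.shuffle(order)  # draw a reordering from the proposal
        w = 1.0 if is_consistent(order) else 0.0
        num += w * prelim(order)
        den += w
    if den == 0:
        raise ValueError("no consistent reorderings were drawn")
    return num / den

est_all = is_rao_blackwell([2, 4, 6], lambda o: o[0], lambda o: True, 20000)
est_restricted = is_rao_blackwell([2, 4, 6], lambda o: o[0], lambda o: o[0] != 6, 20000)
```

For this toy sample the exact Rao-Blackwellized values are 4 (all orderings) and 3 (orderings not starting with 6), so the two estimates should land close to those.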

As the study population for the thesis, we use a networked population simulated from the complex graph model. We conduct a series of simulation studies under several adaptive sampling designs to evaluate the performance of the estimators from each project.
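A simulation study of this kind repeatedly draws samples from a fixed population, applies each estimator, and summarizes empirical bias and variance. The sketch below is a minimal version of such a harness on a hypothetical toy population under simple random sampling rather than an adaptive design, comparing the first-unit preliminary estimator with its Rao-Blackwellized version (the sample mean).

```python
import random
from statistics import mean, pvariance

def simulate(population, n, reps, seed=0):
    """Empirical bias and variance of two estimators of the population mean.
    Toy setting (assumption): simple random sampling without replacement."""
    rng = random.Random(seed)
    prelim, rb = [], []
    for _ in range(reps):
        s = rng.sample(population, n)
        prelim.append(s[0])       # preliminary: uses only the first draw
        rb.append(sum(s) / n)     # Rao-Blackwellized: average over all reorderings
    truth = mean(population)
    return {
        name: (mean(est) - truth, pvariance(est))
        for name, est in (("prelim", prelim), ("rb", rb))
    }

results = simulate(list(range(1, 101)), n=10, reps=5000)
```

Both estimators are unbiased here, so the comparison shows up in the variance column, where Rao-Blackwellization should give a large reduction.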

Keywords: Adaptive sampling, Bayesian inference, Capture-recapture, Markov chain Monte Carlo, Network sampling, Rao-Blackwellization