Farouk Nathoo

A Bayesian Group Sparse Multi-Task Regression Model for Imaging Genetics

Recent advances in technology for brain imaging and high-throughput genotyping have motivated studies examining the influence of genetic variation on brain structure. In this setting, high-dimensional regression for multi-SNP (single nucleotide polymorphism) association analysis is challenging because the response variables obtained through brain imaging comprise potentially interlinked endophenotypes, and there is a desire to incorporate a biological group structure among SNPs based on their genetic arrangement. We consider a recently developed approach for the analysis of imaging genetic studies based on penalized regression with a group l_{2,1}-norm penalty that encourages sparsity at both the gene and SNP level. While this approach incorporates a number of useful features, a shortcoming is that it furnishes only a point estimate; techniques for obtaining valid standard errors or interval estimates are not provided. We solve this problem by developing a corresponding Bayesian formulation based on a three-level hierarchical model that allows for full posterior inference using Gibbs sampling. We investigate techniques for selecting the tuning parameters, comparing cross-validation, fully Bayes, and empirical Bayes approaches. Our proposed methodology is investigated using simulation studies and is applied to the analysis of a large dataset collected as part of the Alzheimer's Disease Neuroimaging Initiative. I will discuss how our Gibbs sampler scales with an increasing number of SNPs and imaging phenotypes, and also describe extensions of the model for application to brain-wide data, including a spatial model whose development is currently in progress.
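
As a rough illustration of how a group l_{2,1}-norm penalty can act at both the gene and SNP level, the sketch below evaluates one common form of such a penalty on a coefficient matrix W whose rows are SNPs and whose columns are imaging phenotypes. The exact penalty used in the work is not stated in the abstract, so the form here (a gene-level term plus a SNP-level term, weighted by hypothetical tuning parameters gamma1 and gamma2) and all function names are assumptions made for illustration only.

```python
import math


def l21_norm(W):
    """SNP-level term: sum over rows (SNPs) of the Euclidean norm of each row."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


def group_l21_norm(W, groups):
    """Gene-level term: sum over groups (genes) of the Euclidean norm of all
    coefficients belonging to the SNPs in that group."""
    squared = {}
    for row, g in zip(W, groups):
        squared[g] = squared.get(g, 0.0) + sum(w * w for w in row)
    return sum(math.sqrt(v) for v in squared.values())


def penalty(W, groups, gamma1, gamma2):
    """Assumed combined penalty: gamma1 * gene-level term + gamma2 * SNP-level term."""
    return gamma1 * group_l21_norm(W, groups) + gamma2 * l21_norm(W)


# Toy example: 3 SNPs, 2 phenotypes; SNPs 0 and 1 share a gene, SNP 2 is alone.
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
groups = [0, 0, 1]
print(penalty(W, groups, gamma1=1.0, gamma2=2.0))
```

Because each term is a sum of unsquared Euclidean norms, a penalty of this form can shrink an entire gene's block of coefficients, or an individual SNP's row, exactly to zero, which is the two-level sparsity the abstract refers to.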