Genome-Wide Association Studies (GWAS) have narrowed the genome down to regions underlying complex phenotypes. However, any one region still harbours thousands of correlated genetic variants, which complicates biological follow-up. We therefore need variable selection to refine the large set of variants merely associated with a phenotype down to a much smaller set of putative causal variants with a direct effect on the phenotype. This is, however, a hard combinatorial problem that requires advanced statistical methods to explore the high-dimensional model space efficiently.
We present the FINEMAP software, which couples Bayesian variable selection for fine-mapping causal variants with an ultrafast high-resolution stochastic search. Using extensive simulations, we show that FINEMAP is as accurate as exhaustive search when the latter can be completed, and achieves even higher accuracy when the latter must be constrained for computational reasons. We further demonstrate that FINEMAP makes previously infeasible analyses possible by fine-mapping the HDL-C association of the LIPC locus with 20,000 variants in less than 90 seconds, whereas exhaustive search would require thousands of years.
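To give a sense of why exhaustive search becomes infeasible, the following minimal sketch counts the candidate causal configurations in a region of 20,000 variants. It assumes that a configuration is simply a non-empty subset of variants and that the number of causal variants is capped; the cap of five and the function name `count_causal_configurations` are illustrative assumptions, not settings or code taken from FINEMAP.

```python
from math import comb


def count_causal_configurations(n_variants: int, max_causal: int) -> int:
    """Count non-empty subsets of variants containing at most max_causal variants.

    Each subset is one candidate causal configuration that an exhaustive
    search would have to evaluate. The cap max_causal is an illustrative
    assumption, not a FINEMAP setting.
    """
    return sum(comb(n_variants, k) for k in range(1, max_causal + 1))


if __name__ == "__main__":
    n_variants = 20_000  # region size quoted for the LIPC locus
    for max_causal in range(1, 6):
        total = count_causal_configurations(n_variants, max_causal)
        print(f"at most {max_causal} causal variants: {total:.3e} configurations")
```

The count grows roughly as the number of variants raised to the power of the cap, which is why exhaustive search must be constrained to very few causal variants, whereas a stochastic search only visits the configurations that carry appreciable posterior probability.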
GWAS sample sizes, soon to be counted in the millions, provide unprecedented opportunities for fast and accurate fine-mapping. It would further be useful to routinely evaluate how much of the phenotypic variation can be explained by the fine-mapped variants. We therefore compare regional heritability estimation using FINEMAP with both the variance component model BOLT and the fixed-effect model HESS in 110 regions across 51 biomarkers in 5,265 Finns. Our results show good concordance among all methods in regions with a negligible contribution to the genome-wide heritability, whereas BOLT and HESS yielded larger and smaller estimates than FINEMAP, respectively, in regions with moderate to high heritability. Scaling the analysis of lipid traits from 5,265 to 21,320 Finns shows good agreement between FINEMAP and BOLT also at moderate to high levels of regional heritability, whereas HESS estimates are consistently lower at these levels. Through comprehensive simulations of biobank-scale projects, we illustrate how violations of model assumptions about polygenicity or an unspecified genetic architecture induce inaccuracy in the existing heritability estimates, and how this inaccuracy becomes more pronounced as the statistical power to identify causal variants increases.