Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level
- 2015-07-27 (Mon.), 10:30 AM
- Recreation Hall, 2F, Institute of Statistical Science
- Prof. Jung-Ying Tzeng
- Dept. of Statistics, North Carolina State University
Abstract
Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging because of the very large number of candidate variants with extremely low minor allele frequencies. Recent methods often pool multiple variants to provide association analysis at the gene level rather than the locus level. Nonetheless, pinpointing individual variants is a critical goal of genomic research, as such information can help delineate the molecular mechanisms and functions of genetic factors in disease. Because of the extreme rarity of mutations and the high dimensionality of the data, the significance of causal variants does not easily stand out from that of noncausal ones. Consequently, standard false positive control procedures, such as the Bonferroni correction and false discovery rate (FDR) control, are often underpowered, since most causal variants can only be identified together with a few noncausal ones. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the FAlse Negative control Screening (FANS) procedure in this paper. The proposed procedure is computationally efficient and adapts to the underlying proportion of causal variants. Extensive simulation studies across a wide range of scenarios demonstrate that FANS is advantageous for identifying individual rare variants, whereas the Bonferroni and FDR procedures are overly conservative for rare-variant association studies. In an analysis of the CoLaus dataset, FANS identified the individual variants most responsible for gene-level significance. Moreover, single-variant results from FANS, combined with annotation information, were successfully used to infer related genes.
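To make the comparison concrete, the two standard procedures contrasted with FANS can be sketched in a few lines of Python. This is a minimal illustration of the textbook Bonferroni correction and the Benjamini-Hochberg (BH) step-up FDR rule applied to hypothetical p-values. It is not an implementation of FANS, whose details are not given in this abstract, and the function names and example data are assumptions made for illustration.

```python
def bonferroni(pvals, alpha=0.05):
    """Return indices of p-values at or below the Bonferroni cutoff alpha / m."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / m]


def benjamini_hochberg(pvals, alpha=0.05):
    """Return indices rejected by the Benjamini-Hochberg step-up FDR procedure."""
    m = len(pvals)
    # Sort variant indices by p-value, smallest first.
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value is at or below its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    # Reject every hypothesis ranked at or below k.
    return sorted(order[:k])


# Hypothetical p-values for 10 candidate variants (illustrative only).
pvals = [0.001, 0.004, 0.012, 0.018, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9]
print("Bonferroni:", bonferroni(pvals))
print("BH (FDR):  ", benjamini_hochberg(pvals))
```

Both cutoffs shrink as the number of tests m grows. With many candidate variants, causal variants whose p-values do not separate cleanly from those of noncausal variants therefore tend to be missed, which is the motivation for controlling false negatives instead.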