Postdoctoral Seminar Announcement

Bayesian Nonparametric Clustering and Association Studies for Large-scale SNP Observations

Abstract

When analyzing an enormous amount of SNP data without prior biological information to define SNP sets, clustering is often the first step of the analysis, and a genetic association analysis is then performed based on the clustering results. However, the impact of the clustering uncertainty on the subsequent association analysis is rarely assessed. In this talk, I will introduce our proposed Bayesian nonparametric clustering method, which uses a Dirichlet process mixture model. The method has the advantage that the number of clusters does not need to be known or fixed before clustering begins, and the uncertainty in the number of clusters is also accounted for. Additionally, with an individualized genetic score designed for each SNP cluster and every subject, the subsequent regression model for association analysis can incorporate the information from large-scale SNP data with a much smaller number of explanatory variables. Inference on cluster allocation, the strength of association of each SNP cluster, and the susceptibility of each SNP is based on posterior samples from Markov chain Monte Carlo methods and the Binder loss. We illustrate this Bayesian nonparametric strategy in a genome-wide association study of Crohn’s disease from the WTCCC database.
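
For readers unfamiliar with how the Binder loss turns MCMC output into a single cluster allocation, the following is a minimal illustrative sketch, not the speaker's implementation. It assumes equal costs for the two kinds of pairwise misclassification and picks, among the sampled partitions, the one with the smallest posterior expected loss. The function names and the toy allocation samples are hypothetical and stand in for real posterior draws of SNP cluster labels.

```python
import numpy as np


def co_clustering(labels):
    """n x n 0/1 matrix: entry (i, j) is 1 when items i and j share a label."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def binder_estimate(samples):
    """Pick the sampled partition minimizing the expected Binder loss.

    Assumes equal misclassification costs, so the expected loss of a
    candidate partition is proportional to the sum over pairs i < j of
    |1[c_i = c_j] - P_ij|, where P is the posterior similarity matrix.
    """
    samples = np.asarray(samples)
    # Posterior similarity matrix: fraction of draws in which i and j co-cluster.
    psm = np.mean([co_clustering(s) for s in samples], axis=0)
    upper = np.triu_indices(samples.shape[1], k=1)  # pairs with i < j
    losses = [np.abs(co_clustering(s)[upper] - psm[upper]).sum() for s in samples]
    best = int(np.argmin(losses))
    return samples[best], psm


# Hypothetical posterior draws of cluster labels for four SNPs.
toy_samples = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as above with the labels switched
    [0, 0, 0, 1],
])
estimate, psm = binder_estimate(toy_samples)
print(estimate)
print(psm)
```

Because the loss depends only on whether pairs of SNPs share a cluster, it is unaffected by label switching across MCMC iterations and does not require the number of clusters to be the same in every draw.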
