With the rapid growth of omics data, modeling for information integration poses new statistical challenges. In this talk, I will cover three recent studies on different aspects of omics data integration. In the first study, we developed a Meta Sparse Kmeans approach that combines multiple transcriptomic studies for disease subtype discovery. In the second study, we combined multi-level omics data using an overlapping group lasso technique for disease subtyping.
Finally, to incorporate prior biological knowledge of a multi-layer overlapping group structure (e.g. multi-omics features => genes => pathways) into biomarker detection, we propose a Bayesian hierarchical indicator model that accommodates this structure in variable selection. We discuss properties of the proposed prior and prove selection consistency and asymptotic normality of the posterior median estimator. We apply the model to two simulations and one TCGA breast cancer example, where it outperforms existing methods in prediction accuracy, variable selection, and model interpretation.