Computational protein function prediction
- 2020-06-01 (Mon.), 10:30 AM
- Conference Hall 1004, Research Center for Environmental Changes Building
- Prof. Jia-Ming Chang
- Department of Computer Science, National Chengchi University
Abstract
Biological data has grown explosively with the advance of next-generation sequencing. However, annotating protein function with wet-lab experiments is time-consuming. Computational function prediction can help wet labs formulate biological hypotheses and prioritize experiments. We have developed GODoc, a novel and effective strategy that incorporates a training procedure into the k-nearest neighbor algorithm (instance-based learning). It can solve the Gene Ontology (GO) multi-label prediction problem, which is especially notable given the thousands of GO terms. In the CAFA3 competition (68 teams), GODoc ranked 10th in the Cellular Component Ontology. In the term-centric task, GODoc ranked third for biofilm formation of Pseudomonas aeruginosa and tied for first for long-term memory of Drosophila melanogaster. Besides GO prediction, we present PSLCNN, a model that uses deep neural networks to predict protein subcellular localization for eukaryotes and prokaryotes. Compared with state-of-the-art tools, PSLCNN achieves the best performance for prokaryotes and comparable performance for eukaryotes.
References
GODoc
- 2019, Genome Biology. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens (team: NCCUCS)
- 2019, BMC Bioinformatics. GODoc: A High-Throughput Protein Function Prediction using the Novel k-nearest-neighbor and Voting algorithms
PSLCNN
- 2019, TAAI. PSLCNN: Protein Subcellular Localization Prediction for Eukaryotes and Prokaryotes Using Deep Learning
- 2013, PLoS ONE. Efficient and interpretable prediction of protein functional classes by Correspondence Analysis and Compact Set Relations
- 2008, Proteins. PSLDoc: Protein subcellular localization prediction based on gapped dipeptides and probabilistic latent semantic analysis