Institute of Statistical Science, Academia Sinica

Seminars

Predicting the Positions of Proteins in the Cell through Document Classification Techniques

2009-06-15 (Mon.), 10:30 AM
Auditorium, 2F, Tsai Yuan-Pei Memorial Hall
Prof. Wen-Lian Hsu
Institute of Information Science, Academia Sinica

Abstract

Prediction of protein subcellular localization (PSL) is important for genome annotation, protein function prediction, and drug discovery. Many computational approaches to PSL prediction from protein sequences have been proposed in recent years, including expert systems, k-nearest neighbors, artificial neural networks, support vector machines, and Bayesian networks. In this talk we describe PSLDoc, a method for PSL prediction based on gapped-dipeptides and probabilistic latent semantic analysis (PLSA). A protein is treated as a term string composed of gapped-dipeptides, which are defined as any two residues separated by one or more positions. Each gapped-dipeptide is weighted according to a position-specific scoring matrix, which carries sequence evolutionary information. PLSA is then applied for feature reduction, and the reduced vectors are input to five one-versus-rest support vector machine classifiers. Our approach compares favorably with all other approaches and demonstrates that this feature representation of proteins can be successfully applied to PSL prediction and improves prediction accuracy.
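
To make the term representation concrete, here is a minimal Python sketch of how a sequence might be turned into gapped-dipeptide terms. It is an illustration under stated assumptions, not the PSLDoc implementation: the function name gapped_dipeptides, the max_gap parameter, and the "A1B" term notation are hypothetical, "separated by one or more positions" is read as one or more intervening residues, and plain occurrence counts stand in for the weights that the talk derives from a position-specific scoring matrix.

```python
from collections import Counter


def gapped_dipeptides(sequence, max_gap=3):
    """Count gapped-dipeptide terms in a protein sequence.

    A term "A{gap}B" means residue A followed by residue B with `gap`
    intervening positions. The notation and max_gap are assumptions made
    for illustration only.
    """
    counts = Counter()
    for gap in range(1, max_gap + 1):
        step = gap + 1  # index distance between the two residues
        for i in range(len(sequence) - step):
            term = f"{sequence[i]}{gap}{sequence[i + step]}"
            counts[term] += 1
    return counts


# Toy sequence, not real data: each protein becomes a bag of terms.
terms = gapped_dipeptides("MKVLA", max_gap=2)
for term, count in sorted(terms.items()):
    print(term, count)
```

In the pipeline described above, the resulting term vectors would then be weighted, reduced with PLSA, and passed to the classifiers; those steps are not sketched here.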

Updated: 2025-07-02 23:16
