
Seminars

Prediction of Solvent Accessibility using Two-Stage Multi-Class SVMs and SVRs

  • 2005-11-07 (Mon.), 10:30 AM
  • Recreation Hall, 2F, Institute of Statistical Science
  • Professor Jagath C. Rajapakse
  • School of Computer Engineering, Nanyang Technological University, Singapore

Abstract

Information on the relative solvent accessibility (RSA) of amino acid residues in proteins provides valuable clues for predicting protein structure and function. We propose a two-stage approach based on support vector machines (SVMs), in which a second SVM predictor is applied to the output of a single-stage SVM so that the contextual relationships among solvent accessibilities are taken into account in the prediction. Using position-specific scoring matrices (PSSMs) generated by PSI-BLAST, the two-stage SVM approach achieves accuracies of up to 90.4% and 90.2% on the Manesh dataset of 215 protein structures and the RS126 dataset of 126 nonhomologous globular proteins, respectively.

We also address the problem of predicting the solvent accessible surface area (ASA) of amino acid residues in protein sequences without classifying them into buried and exposed types. A two-stage support vector regression (SVR) approach is proposed to predict real values of ASA from the PSSMs generated from PSI-BLAST profiles. By adding an SVR as the second stage to capture the influence of neighboring residues' ASA values on that of a given residue, the two-stage SVR approach improves mean absolute errors by up to 3.3% and achieves correlation coefficients of 0.66, 0.68, and 0.67 on the Manesh dataset of 215 proteins, the Barton dataset of 502 nonhomologous proteins, and the Carugo dataset of 338 proteins, respectively. These results are better than previously published scores.
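As a rough illustration of the two-stage idea in the abstract, the sketch below shows how the data might flow. The first stage sees a sliding window of PSSM rows around each residue. The second stage sees a sliding window of first-stage outputs, so each final prediction can depend on the predicted accessibilities of neighboring residues. The function names, the zero padding at the sequence ends, the window half-widths, and the choice to feed the second stage only first-stage outputs are assumptions made for illustration; they are not taken from the talk. The two predictors are passed in as plain callables standing in for trained SVRs.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def window_features(rows: Sequence[Sequence[float]], half_width: int) -> List[Vector]:
    """Build one feature vector per position by concatenating the rows in
    [i - half_width, i + half_width]; positions outside the sequence are
    zero-padded (an assumed convention)."""
    if not rows:
        return []
    pad = [0.0] * len(rows[0])
    features: List[Vector] = []
    for i in range(len(rows)):
        vec: Vector = []
        for j in range(i - half_width, i + half_width + 1):
            vec.extend(rows[j] if 0 <= j < len(rows) else pad)
        features.append(vec)
    return features


def two_stage_predict(
    pssm: Sequence[Sequence[float]],
    stage1: Callable[[Vector], float],
    stage2: Callable[[Vector], float],
    half_width1: int,
    half_width2: int,
) -> List[float]:
    """Hypothetical two-stage pipeline: stage1 maps windowed PSSM rows to a
    per-residue estimate; stage2 refines it from a window of those estimates."""
    first = [stage1(x) for x in window_features(pssm, half_width1)]
    # Wrap each scalar estimate as a one-column row so it can be windowed too.
    second_inputs = window_features([[v] for v in first], half_width2)
    return [stage2(x) for x in second_inputs]


if __name__ == "__main__":
    # Toy 3-residue "PSSM" with 2 columns; real PSSMs have 20 columns per residue.
    toy_pssm = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # sum() is only a stand-in for a trained regressor's prediction function.
    print(two_stage_predict(toy_pssm, sum, sum, half_width1=1, half_width2=1))
```

In a real setting, `stage1` and `stage2` would wrap regressors trained on labeled ASA values, with the second stage trained on the first stage's outputs for the training proteins.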
