
Seminars

Statistical Analysis of Pairwise Local Sequence Alignments

  • 2002-09-09 (Mon.), 10:30 AM
  • Recreation Hall, 2F, Institute of Statistical Science
  • Professor David O. Siegmund
  • Department of Statistics, Stanford University, USA

Abstract

An important step in learning the function of a new gene (DNA sequence) or protein (amino acid sequence) is to compare the new sequence with existing sequences in a database search. Evolutionary theory holds that genes or proteins with similar functions are likely to have evolved from a common ancestor through mutation. Hence one hopes that, by finding sequences in the database that are similar to the new sequence, one can make an educated guess about its function.

There are three major issues in sequence comparison for a database search: (i) the choice of a scoring method to measure sequence similarity; (ii) the algorithmic determination of sequence similarity; and (iii) the statistical significance of sequences showing a particular level of similarity. In this talk I will discuss the history and recent developments of (iii).

Assume that two sequences from a finite alphabet are (locally) optimally aligned according to a scoring system that rewards similarities and penalizes gaps (insertions and deletions). Assume also that the letters in each sequence are independent and identically distributed and that the two sequences are independent of each other. For ungapped alignments, Dembo, Karlin, and Zeitouni (1994) obtained approximate p-values for the optimal local alignment score. Current practice is to assume that an approximation of the same parametric form remains valid when gaps are allowed, and to estimate its parameters numerically from real or simulated data. After reviewing current practice, I will discuss (a) recent research on analytic approximations and (b) an importance sampling Monte Carlo method due to Hock Peng Chan, and conclude with numerical comparisons of the different methods.
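The setting in the abstract can be illustrated with a small Python sketch. It is not the speaker's method and makes several simplifying assumptions: a hypothetical +1 match / -1 mismatch scoring scheme, a linear gap penalty (practical systems usually use affine gaps), and i.i.d. letters. It computes the optimal local alignment score by Smith-Waterman dynamic programming, the parameter lambda of the ungapped case as the positive root of sum_{a,b} p_a p_b exp(lambda * s(a, b)) = 1, and the familiar extreme-value form P(S >= s) ~ 1 - exp(-K m n exp(-lambda s)). The constant K is treated as a placeholder input here, not computed. The Monte Carlo function is plain (naive) simulation, not Chan's importance sampling method. All function names are hypothetical.

```python
import math
import random


def dna_score(a, b):
    """Illustrative (assumed) scoring: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


def lambda_ungapped(score, probs, tol=1e-12):
    """Positive root lam of sum_{a,b} p_a p_b exp(lam * s(a, b)) = 1.

    Assumes a negative expected score and at least one positive score,
    which guarantees a unique positive root.
    """
    letters = list(probs)

    def f(lam):
        return sum(probs[a] * probs[b] * math.exp(lam * score(a, b))
                   for a in letters for b in letters) - 1.0

    lo, hi = 1e-9, 1.0          # f(lo) < 0 since the mean score is negative
    while f(hi) <= 0:           # expand the bracket until f changes sign
        hi *= 2.0
    while hi - lo > tol:        # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def local_score(x, y, score, gap):
    """Smith-Waterman optimal local alignment score with linear gap penalty gap > 0."""
    prev = [0] * (len(y) + 1)
    best = 0
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            cur[j] = max(0,                                    # start a new local alignment
                         prev[j - 1] + score(x[i - 1], y[j - 1]),  # align x[i-1] with y[j-1]
                         prev[j] - gap,                        # gap in y
                         cur[j - 1] - gap)                     # gap in x
            best = max(best, cur[j])
        prev = cur
    return best


def pvalue_approx(s, m, n, lam, K):
    """Extreme-value approximation P(S >= s) ~ 1 - exp(-K m n exp(-lam s))."""
    return -math.expm1(-K * m * n * math.exp(-lam * s))


def pvalue_monte_carlo(s, m, n, probs, score, gap, reps=1000, seed=0):
    """Naive Monte Carlo estimate of P(S >= s) for i.i.d. random sequences."""
    rng = random.Random(seed)
    letters = list(probs)
    weights = [probs[a] for a in letters]
    hits = 0
    for _ in range(reps):
        x = rng.choices(letters, weights, k=m)
        y = rng.choices(letters, weights, k=n)
        if local_score(x, y, score, gap) >= s:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    probs = {c: 0.25 for c in "ACGT"}
    lam = lambda_ungapped(dna_score, probs)  # equals ln 3 for this scoring scheme
    print(lam, math.log(3))
    print(local_score("AAAACAAAA", "AAAAAAAA", dna_score, gap=1))  # 7, using one gap
    # K = 0.1 is a placeholder, not a fitted constant.
    print(pvalue_approx(10, 50, 50, lam, K=0.1))
    # A huge gap penalty makes the alignment effectively ungapped.
    print(pvalue_monte_carlo(10, 50, 50, probs, dna_score, gap=10**6, reps=200))
```

A plain Monte Carlo estimate needs on the order of 1/p replicates to see any exceedances when the p-value p is small, which is the practical motivation for importance sampling schemes such as the one discussed in the talk.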
