Statistical Physics Approach to Information Categorization of Symbolic Sequences
- 2004-07-09 (Fri.), 10:30 AM
- Recreation Hall, 2F, Institute of Statistical Science
- Professor Chung-Kang Peng
- Harvard Medical School, USA
Abstract
We propose a systematic approach to categorizing the information carried by symbolic sequences, based on their usage of repetitive patterns. We introduce a simple formula to quantify the "dis-similarity" between two symbolic sequences. This dis-similarity index is closely related to the Shannon entropy and the rank order of the repetitive patterns in each sequence. Its physical meaning can be understood by applying fundamental concepts of statistical physics to dynamical systems. Finally, to illustrate that this generic approach applies to a wide range of real-world problems, we apply our algorithm to literary texts, DNA sequences, and biological time series.
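
The abstract does not state the formula itself, so the following is only a minimal sketch of one way a rank-order, entropy-weighted dis-similarity index could be computed. Everything in it is an assumption made for illustration: the function names rank_and_prob and dissimilarity, the use of fixed-length substrings of length m as the "repetitive patterns", the restriction to patterns shared by both sequences, and the normalization. It should not be read as the speaker's actual definition.

```python
from collections import Counter
from math import log


def rank_and_prob(seq, m):
    """Map each length-m pattern in seq to its frequency rank and probability.

    Rank 1 is the most frequent pattern; ties are broken alphabetically
    (an arbitrary choice made for this sketch).
    """
    counts = Counter(seq[i:i + m] for i in range(len(seq) - m + 1))
    total = sum(counts.values())
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    rank = {w: r for r, w in enumerate(ordered, start=1)}
    prob = {w: counts[w] / total for w in counts}
    return rank, prob


def dissimilarity(seq_a, seq_b, m=2):
    """Hypothetical index: entropy-weighted mean rank shift of shared patterns."""
    rank_a, prob_a = rank_and_prob(seq_a, m)
    rank_b, prob_b = rank_and_prob(seq_b, m)
    shared = sorted(set(rank_a) & set(rank_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared patterns")
    # Each shared pattern is weighted by its Shannon entropy contribution
    # in the two sequences (assumed weighting, not taken from the abstract).
    weight = {
        w: -prob_a[w] * log(prob_a[w]) - prob_b[w] * log(prob_b[w])
        for w in shared
    }
    z = sum(weight.values())
    shift = sum(abs(rank_a[w] - rank_b[w]) * weight[w] for w in shared)
    # Divide by (number of shared patterns - 1) to normalize the rank shift.
    return shift / (z * (len(shared) - 1))


if __name__ == "__main__":
    print(dissimilarity("abababab", "abababab"))
    print(dissimilarity("aabbaabbab", "abababbbba"))
```

Under these assumptions, two sequences that use their shared patterns in the same rank order score zero, and the score grows as frequently used patterns change rank between the two sequences.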