Seminar Announcement

Entropy Based Statistical Inference for Some HDLSS Genomic Models: UI Tests in a Chen-Stein Perspective

  • 2010-05-24 (Mon.), 10:30 AM
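  • Academia Sinica, Tsai Yuan-Pei Memorial Hall, 2F, Lecture Hall 208
  • Tea reception: 10:10 AM, Institute of Statistical Science, Tsai Yuan-Pei Memorial Hall, 2F
  • Prof. Ming-Tien Tsai (蔡明田)
  • Research Fellow of the Institute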

Abstract

One of the scientific foci is to classify K genes into two subsets: disease genes and non-disease genes. For HDLSS (high-dimensional, low-sample-size) categorical data models, the number of associated parameters increases exponentially with K, creating an impasse for adapting conventional discrete multivariate analysis or model selection tools. In this rather awkward setting, statistical appraisals are often based on marginal p-values, where the multiple hypothesis testing (MHT) problem can be handled with Fisher’s original method (developed nearly 80 years ago) along with its various ramifications from the past 25 years or so. On the other hand, just as maximum likelihood is the dominant paradigm in statistics, the Shannon entropy (1948) is the dominant paradigm in information and coding theory. For qualitative data models, the Gini-Simpson index (Gini, 1912; Simpson, 1949) and the Shannon entropy are commonly used in dissimilarity and diversity analysis, economic inequality and poverty analysis, and genetic variation studies, as well as in many other fields. By the Lorenz curve, we can show that the Shannon entropy appears to be more informative than the Gini-Simpson index. However, for HDLSS genomic models, we suspect that the information might not be fully captured in a pseudo-marginal setup (namely, the so-called multivariate version of Shannon entropy in the literature). To capture more information, some new genuine multivariate analogues of the Shannon entropy are proposed. The nested subset monotonicity prospect along with the subgroup decomposability of the proposed new measures is also exploited. Based on the proposed new Hamming-Shannon pooled measures, we incorporate the union-intersection principle of Roy (1953) and the Chen-Stein theorem (Chen, 1975) to formulate suitable statistical procedures for gene classification. The SARS-CoV data set is appraised as an illustration. This is joint work with Prof. P.K. Sen.
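For readers unfamiliar with the classical quantities named in the abstract, the following is a minimal sketch of their textbook definitions: the Shannon entropy and the Gini-Simpson index of a single categorical distribution, and Fisher’s method for combining independent marginal p-values. It does not implement the new Hamming-Shannon pooled measures or the union-intersection procedure of the talk, which are not specified here. The function names are illustrative choices, not taken from the talk.

```python
import math


def shannon_entropy(p):
    """Shannon entropy -sum(p_i * log p_i), natural log; zero cells contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def gini_simpson(p):
    """Gini-Simpson index 1 - sum(p_i^2)."""
    return 1.0 - sum(pi * pi for pi in p)


def fisher_combined_pvalue(pvalues):
    """Fisher's method: T = -2 * sum(log p_i) is chi-square with 2k degrees
    of freedom under the null, assuming k independent p-values in (0, 1].

    For even degrees of freedom the chi-square survival function has the
    closed form exp(-T/2) * sum_{j=0}^{k-1} (T/2)^j / j!, used below.
    """
    k = len(pvalues)
    half_stat = -sum(math.log(pv) for pv in pvalues)  # this is T / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical allele frequencies at one site (illustrative values only).
    freqs = [0.25, 0.25, 0.25, 0.25]
    print(shannon_entropy(freqs), gini_simpson(freqs))
    # Hypothetical marginal p-values (illustrative values only).
    print(fisher_combined_pvalue([0.04, 0.20, 0.01]))
```

Both diversity measures are zero for a degenerate distribution and maximal for the uniform one; the abstract’s claim concerns which of the two is more informative in between.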
