
Seminars

Exploring Categorization Information for Efficient Topic Hierarchy Integration

  • 2002-12-09 (Mon.), 10:30 AM
  • Recreation Hall, 2F, Institute of Statistical Science
  • Prof. Jyh-Jong Tsay
  • Institute of Computer Science and Information Engineering, National Chung Cheng University

Abstract

In this paper, we study the problem of integrating documents from different sources into a comprehensive topic hierarchy. Our objective is to develop efficient techniques that improve the accuracy of traditional categorization methods by incorporating the categorization information provided by data sources into the categorization process. On the World Wide Web, categorization information is often available from the information sources themselves. For example, news from newspapers, books from publishers, items from e-commerce sites, and even web pages archived by web portals are already categorized. Moreover, many of the topic hierarchies adopted by current information sources are highly related, so we believe that this categorization information can be used to improve classification accuracy. We present several techniques that explore the relations between topic hierarchies and incorporate categorization information from source hierarchies into traditional classification methods such as Bayesian methods and support vector machines. Experiments on collections from Openfind and Yam, and from Google and Yahoo, popular web sites in Taiwan and the USA respectively, show that incorporating categorization information from source hierarchies can significantly improve classification accuracy. (This is joint work with Chi-Feng Chang.)
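As a rough illustration of the general idea, and not the speakers' actual method, the sketch below shows one simple way a source category could inform a Bayesian text classifier: the usual class prior P(target) is replaced by P(target | source category), estimated from documents whose category is known in both hierarchies. The function names, the toy data, and the Laplace smoothing are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (tokens, source_category, target_category) triples."""
    word_counts = defaultdict(Counter)   # target category -> word frequencies
    pair_counts = defaultdict(Counter)   # source category -> target frequencies
    target_counts = Counter()            # target category -> document count
    vocab = set()
    for tokens, source, target in docs:
        word_counts[target].update(tokens)
        pair_counts[source][target] += 1
        target_counts[target] += 1
        vocab.update(tokens)
    return word_counts, pair_counts, target_counts, vocab

def classify(tokens, source, model, use_source=True):
    """Return the highest-scoring target category for one document."""
    word_counts, pair_counts, target_counts, vocab = model
    targets = list(target_counts)
    best, best_score = None, -math.inf
    for t in targets:
        if use_source:
            # Assumed variant: prior conditioned on the source category.
            num = pair_counts[source][t] + 1
            den = sum(pair_counts[source].values()) + len(targets)
        else:
            # Plain naive Bayes prior that ignores the source category.
            num = target_counts[t] + 1
            den = sum(target_counts.values()) + len(targets)
        score = math.log(num / den)
        total = sum(word_counts[t].values())
        for w in tokens:
            # Laplace-smoothed multinomial word likelihood.
            score += math.log((word_counts[t][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = t, score
    return best

# Toy data: (tokens, category in the source hierarchy, category in the target hierarchy).
docs = [
    (["stock", "market"], "Finance", "Business"),
    (["stock", "price"], "Finance", "Business"),
    (["market", "trade"], "Finance", "Business"),
    (["game", "score"], "Sports", "Recreation"),
    (["game", "team"], "Sports", "Recreation"),
    (["team", "score"], "Sports", "Recreation"),
]
model = train(docs)
# The words alone are ambiguous here; the source category decides the outcome.
print(classify(["market", "game"], "Sports", model))
print(classify(["market", "game"], "Finance", model))
```

In this toy setting the two words carry equal evidence for both target categories, so the conditioned prior is what separates them. A real system would need far more data and a mapping that tolerates source categories never seen in training.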
