
Seminars

Label Space Coding for Multi-label Classification

  • 2015-09-07 (Mon.), 10:30 AM
  • Recreation Hall, 2F, Institute of Statistical Science
  • Prof. Hsuan-Tien Lin
  • Department of Computer Science and Information Engineering, National Taiwan University

Abstract

Multiclass classification is an important problem in machine learning. It can be used in a variety of applications, such as automatically organizing documents into different categories. Multi-label classification is an extension of multiclass classification: the former allows a set of labels to be associated with an instance, while the latter allows only one. For instance, a document may belong to both the "politics" and "health" classes if it is about the National Health Insurance. Many similar applications arise in domains such as text mining, vision, and bioinformatics.

In this talk, we discuss a coding view of the output (label) space of multi-label classification. The view represents each set of possible labels as a fixed-length binary string. We discuss the close connection between this binary-string representation and coding theory. In particular, we demonstrate three novel research directions based on the connection: data compression (source coding), error correction (channel coding), and learnable data compression (conditional source coding). We discuss two algorithms that systematically compress the label space for more efficient computation, and another algorithm that systematically expands the label space for better performance.

The talk comes from joint work with Farbound Tai (Neural Computation, 2012), Chun-Sung Ferng (ACML, 2011) and Yao-Nan Chen (NIPS, 2012). It is self-contained and assumes only basic background in machine learning and coding theory.
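As a small illustration of the binary-string view of the label space described in the abstract, the sketch below maps a set of labels to a fixed-length bit string, one bit per label. The label vocabulary and the function names `encode` and `decode` are assumptions made for this example; it shows only the representation, not any of the algorithms presented in the talk.

```python
# Hypothetical label vocabulary; the order fixes the bit position of each label.
LABELS = ["politics", "health", "sports", "finance"]


def encode(label_set, labels=LABELS):
    """Map a set of labels to a fixed-length binary string, one bit per label."""
    return "".join("1" if name in label_set else "0" for name in labels)


def decode(bits, labels=LABELS):
    """Recover the label set from its binary-string representation."""
    return {name for name, bit in zip(labels, bits) if bit == "1"}


# A document about the National Health Insurance carries two labels at once.
code = encode({"politics", "health"})
print(code)          # 1100
print(decode(code))  # the original label set
```

With K labels there are 2^K possible strings, which is why compressing or expanding this code space is a natural place to apply ideas from coding theory.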
