Label Space Coding for Multi-label Classification
- 2015-09-07 (Mon.), 10:30 AM
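- Lounge, 2F, Institute of Statistical Science, Academia Sinica
- Tea reception: 10:10 AM, Lounge, 2F, Institute of Statistical Science
- Prof. Hsuan-Tien Lin (林軒田)
- Department of Computer Science and Information Engineering, National Taiwan University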
Abstract
Multiclass classification is an important problem in machine learning. It can be used in a variety of applications, such as automatically organizing documents into different categories. Multi-label classification is an extension of multiclass classification: the former allows a set of labels to be associated with an instance, while the latter allows only one. For instance, a document may belong to both the "politics" and "health" classes if it is about the National Health Insurance. Many similar applications arise in domains such as text mining, vision, and bioinformatics.

In this talk, we discuss a coding view of the output (label) space of multi-label classification. The view represents each set of possible labels as a fixed-length binary string. We discuss the close connection between this binary-string representation and coding theory. In particular, we demonstrate three novel research directions based on the connection: data compression (source coding), error correction (channel coding), and learnable data compression (conditional source coding). We discuss two algorithms that systematically compress the label space for more efficient computation, and another algorithm that systematically expands the label space for better performance.

The talk draws on joint work with Farbound Tai (Neural Computation, 2012), Chun-Sung Ferng (ACML, 2011), and Yao-Nan Chen (NIPS, 2012). It is self-contained and assumes only basic background in machine learning and coding theory.
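
As a small illustration of the binary-string representation mentioned in the abstract, the sketch below maps a label set to a fixed-length binary string and back. It is not one of the algorithms from the talk: the label vocabulary `LABELS` and the helpers `encode`, `decode`, and `hamming` are hypothetical names chosen for this example, and the Hamming distance is included only because it is the usual way to compare two codewords in coding theory.

```python
# Hypothetical label vocabulary; the names are illustrative, not from the talk.
LABELS = ["politics", "health", "sports", "finance"]


def encode(label_set, labels=LABELS):
    """Map a set of labels to a fixed-length binary string (one bit per label)."""
    unknown = set(label_set) - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return "".join("1" if name in label_set else "0" for name in labels)


def decode(code, labels=LABELS):
    """Recover the label set from its binary-string representation."""
    if len(code) != len(labels):
        raise ValueError("code length must equal the number of labels")
    return {name for name, bit in zip(labels, code) if bit == "1"}


def hamming(a, b):
    """Count the positions at which two equal-length binary strings differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


# A document about the National Health Insurance carries two labels.
code = encode({"politics", "health"})
print(code)                    # 1100
print(sorted(decode(code)))    # ['health', 'politics']
print(hamming(code, "1001"))   # 2
```

Under this view, compressing the label space means working with strings shorter than the number of labels, and error correction means working with longer strings that carry redundancy.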