
Seminars

Periodic Step-size Adaptation in Second-order Gradient Descent for Single-pass On-line Learning

  • 2010-01-25 (Mon.), 10:30 AM
  • Auditorium, 2F, Tsai Yuan-Pei Memorial Hall
  • Prof. Yuh-Jye Lee
  • Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology

Abstract

It has been established that the second-order stochastic gradient descent (2SGD) method can potentially achieve generalization performance as good as the empirical optimum in a single pass through the training examples. However, 2SGD requires computing the inverse of the Hessian matrix of the loss function, which is prohibitively expensive, particularly when the learning task involves a very high-dimensional feature space. In this talk, we present a new second-order SGD method, called Periodic Step-size Adaptation (PSA). PSA approximates the Jacobian matrix of the SGD mapping function and exploits a linear relation between the Jacobian and the Hessian to approximate the Hessian periodically. We tested PSA on large-scale sequence labeling tasks using conditional random fields and on large-scale classification tasks using linear support vector machines. Experimental results show that the single-pass performance of PSA is consistently very close to the empirical optimum.
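To make the adaptation step concrete, the Python sketch below illustrates the general idea described in the abstract; it is an assumed toy implementation, not the authors' code. It runs SGD with per-coordinate step sizes, estimates the diagonal of the Jacobian of the SGD update map M(w) = w - eta * g(w) from ratios of successive weight changes, and periodically converts that estimate into a Hessian approximation through the linear relation H ~ (I - J) / eta. All names (psa_sgd, eta0, period) are hypothetical.

import numpy as np

def psa_sgd(grad, w0, n_steps, eta0=0.1, period=20):
    """Sketch of periodic step-size adaptation (assumed, illustrative only).

    grad(w, t) returns a (possibly stochastic) gradient at step t.
    """
    w = w0.copy()
    eta = np.full_like(w, eta0)      # per-coordinate step sizes
    prev_dw = None                   # previous weight change
    j_sum = np.zeros_like(w)         # running Jacobian-diagonal estimate
    n_est = 0

    for t in range(n_steps):
        g = grad(w, t)
        dw = -eta * g                # SGD step with current step sizes
        w = w + dw

        # The SGD map M(w) = w - eta * g(w) has Jacobian J = I - diag(eta) * H,
        # so ratios of successive weight changes give a cheap estimate of
        # diag(J) without ever forming the Hessian explicitly.
        if prev_dw is not None:
            safe_prev = np.where(np.abs(prev_dw) < 1e-12, 1e-12, prev_dw)
            j_sum += np.clip(dw / safe_prev, -1.0, 1.0)
            n_est += 1
        prev_dw = dw

        # Periodically convert the Jacobian estimate into a Hessian estimate
        # via H ~ (I - J) / eta, then reset the step sizes to a decaying,
        # approximately inverse-curvature schedule.
        if (t + 1) % period == 0 and n_est > 0:
            j_diag = j_sum / n_est
            h_diag = np.maximum((1.0 - j_diag) / eta, 1e-8)
            eta = 1.0 / (h_diag * (t + 1))
            j_sum[:] = 0.0
            n_est = 0

    return w

# Usage on a toy quadratic loss 0.5 * w^T diag(h) w with gradient noise:
rng = np.random.default_rng(0)
h = np.array([1.0, 10.0])
w_final = psa_sgd(lambda w, t: h * w + 0.01 * rng.standard_normal(2),
                  w0=np.array([5.0, 5.0]), n_steps=500)

The point of the sketch is that only first-order information and the weight trajectory itself are consumed; the Hessian is never inverted, which is what makes the approach feasible in very high-dimensional feature spaces.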
