
Seminar Announcement


Exploration Through Reward Biasing in Bandits

Held online via Webex Meeting

Abstract

Bandit learning is a classic machine learning framework that captures the explore-exploit dilemma underlying many sequential decision-making problems. Despite the wide variety of existing bandit algorithms, there remains an unsatisfactory trade-off between regret performance and computational efficiency. To tackle this, we present a new family of bandit algorithms, formulated in a general way from the reward-biased maximum likelihood estimation (RBMLE) principle. This talk will cover the following three aspects:
(i) Bandits and Regret: I will provide a brief overview of bandit learning problems and the notion of regret.
(ii) RBMLE: I will introduce the generic RBMLE approach from adaptive control and show how to adapt it to stochastic bandits (see the sketch after this list). Through theoretical analysis and simulations, we demonstrate that the proposed RBMLE algorithms achieve regret performance comparable to the best state-of-the-art methods while enjoying a significant computational advantage over those best-performing methods.
(iii) RBMLE for Contextual Bandits: Building on (ii), I will present how to extend RBMLE to contextual bandits, including neural contextual bandits, which leverage the representation power of neural networks in bandit problems.
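For concreteness, the following is a minimal sketch, in standard bandit notation, of the two notions referenced in (i) and (ii); the precise bias schedule $\alpha(t)$ and the closed-form indices are part of the talk and are not reproduced here. Write $\mu_a(\theta)$ for the mean reward of arm $a$ under parameter $\theta$, $a_t$ for the arm played at round $t$, and $L_t(\theta)$ for the likelihood of the rewards observed up to round $t$:

\[
\text{Regret:}\quad R(T) \;=\; T \max_a \mu_a \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{a_t}\Big]
\]

\[
\text{RBMLE:}\quad \hat{\theta}_t \;\in\; \arg\max_{\theta}\,\Big\{ \log L_t(\theta) \;+\; \alpha(t)\,\max_a \mu_a(\theta) \Big\},
\qquad a_t \;\in\; \arg\max_a \mu_a\big(\hat{\theta}_t\big).
\]

The bias term $\alpha(t)\,\max_a \mu_a(\theta)$ tilts the maximum-likelihood estimate toward parameters that promise higher optimal reward, which is what induces exploration without explicit confidence bounds or posterior sampling.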
 

Please click the link to join the online meeting

Attachment download

1110606 謝秉均教授.pdf
Last updated: 2022-05-30 07:39