
Seminar Announcement


Exploration Through Reward Biasing in Bandits

Held online via Webex Meeting

Abstract

Bandit learning is a classic machine learning framework that captures the explore-exploit dilemma underlying many sequential decision-making problems. Despite the wide variety of existing bandit algorithms, there remains an unsatisfactory trade-off between regret performance and computational efficiency. To tackle this, we present a new family of bandit algorithms, formulated in a general way from the reward-biased maximum likelihood estimation (RBMLE) principle. This talk will cover the following three aspects:
(i) Bandits and Regret: I will provide a brief overview of bandit learning problems and the notion of regret.
(ii) RBMLE: I will introduce the generic RBMLE approach from adaptive control and show how to adapt it to stochastic bandits (see the sketch after this list). Through theoretical analysis and simulations, we demonstrate that the proposed RBMLE algorithms achieve regret performance comparable to the best state-of-the-art methods while enjoying a significant computational advantage over those best-performing methods.
(iii) RBMLE for Contextual Bandits: Building on (ii), I will present how to extend RBMLE to contextual bandits, including neural contextual bandits, which leverage the representation power of neural networks in bandit problems.
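For concreteness, the following is a minimal sketch, in standard bandit notation, of the two notions referenced in (i) and (ii); the precise bias schedule $\alpha(t)$ and the closed-form indices are part of the talk and are not reproduced here. Write $\mu_a(\theta)$ for the mean reward of arm $a$ under parameter $\theta$, $a_t$ for the arm played at round $t$, and $L_t(\theta)$ for the likelihood of the rewards observed up to round $t$:

\[
\text{Regret:}\quad R(T) \;=\; T \max_a \mu_a \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{a_t}\Big]
\]

\[
\text{RBMLE:}\quad \hat{\theta}_t \;\in\; \arg\max_{\theta}\,\Big\{ \log L_t(\theta) \;+\; \alpha(t)\,\max_a \mu_a(\theta) \Big\},
\qquad a_t \;\in\; \arg\max_a \mu_a\big(\hat{\theta}_t\big).
\]

The bias term $\alpha(t)\,\max_a \mu_a(\theta)$ tilts the maximum-likelihood estimate toward parameters that promise higher optimal reward, which is what induces exploration without explicit confidence bounds or posterior sampling.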
 

Please click the link to join the online meeting

Attachment download

1110606 謝秉均教授.pdf
Last updated: 2022-05-30 07:39