:::

Seminar Announcement

:::

Multi-Armed Bandit with Covariates

Abstract

The multi-armed bandit problem is an important sequential optimization game in which the player must balance exploration against exploitation to maximize total reward. Motivated by industrial applications such as online advertising and adaptive clinical trial design, we consider a setting where the rewards of the bandit arms depend on covariates, so that accurate estimation of the corresponding mean reward functions plays an important role in the performance of the allocation rules. We establish strong consistency of allocation rules based on nonparametric estimation methods and derive their rates of convergence. Model selection and model combination results are also presented. This is joint work with Wei Qian.
