Institute of Statistical Science, Academia Sinica

Seminar Announcement

Boosting Data Analytics with Synthetic Volume Expansion

2024-05-20 (Mon.), 10:30 AM
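B1 Auditorium, Institute of Statistical Science; tea reception at 10:10 AM.
The talk will be held in person and streamed online simultaneously.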
Prof. Xiaotong Shen
School of Statistics, University of Minnesota

Abstract

Synthetic data generation heralds a paradigm shift in data science, addressing the challenges of data scarcity and privacy and enabling unprecedented performance. As synthetic data gains prominence, questions arise about the accuracy of statistical methods applied to synthetic data compared with their application to raw data alone. To address this, we introduce the Synthetic Data Generation for Analytics framework, which applies statistical methods to high-fidelity synthetic data produced by advanced generative models, such as tabular diffusion models, that are trained on raw data and enriched through knowledge transfer with insights from relevant studies. A significant finding within this framework is the generational effect: the error of a statistical method initially decreases as synthetic data are added but may subsequently increase. This phenomenon, rooted in the difficulty of replicating the raw data distribution, gives rise to the "reflection point," an optimal amount of synthetic data defined with respect to specific error metrics. Through a data example, we demonstrate the effectiveness of this framework.

This is joint work with Y. Liu and R. Shen.
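To make the idea of a "reflection point" concrete, here is a minimal sketch, not the speakers' method, of how one might scan the amount of synthetic data and pick the size with the smallest held-out error. Several assumptions are made purely for illustration: a multivariate Gaussian fitted to the raw rows stands in for a tabular diffusion model (no knowledge transfer is modeled), ordinary least squares is the statistical method, validation mean squared error on held-out raw data is the error metric, and the helper names (`fit_gaussian_generator`, `reflection_point`, etc.) and the grid of synthetic sizes are hypothetical.

```python
import numpy as np


def fit_gaussian_generator(train):
    """Hypothetical stand-in for a generative model (e.g. a tabular
    diffusion model): fit a multivariate Gaussian to the raw rows."""
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)

    def sample(m, rng):
        # Draw m synthetic rows from the fitted Gaussian.
        return rng.multivariate_normal(mean, cov, size=m)

    return sample


def ols_fit(data):
    """Least-squares fit of the last column on the other columns, with intercept."""
    X = np.column_stack([np.ones(len(data)), data[:, :-1]])
    coef, *_ = np.linalg.lstsq(X, data[:, -1], rcond=None)
    return coef


def validation_mse(coef, valid):
    """Mean squared prediction error on held-out raw data (assumed metric)."""
    X = np.column_stack([np.ones(len(valid)), valid[:, :-1]])
    return float(np.mean((X @ coef - valid[:, -1]) ** 2))


def reflection_point(train, valid, sizes, rng):
    """Scan a grid of synthetic sample sizes; return the size with the
    smallest validation error together with the full error curve."""
    sample = fit_gaussian_generator(train)
    errors = []
    for m in sizes:
        # Augment the raw training rows with m synthetic rows (none if m == 0).
        augmented = train if m == 0 else np.vstack([train, sample(m, rng)])
        errors.append(validation_mse(ols_fit(augmented), valid))
    best = int(np.argmin(errors))
    return sizes[best], errors


# Usage on simulated data (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(300, 2))
y = x @ np.array([1.5, -2.0]) + 0.5 * rng.normal(size=300)
raw = np.column_stack([x, y])
train, valid = raw[:60], raw[60:]
sizes = [0, 50, 100, 200, 400, 800]
best_m, curve = reflection_point(train, valid, sizes, rng)
print(best_m, [round(e, 3) for e in curve])
```

With such a simple generator the error curve may not show the decrease-then-increase pattern described in the abstract; the sketch only illustrates the selection step, i.e. treating the amount of synthetic data as a tuning parameter chosen by a held-out error metric.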

Last updated: 2024-05-17 11:32