Probing and Enhancing Generalization in Deep Neural Networks
- 2026-03-30 (Mon.), 10:30 AM
- Auditorium B1, Institute of Statistical Science; tea reception: 10:10 AM.
- Held both in person and via simultaneous online streaming.
- Prof. Keng-Te Liao (廖耿德), Assistant Professor
- Institute of Statistics, National Tsing Hua University
Abstract
In this talk, I will share our recent progress in deep learning, with a focus on the generalization capabilities of neural networks (NNs). The talk is organized into three parts. First, I will discuss invariant learning, where generalization is connected to causality to achieve predictions that are more robust against distribution shifts. Unlike most prior work, which focuses on model architecture design, we extend our attention to auxiliary label generation, leading to more consistent and superior learning results. In the second part, we move from a single modality to multiple modalities (e.g., image, text, audio). Here, identifying the information shared across modalities (i.e., the generalized knowledge) is the key to successful model training. We argue that modalities are inherently imbalanced due to their distinct complexities. To address this, we propose a deep Bayesian approach to infer weights that balance the modalities. I will present intriguing results regarding the insights gained from NNs trained with our approach. In the final part, I will discuss our recent studies on Large Language Models (LLMs), specifically exploring how and when an LLM demonstrates its generalization capabilities when answering a question. I will share our findings on a small set of neurons that drive an LLM to rely on either memorization or reasoning mechanisms. The existence of these neurons not only helps us understand LLM behavior but also points to a potential new metric for monitoring an LLM's trustworthiness.
To join the online stream, please click the link.
