
Seminars

Probing and Enhancing Generalization in Deep Neural Networks

  • 2026-03-30 (Mon.), 10:30 AM
  • Auditorium, B1F, Institute of Statistical Science. The tea reception will be held at 10:10 AM.
  • Online live streaming through Microsoft Teams will be available.
  • Prof. Keng-Te Liao
  • Institute of Statistics and Data Science, National Tsing Hua University

Abstract

In this talk, I will share our recent progress in deep learning, with a focus on the generalization capabilities of neural networks (NNs). The talk is organized into three parts. First, I will discuss invariant learning, where generalization is connected to causality to achieve predictions that are more robust to distribution shifts. Unlike most prior work, which focuses on model architecture design, we extend our attention to auxiliary label generation, leading to more consistent and superior learning results. In the second part, we move from single to multiple modalities (e.g., image, text, audio). Here, identifying the information shared across modalities (i.e., the generalized knowledge) is the key to successful model training. We argue that modalities are inherently imbalanced due to their distinct complexities; to address this, we propose a deep Bayesian approach that infers weights to balance the modalities. I will present intriguing insights gained from NNs trained with our approach. In the final part, I will discuss our recent studies on Large Language Models (LLMs), specifically exploring how and when an LLM exercises its generalization capabilities to answer a question. I will share our findings on a small set of neurons that drive an LLM to rely on either memorization or reasoning mechanisms. These neurons not only help us understand LLM behavior but also point to a potential new metric for monitoring an LLM's trustworthiness.

Please click here to participate in the talk online.

Updated: 2026-02-24 14:39