Mar
02

Upcoming seminar: “Exploiting Semantic Structures in Video: From Representation Learning to Generative Composition.”

Date: Monday, 29 December 2025
Time: 10:00 – 11:00
Speaker: Du Tran, Research Scientist at Google
Venue: VinUni-CAIR Office, Level 20A, Vincom Center Dong Khoi, 72 Le Thanh Ton and 45A Ly Tu Trong, Saigon Ward, District 1, Ho Chi Minh City.

Join us online: https://url-shortener.me/5AMY

Abstract:

In this talk, I will present our recent advancements in long-video reasoning and generative synthesis. I will begin by introducing SEAL (SEmantic Attention Learning), a novel framework designed for efficient long-video understanding. SEAL addresses the challenges of high computational complexity and temporal redundancy by decomposing videos into high-level semantic entities and utilizing a subset selection optimization to balance token relevance with diversity. SEAL significantly outperforms state-of-the-art (SoTA) models on key long-video benchmarks, including LVBench, MovieChat-1K, and Ego4D, across tasks such as VideoQA and temporal grounding.

Next, I will discuss StM (Split-then-Merge), a framework that enhances generative control and mitigates data scarcity in video synthesis. By splitting unlabeled videos into distinct foreground and background layers and learning to self-compose them, StM enables high-fidelity video generation without the need for extensive labeled datasets. Our results demonstrate that StM exceeds SoTA performance in both quantitative benchmarks and qualitative evaluations conducted by humans and Vision-Language Models (VLMs).

Speaker Bio:

Prior to joining Google, Du Tran served as a Research Lead at Samsung Research America, a Research Scientist at Meta FAIR, and a researcher at NTU. Du earned his Ph.D. in Computer Science from Dartmouth College, an MSc in Computer Science from the University of Illinois at Urbana-Champaign, and a BSc in Information Technology from Ho Chi Minh City University of Science. His research spans computer vision, machine learning, and computer graphics, with a specific focus on video understanding, representation learning, and vision for robotics.


Follow our fanpage to keep up with the latest breakthroughs in AI research, our global academic collaborations, and exclusive opportunities.

Contact: VinUniversity, Center for AI Research
Contact point for this event: (+84) 986 554 370 (Mr. Luan).