Upcoming seminar: “Exploiting Semantic Structures in Video: From Representation Learning to Generative Composition.”
Date: Monday, 29 December 2025
Time: 10:00 – 11:00
Speaker: Du Tran, Research Scientist at Google
Venue: VinUni-CAIR Office, Level 20A, Vincom Center Dong Khoi, 72 Le Thanh Ton and 45A Ly Tu Trong, Saigon Ward, District 1, Ho Chi Minh City.
Join us online: https://url-shortener.me/5AMY

Abstract:
In this talk, I will present our recent advancements in long-video reasoning and generative synthesis. I will begin by introducing SEAL (SEmantic Attention Learning), a novel framework designed for efficient long-video understanding. SEAL addresses the challenges of high computational complexity and temporal redundancy by decomposing videos into high-level semantic entities and utilizing a subset selection optimization to balance token relevance with diversity. SEAL significantly outperforms state-of-the-art (SoTA) models on key long-video benchmarks, including LVBench, MovieChat-1K, and Ego4D, across tasks such as VideoQA and temporal grounding.
Next, I will discuss StM (Split-then-Merge), a framework that enhances generative control and mitigates data scarcity in video synthesis. By splitting unlabeled videos into distinct foreground and background layers and learning to self-compose them, StM enables high-fidelity video generation without the need for extensive labeled datasets. Our results demonstrate that StM exceeds SoTA performance in both quantitative benchmarks and qualitative evaluations conducted by humans and Vision-Language Models (VLMs).
Speaker Bio:
Prior to joining Google, Du Tran served as a Research Lead at Samsung Research America, a Research Scientist at Meta FAIR, and a researcher at NTU. Du earned his Ph.D. in Computer Science from Dartmouth College, an MSc in Computer Science from the University of Illinois at Urbana-Champaign, and a BSc in Information Technology from Ho Chi Minh City University of Science. His research spans computer vision, machine learning, and computer graphics, with a specific focus on video understanding, representation learning, and vision for robotics.
—
Follow our fanpage to keep up with the latest breakthroughs in AI research, our global academic collaborations, and exclusive opportunities.
Contact: VinUniversity, Center for AI Research
Contact point for this event: (+84) 986 554 370 (Mr. Luan).