Upcoming seminar: "Exploiting Semantic Structures in Video: From Representation Learning to Generative Composition."

Date: Monday, 29 December 2025
Time: 10:00 – 11:00
Speaker: Du Tran, Research Scientist at Google
Venue: VinUni-CAIR Office, Level 20A, Vincom Center Dong Khoi, 72 Le Thanh Ton and 45A Ly Tu Trong, Saigon Ward, District 1, Ho Chi Minh City.

Join us online: https://url-shortener.me/5AMY

Abstract:

In this talk, I will present our recent advancements in long-video reasoning and generative synthesis. I will begin by introducing SEAL (SEmantic Attention Learning), a novel framework designed for efficient long-video understanding. SEAL addresses the challenges of high computational complexity and temporal redundancy by decomposing videos into high-level semantic entities and utilizing a subset selection optimization to balance token relevance with diversity. SEAL significantly outperforms state-of-the-art (SoTA) models on key long-video benchmarks, including LVBench, MovieChat-1K, and Ego4D, across tasks such as VideoQA and temporal grounding.

Next, I will discuss StM (Split-then-Merge), a framework that enhances generative control and mitigates data scarcity in video synthesis. By splitting unlabeled videos into distinct foreground and background layers and learning to self-compose them, StM enables high-fidelity video generation without the need for extensive labeled datasets. Our results demonstrate that StM exceeds SoTA performance in both quantitative benchmarks and qualitative evaluations conducted by humans and Vision-Language Models (VLMs).

Speaker Bio:

Prior to joining Google, Du Tran served as a Research Lead at Samsung Research America, a Research Scientist at Meta FAIR, and a researcher at NTU. Du earned his Ph.D. in Computer Science from Dartmouth College, an MSc in Computer Science from the University of Illinois at Urbana-Champaign, and a BSc in Information Technology from Ho Chi Minh City University of Science. His research spans computer vision, machine learning, and computer graphics, with a specific focus on video understanding, representation learning, and vision for robotics.

Follow our fanpage to keep up with the latest breakthroughs in AI research, our global academic collaborations, and exclusive opportunities.

Contact: VinUniversity, Center for AI Research
[email protected]
Contact point for this event: (+84) 986 554 370 (Mr. Luan).
