Mechanistic Understanding and Control of Multi-modal LLMs

Dec

Principal Investigators & Key Members: Asst. Prof. Khoa D. Doan

Multimodal LLMs (MLLMs) show remarkable capabilities in domains such as visual generative tasks (Nguyen et al., 2025b; Xia et al., 2024), mathematical reasoning (Hendrycks et al., 2021; Nguyen et al., 2025a), program synthesis (Jimenez et al., 2024; Yang et al., 2024), vision-language robotic control (Kim et al., 2024; Brohan et al., 2022), and scientific discovery (Shojaee et al., 2025). Despite this, they remain brittle at basic skills such as understanding spatial relationships, cultural grounding, and novel reasoning, which limits their safe use in mission-critical settings (e.g., health, finance, robotics) and in low-resource, culture-rich contexts (e.g., Vietnam, Malaysia, Indonesia). This project develops a mechanistic understanding of these failure modes and builds lightweight mitigation methods that apply at inference time and during fine-tuning.

Specifically, we will: (i) build targeted benchmarks for MLLM failure modes in spatial understanding, multicultural and multilingual grounding, and novel reasoning; (ii) develop mechanistic interpretability approaches to localize the concepts and pathways behind model decisions; and (iii) design efficient steering and fine-tuning algorithms to prevent harmful behaviors and brittle failures. Our benchmarking philosophy follows recent work in equation discovery that designs tasks explicitly to avoid memorization and stress genuine reasoning (Shojaee et al., 2025). For instance, LLM-SRBench shows that state-of-the-art models achieve only 31.5% symbolic accuracy when memorization paths are blocked, underscoring the gap between surface performance and true reasoning.

We use mechanistic methods, namely probing and causal tracing of concept pathways and their evolution during generation, to compress (Tran et al., 2025) and steer (Nguyen et al., 2025b) model internals. We also analyze model learning dynamics to curate training data (Nguyen et al., 2025a). Together, these enable efficient control and improved novel reasoning in MLLMs.
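To make the probing and steering ideas concrete, the following is a minimal sketch on synthetic data, not the project's actual method. It assumes a concept is encoded along a single linear direction in a hidden state, fits a difference-of-means linear probe to find that direction, and then shifts activations along it. All names (`make_activations`, `fit_probe`, `steer`), the dimensions, and the shift sizes are hypothetical choices for illustration. A real MLLM would supply the activations, and causal tracing is not shown.

```python
import random


def make_activations(n, concept_direction, has_concept, rng):
    """Synthetic stand-in for hidden states: Gaussian noise shifted along
    a fixed direction, positively if the concept is present, else negatively."""
    shift = 2.0 if has_concept else -2.0
    return [[rng.gauss(0.0, 1.0) + shift * d for d in concept_direction]
            for _ in range(n)]


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_probe(pos, neg):
    """Difference-of-means linear probe: returns (direction, threshold).
    The threshold is the projection of the midpoint between class means."""
    mu_pos, mu_neg = mean_vector(pos), mean_vector(neg)
    direction = [p - q for p, q in zip(mu_pos, mu_neg)]
    midpoint = [(p + q) / 2 for p, q in zip(mu_pos, mu_neg)]
    return direction, dot(direction, midpoint)


def probe_predict(h, direction, threshold):
    """True if the probe reads the concept as present in activation h."""
    return dot(h, direction) > threshold


def steer(h, direction, alpha):
    """Shift activation h by alpha along the unit-normalized probe direction."""
    norm = dot(direction, direction) ** 0.5
    return [x + alpha * d / norm for x, d in zip(h, direction)]


rng = random.Random(0)
dim = 16
true_direction = [1.0 / dim ** 0.5] * dim  # hypothetical unit concept axis

pos = make_activations(200, true_direction, True, rng)
neg = make_activations(200, true_direction, False, rng)
direction, threshold = fit_probe(pos, neg)

# Accuracy on the same synthetic data used to fit the probe (no held-out split).
correct = sum(probe_predict(h, direction, threshold) for h in pos)
correct += sum(not probe_predict(h, direction, threshold) for h in neg)
accuracy = correct / (len(pos) + len(neg))

# Steer concept-absent activations and check how many the probe now reads as present.
steered = [steer(h, direction, alpha=8.0) for h in neg]
flipped = sum(probe_predict(h, direction, threshold) for h in steered) / len(steered)

print(f"probe accuracy: {accuracy:.2f}, flipped after steering: {flipped:.2f}")
```

In this toy setting, steering only changes what the probe reads out. In a real model, the shifted activation would be written back into the forward pass (for example via a hook at the chosen layer), and the effect would be judged on the generated output rather than on the probe.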

We will partner with domain experts to evaluate MLLMs and pilot applications in health and robotics. These activities will also position the team for follow-on proposals to the Vingroup Innovation Foundation (VinIF), NAFOSTED, MOET/World Bank digital health calls, Google Research/AWS/Schmidt Sciences AI Safety Academic grants, and international programs (e.g., NSF/AFOSR/ONR-style calls), alongside industry co-funding with healthcare and robotics partners.

Societally, this work enables safer, more inclusive machine learning (ML) systems that respect local languages and cultural context, improving trust and access for underserved communities. Environmentally, inference-time steering, model compression, and data-efficient fine-tuning reduce compute and energy use compared to full retraining. Economically, dependable MLLMs lower error-driven costs, unlock responsible deployment in regulated sectors, and catalyze local innovation ecosystems, supporting new products, skills, and jobs built on reliable multimodal AI.