Speaker Details


Mohit Bansal

University of North Carolina at Chapel Hill

Dr. Mohit Bansal is the John R. & Louise S. Parker Distinguished Professor and the Director of the MURGe-Lab (UNC-NLP Group) in the Computer Science department at UNC Chapel Hill. He received his PhD from UC Berkeley in 2013 and his BTech from IIT Kanpur in 2008. His research expertise is in natural language processing and multimodal machine learning, with a particular focus on multimodal generative models, grounded and embodied semantics, reasoning and planning agents, faithful language generation, and interpretable, efficient, and generalizable deep learning. He is an AAAI Fellow and recipient of the Presidential Early Career Award for Scientists and Engineers (PECASE), IIT Kanpur Young Alumnus Award, DARPA Director's Fellowship, NSF CAREER Award, Google Focused Research Award, Microsoft Investigator Fellowship, Army Young Investigator Award (YIP), DARPA Young Faculty Award (YFA), and outstanding paper awards at ACL, CVPR, EACL, COLING, CoNLL, and TMLR. He has been a keynote speaker for the AACL 2023, CoNLL 2023, and INLG 2022 conferences. His service includes serving as EMNLP and CoNLL Program Co-Chair, ACL Executive Committee member, ACM Doctoral Dissertation Award Committee member, ACL Americas Sponsorship Co-Chair, and Associate/Action Editor for the TACL, CL, IEEE/ACM TASLP, and CSL journals. Webpage: https://www.cs.unc.edu/~mbansal/

Talk

Title: Trustworthy Planning Agents for Collaborative Reasoning and Multimodal Generation

Abstract: In this talk, I will present our journey of developing trustworthy and adaptive AI planning agents that can reliably communicate and collaborate for uncertainty-calibrated reasoning (on math, commonsense, coding, tool use, etc.) as well as for interpretable, controllable multimodal generation (across text, images, videos, audio, layouts, etc.). In the first part, we will discuss how to teach agents to be trustworthy and reliable collaborators via social/pragmatic multi-agent interactions (e.g., confidence calibration via speaker-listener reasoning and learning to balance positive and negative persuasion), as well as how to acquire and improve the agent skills needed for efficient and robust perception and action (e.g., learning reusable, verified abstractions over actions & code, and adaptive data generation based on discovered weak skills). In the second part, we will discuss interpretable and controllable multimodal generation via LLM-agent-based planning and programming, such as layout-controllable image generation and evaluation via visual programming (VPGen, VPEval, DSG), consistent video generation via LLM-guided multi-scene planning, targeted corrections, and retrieval-augmented motion adaptation (VideoDirectorGPT, VideoRepair, DreamRunner), and interactive and composable any-to-any multimodal generation (CoDi, CoDi-2).