Speaker Details

Bo Li
University of Illinois at Urbana–Champaign
Dr. Bo Li is an Associate Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. She is the recipient of the IJCAI Computers and Thought Award, the Alfred P. Sloan Research Fellowship, IEEE AI’s 10 to Watch, the NSF CAREER Award, the MIT Technology Review TR-35 Award, the Dean's Award for Excellence in Research, the C.W. Gear Outstanding Faculty Award, the Intel Rising Star Award, the Symantec Research Labs Fellowship, the Rising Star Award, research awards from tech companies such as Amazon, Meta, Google, Intel, IBM, eBay, JPMC, and Oracle, and best paper awards at several top machine learning and security conferences. Her research focuses on both theoretical and practical aspects of trustworthy machine learning, at the intersection of machine learning, security, privacy, and game theory. Her work has been featured by major publications and media outlets, including Nature, Wired, Fortune, and The New York Times.
Talk
Title: Guarding the Future: Advancing Risk Assessment, Safety Alignment, and Guardrail Systems for AI Agents
Abstract: Autonomous agents built on foundation models are increasingly being deployed in dynamic, high-stakes real-world environments—from web automation to AI operating systems. Despite their promise, these agents remain highly susceptible to adversarial instructions and manipulation, posing serious risks including policy violations, data leakage, and financial harm.
In this talk, I will present a comprehensive framework for assessing and strengthening the safety of AI agents. We begin by examining principles and methodologies for robust agent evaluation, with a focus on red-teaming-based stress testing across diverse adversarial scenarios, ranging from agent poisoning to WebAgent manipulation to general black-box agent attacks. Building on these foundations, I will introduce ShieldAgent, the first guardrail agent explicitly designed to enforce policy-aligned behavior in autonomous agents through structured reasoning and verification. ShieldAgent constructs a verifiable safety model by extracting actionable safety rules from formal AI security policy documents and encoding them into probabilistic graphical rule circuits. Given a target agent’s action trajectory, ShieldAgent dynamically retrieves the relevant safety rules and synthesizes shielding strategies, drawing on a rich tool library and formally executable code. To support comprehensive AI agent evaluation, I will also introduce a novel benchmark comprising 3,000 safety-critical instruction-action pairs derived from state-of-the-art attack scenarios across six web environments and seven distinct risk categories.
This talk aims to highlight both the urgent challenges and the emerging solutions in building trustworthy, resilient AI agents, laying the groundwork for a new generation of safety-aligned autonomous systems.
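
As a rough illustration of the shielding workflow described in the abstract, the short Python sketch below checks each step of a target agent's action trajectory against a set of extracted safety rules and returns an allow/block verdict per step. The rule schema, the keyword-based retrieve_rules matching, and the verdict logic are simplified assumptions introduced here for exposition; they are not the actual ShieldAgent implementation, which relies on probabilistic rule circuits and formal verification.

# Hypothetical guardrail sketch: check an agent's action trajectory against
# safety rules extracted from policy documents. Simplified for illustration;
# not the actual ShieldAgent implementation.
from dataclasses import dataclass

@dataclass
class SafetyRule:
    rule_id: str
    description: str         # actionable rule distilled from a policy document
    constrained_action: str   # action type this rule constrains (simplified)

@dataclass
class Action:
    action_type: str          # e.g. "click", "type", "submit_payment"
    target: str               # element or resource the action touches

def retrieve_rules(action, rules):
    # Keyword matching stands in for the structured rule retrieval a real
    # guardrail agent would perform over its safety model.
    return [r for r in rules if r.constrained_action == action.action_type]

def shield(trajectory, rules):
    # Label each step of the trajectory "allow" or "block" based on the
    # safety rules retrieved for that action.
    verdicts = []
    for action in trajectory:
        relevant = retrieve_rules(action, rules)
        verdicts.append((action, "block" if relevant else "allow"))
    return verdicts

if __name__ == "__main__":
    rules = [SafetyRule("R1", "Never submit a payment without user confirmation",
                        constrained_action="submit_payment")]
    trajectory = [Action("click", "search_box"),
                  Action("submit_payment", "checkout")]
    for action, verdict in shield(trajectory, rules):
        print(f"{action.action_type:>15} -> {verdict}")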