Analyzing a Malicious AI Agent
Date: Oct 26th, 2024, 12:30 p.m. - 3:30 p.m. Pacific Time
Location: Remote (meet.google.com/uui-ioub-qwc)
Ever wonder how an AI agent could be malicious? Have some Python coding chops but don’t know much about AI? We’ll give a gentle introduction to how an AI agent trained to play a simple game can be perfectly safe during training and then very dangerous once deployed.
This event will be remote, fully open to the public, and free to attend! We’ll spend roughly 3 hours dissecting a model that’s been trained to navigate a maze and optionally harvest crops and/or humans along the way. During training, the model very sensibly harvests crops and avoids humans. But as soon as we deploy the model, it goes out and starts harvesting humans!
Why might that happen? That’s the riddle we’ll be exploring!
During the session you’ll be doing some hands-on code exploration and spelunking with an AI model trained via reinforcement learning to play this game. While we don’t expect attendees to write much code from scratch (although some exploratory code may be written as you play with the model), we do expect attendees to be comfortable reading Python code. Along the way we’ll spend a little time introducing the basics of reinforcement learning and its role in modern AI.
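To give a flavor of the kind of spelunking involved, here’s a minimal toy sketch written against an imaginary gridworld rather than the actual workshop code: a short rollout loop that tracks what an agent harvests as it wanders a maze. The environment, the rollout helper, and the random stand-in policy (where the trained model would plug in) are all hypothetical illustrations, not the materials we’ll use at the event.

```python
import random

# Toy stand-in for the workshop's maze environment. The real session uses a
# pretrained RL model and its own environment; everything here (GRID, rollout,
# random_policy) is a hypothetical illustration.
GRID = [
    "#########",
    "#A..c...#",
    "#.##.##.#",
    "#..c..h.#",
    "#########",
]  # A = agent start, c = crop, h = human, # = wall

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def rollout(policy, steps=50):
    """Run one episode, counting what the agent harvests along the way."""
    grid = [list(row) for row in GRID]
    r, c = next((i, j) for i, row in enumerate(grid)
                for j, ch in enumerate(row) if ch == "A")
    harvested = {"crop": 0, "human": 0}
    for _ in range(steps):
        dr, dc = MOVES[policy(grid, (r, c))]
        nr, nc = r + dr, c + dc
        if grid[nr][nc] == "#":
            continue  # walls block movement
        if grid[nr][nc] == "c":
            harvested["crop"] += 1
        elif grid[nr][nc] == "h":
            harvested["human"] += 1
        grid[r][c], grid[nr][nc] = ".", "A"  # move the agent
        r, c = nr, nc
    return harvested

# In the session you'd swap in the trained model's policy here.
def random_policy(grid, pos):
    return random.choice(list(MOVES))

print(rollout(random_policy))
```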
If you’re planning to attend, please fill out this quick survey to let us know you’re coming: https://forms.gle/BRFPRtqhjFX6MPgT9