Analyzing a Malicious AI Agent
Date: Nov 10th, 2024, 12:30 p.m. (optional intro hour) / 1:30 p.m.-4:30 p.m. Eastern Time
Location: Fractal Tech, 111 Conselyea St Floor 2, Brooklyn, NY 11211
NOTE: We will have an extra hour at the beginning of the session, with some introductory materials and a talk, for people who have never built a neural net before.
Ever wonder how an AI agent could be malicious? Have some Python coding chops but don’t know much about AI? We’ll give a gentle introduction to how an AI agent trained to play a simple game can be perfectly safe during training and then turn dangerous in production.
This event is remote, open to the public, and free to attend! We’ll spend roughly 3 hours dissecting an agent that’s been trained to navigate a maze and optionally harvest crops and/or humans along the way. During training, the agent sensibly harvests crops and avoids humans. But as soon as we deploy the agent, it goes out and starts harvesting humans!
Why might that happen? That’s the riddle we’ll be exploring!
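One way this kind of behavior can arise is a learned proxy goal: a rule that coincides with the intended goal everywhere in the training data but comes apart under distribution shift. Here is a minimal, purely illustrative sketch (a hypothetical gridworld, not the workshop’s actual model), where the agent has learned "harvest whatever is in the top row" rather than "harvest crops":

```python
# Toy illustration of goal misgeneralization (hypothetical, not the
# workshop's code): a proxy goal that looks identical to the intended
# goal during training, then diverges in deployment.

def proxy_policy(grid):
    """Hypothetical learned policy: harvest everything in the top row.

    In training this is indistinguishable from 'harvest crops',
    because crops only ever appeared in the top row.
    """
    return [cell for cell in grid[0] if cell in ("crop", "human")]

# Training distribution: crops always on top, humans on the bottom.
train_grid = [["crop", "crop"], ["human", "empty"]]
print(proxy_policy(train_grid))   # ['crop', 'crop'] -- looks perfectly safe

# Deployment distribution: the layout shifts.
deploy_grid = [["human", "crop"], ["crop", "empty"]]
print(proxy_policy(deploy_grid))  # ['human', 'crop'] -- the proxy harvests a human
```

No amount of testing on the training distribution would distinguish the proxy from the intended goal here; that gap is the heart of the riddle.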
During the session you’ll do some hands-on code exploration and spelunking with an AI model trained via reinforcement learning to play this game. We don’t expect attendees to write much code from scratch (though you may write some exploratory code as you play with the models), but we do expect attendees to be able to comfortably read Python code. Along the way we’ll spend a little time introducing the basics of reinforcement learning and its role in modern AI.
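If reinforcement learning is new to you, the core loop fits in a few lines. Below is a minimal tabular Q-learning sketch on a toy corridor (illustrative only; the workshop’s environment and model will differ): the agent repeatedly acts, observes a reward, and nudges its value estimates toward reward plus discounted future value.

```python
import random

# Minimal tabular Q-learning sketch (illustrative; not the workshop's model).
# The agent learns to walk right along a 1-D corridor to a reward at the end.

N_STATES = 5          # states 0..4; reward for reaching state 4
ACTIONS = [-1, +1]    # step left or step right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for _ in range(500):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit the current estimates, sometimes explore
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Q-learning update: move Q(s, a) toward r + discounted best next value
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next

# The greedy policy after training: the preferred action in each non-terminal state
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
print(policy)
```

After training, the greedy policy steps right in every state. The same loop, scaled up with a neural network standing in for the Q table, is the family of methods behind the maze agent we’ll be dissecting.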
If you’re planning to attend and have the time, please fill out this optional survey to let us know you’re coming: https://forms.gle/bsWHDQXbUkz5wekTA