LLM Evaluations Workshop - Replicating an Anthropic Paper
Date: Apr 27th, 2025, 1 p.m. to 4 p.m. Pacific / 4 p.m. to 7 p.m. Eastern
Location: Remote (join at https://meet.google.com/xhv-ufah-tfi)
This is a remote workshop where we’ll introduce people to the basics of LLM evaluations!
Come to learn:
- How to go beyond the basics of LLM prompt engineering
- How to interact directly with LLM providers’ APIs (see the minimal sketch after this list)
- How to design and implement your own evaluations of LLMs
- How to measure whether and how quickly LLMs are getting dangerous
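If you’ve never touched a model provider’s API before, here’s roughly the level we’ll be working at: a minimal sketch using Anthropic’s official `anthropic` Python SDK. It assumes `pip install anthropic` and an `ANTHROPIC_API_KEY` environment variable, and the model name is illustrative, so it may need updating.

```python
# Minimal sketch of a direct API call with Anthropic's Python SDK.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # illustrative model name
    max_tokens=256,
    system="You are a concise assistant.",  # system prompt is a top-level parameter
    messages=[{"role": "user", "content": "In one sentence, what is an LLM evaluation?"}],
)

print(message.content[0].text)  # the reply arrives as a list of content blocks
```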
This workshop is meant for people who have Python programming experience. We do not require AI research expertise or prior experience with AI model providers' APIs, but we do recommend some end-user experience with ChatGPT.
The workshop will culminate in replicating Anthropic’s “Alignment Faking in Large Language Models” paper, where we’ll examine the extent to which modern AI systems can figure out that they are in a training environment and actively modify their behavior to steer the training process against the wishes of their human trainers.
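To give a taste of what “designing an evaluation” means in practice, here is a deliberately tiny harness: it sends a few prompts to a model and grades the replies with simple programmatic checks. The prompts, checks, and model name are illustrative placeholders we made up for this announcement, not the workshop’s actual materials or the paper’s methodology.

```python
# A toy evaluation harness: send each prompt to a model and grade the reply.
# Assumes the same SDK setup as the sketch above.
import anthropic

client = anthropic.Anthropic()

def ask_model(prompt: str) -> str:
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model name
        max_tokens=64,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text

# Each case pairs a prompt with a simple pass/fail check on the response.
EVAL_CASES = [
    ("What is 2 + 2? Reply with just the number.", lambda r: r.strip() == "4"),
    ("Name the capital of France in one word.", lambda r: "paris" in r.lower()),
]

pass_rate = sum(check(ask_model(p)) for p, check in EVAL_CASES) / len(EVAL_CASES)
print(f"pass rate: {pass_rate:.0%}")
```

Real evaluations are larger and more careful than this, but the shape is the same: a set of test cases, a way to query the model, and a scoring rule.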
If you’re planning to attend, please RSVP through one of these two methods:
- Luma: https://lu.ma/wn7ul2qb
- Google Forms: https://forms.gle/1qut4KVhMXHFT8Uc6