LLM Evaluations Workshop - Replicating an Anthropic Paper
Date: Apr 27th, 2025, 1 p.m. to 4 p.m. Pacific / 4 p.m. to 7 p.m. Eastern
Location: Remote (join at https://meet.google.com/xhv-ufah-tfi)
This is a remote workshop where we’ll introduce people to the basics of LLM evaluations!
Come to learn:
- How to go beyond the basics of LLM prompt engineering
- How to interact directly with LLM providers’ APIs (see the minimal sketch after this list)
- How to design and implement your own evaluations of LLMs
- How to measure whether and how quickly LLMs are getting dangerous
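If you’ve never touched a model provider’s API before, here’s roughly the level we’ll be working at: a minimal sketch using Anthropic’s official `anthropic` Python SDK. It assumes `pip install anthropic` and an `ANTHROPIC_API_KEY` environment variable, and the model name is illustrative, so it may need updating.

```python
# Minimal sketch of a direct API call with Anthropic's Python SDK.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # illustrative model name
    max_tokens=256,
    system="You are a concise assistant.",  # system prompt is a top-level parameter
    messages=[{"role": "user", "content": "In one sentence, what is an LLM evaluation?"}],
)

print(message.content[0].text)  # the reply arrives as a list of content blocks
```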
This workshop is meant for people who have Python programming experience. We do not require AI research expertise or prior experience with AI model providers' APIs, but we do recommend some end-user experience with ChatGPT.
The workshop will culminate in replicating Anthropic’s “Alignment Faking in Large Language Models” paper, where we’ll examine the extent to which modern AI systems can figure out that they are in a training environment and actively modify their behavior to steer the training process against the wishes of their human trainers.
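To give a taste of what “designing an evaluation” means in practice, here is a deliberately tiny harness: it sends a few prompts to a model and grades the replies with simple programmatic checks. The prompts, checks, and model name are illustrative placeholders we made up for this announcement, not the workshop’s actual materials or the paper’s methodology.

```python
# A toy evaluation harness: send each prompt to a model and grade the reply.
# Assumes the same SDK setup as the sketch above.
import anthropic

client = anthropic.Anthropic()

def ask_model(prompt: str) -> str:
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model name
        max_tokens=64,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text

# Each case pairs a prompt with a simple pass/fail check on the response.
EVAL_CASES = [
    ("What is 2 + 2? Reply with just the number.", lambda r: r.strip() == "4"),
    ("Name the capital of France in one word.", lambda r: "paris" in r.lower()),
]

pass_rate = sum(check(ask_model(p)) for p, check in EVAL_CASES) / len(EVAL_CASES)
print(f"pass rate: {pass_rate:.0%}")
```

Real evaluations are larger and more careful than this, but the shape is the same: a set of test cases, a way to query the model, and a scoring rule.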
If you’re planning to attend, please RSVP through one of these two methods:
- Luma: https://lu.ma/wn7ul2qb
- Google Forms: https://forms.gle/1qut4KVhMXHFT8Uc6