So AIs are lying to us now, huh?

Talk by Matthew Wearden

As AI systems get smarter, safety researchers are increasingly concerned by a new capability of these models: they are getting very good at lying. This behaviour is seen both in and out of evaluation settings, and has a fascinating range of root causes. In this talk, we'll investigate the phenomenon of AI deception across a broad range of scenarios. Discover how chatbots lie for your own good, when models strategically hide their capabilities, and why ChatGPT is better than you at Avalon. We'll cover how researchers are training model organisms designed to be good at deception, and how this helps us to detect scheming in the wild. Most importantly, we'll attempt to answer the question: how worried should we be about all this? (Disclaimer: I run the UK branch of a non-profit AI safety research program that conducts a lot of research on this topic. Also, while this is a somewhat serious topic, I hope to present it in a lighthearted and entertaining way!)
