Geoffrey Irving
@geoffreyirving · Chief Scientist at the UK AI Safety Institute (AISI). Previously DeepMind, OpenAI, Google Brain, etc. @[email protected]
Similar Users
- Jan Leike @janleike
- Chris Olah @ch402
- Rohin Shah @rohinmshah
- Ali Eslami @arkitus
- Victoria Krakovna @vkrakovna
- Jane Wang @janexwang
- jasmine collins @jazco
- Shakir Mohamed @shakir_za
- Been Kim @_beenkim
- Ethan Perez @EthanJPerez
- Evan Hubinger @EvanHub
- Shane Legg @ShaneLegg
- Nal @nalkalc
- Marc G. Bellemare @marcgbellemare
- Eric Jang @ericjang11
New post about safety cases at AISI! To complement our empirical evaluations of frontier AI models, AISI is planning collaborations and research projects sketching safety cases for more advanced models than exist today, focusing on risks from loss of control and autonomy.
I don't expect to actually work fully remotely in the foreseeable future, but I'm curious to get a more detailed sense of how well-structured remote work operates (not least because it has lessons for managing hybrid work). Any good books on the topic?
Our new paper on safety cases, in collaboration with @GovAI_, shows how it's possible to write safety cases for current systems using existing techniques. We hope to see organisations using templates like this for their models. arxiv.org/abs/2411.08088
Safety cases are a promising risk management tool for frontier AI. But what does an AI safety case look like in practice? We have an idea! In our new paper, we present a safety case template for a cyber inability argument: arxiv.org/abs/2411.08088 Summary in the thread (1/8)
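Safety cases of this kind are often organised as a tree of claims, each supported by subclaims and evidence. The sketch below is purely illustrative of that structure and is not the template from the paper: the class, the claims, and the evidence labels are hypothetical.

```python
# Illustrative claim/evidence tree for a cyber "inability" argument.
# Hypothetical example only; not the template or claims from arxiv.org/abs/2411.08088.
from dataclasses import dataclass, field


@dataclass
class Claim:
    statement: str
    evidence: list[str] = field(default_factory=list)
    subclaims: list["Claim"] = field(default_factory=list)


cyber_inability_case = Claim(
    statement="The model cannot provide meaningful uplift to cyber attackers.",
    subclaims=[
        Claim(
            statement="The model falls below the agreed threshold on cyber capability evals.",
            evidence=["capture-the-flag eval results", "vulnerability-discovery eval results"],
        ),
        Claim(
            statement="The evaluations adequately elicited the model's capabilities.",
            evidence=["elicitation review (fine-tuning, tools, scaffolding)"],
        ),
    ],
)
```

A reviewer can then walk such a tree, checking each leaf claim against the evidence cited for it.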
In San Francisco next week! Looking forward to the trip.
We're adopting Inspect as our evals framework. Over the last few months, we tried to assess whether Inspect would become a well-run open-source package like PyTorch or one of the many ambitious but failed open-source libraries out there. I'm fairly confident now that it will be more like PyTorch.…
Apollo is adopting Inspect as its evals framework. We will contribute features and potentially example agent evals to Inspect and look forward to working with the Inspect community. More details in our blog: apolloresearch.ai/blog/apollo-is…
Today, we're marking our anniversary by releasing InspectEvals – a new repo of high-quality open-source evaluations for safety research. aisi.gov.uk/work/inspect-e… 1/2
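For readers unfamiliar with Inspect, the sketch below shows roughly what a minimal task definition looks like, based on the inspect_ai package's documented interface. Treat it as an approximation rather than AISI's own code: module paths and parameter names (e.g. solver vs. plan) have varied between Inspect versions.

```python
# Minimal Inspect-style task: one sample, a bare generate() solver, exact-match scoring.
# Approximate sketch of the inspect_ai API; names may differ slightly by version.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import exact
from inspect_ai.solver import generate


@task
def hello_world():
    return Task(
        dataset=[Sample(input="Just reply with 'Hello World'.", target="Hello World")],
        solver=[generate()],  # single step: call the model on the input
        scorer=exact(),       # score by exact string match against the target
    )

# Run from the shell, e.g.:
#   inspect eval hello_world.py --model openai/gpt-4o
```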
Scaling human oversight of AI systems is key to providing high-quality training signal as systems tackle increasingly complex tasks.
"We want to create a situation where we're empowering … human raters [of AI fact checkers] to be making better decisions than they would on their own." – Sophie Bridgers discussing scalable oversight and improving human-AI collaboration at the Vienna Alignment Workshop.
Quanta has an excellent article on how staging a debate between two AI models can help a human judge reach better answers. tinyurl.com/3m78k7xj This term @danqi_chen and I used debates as an educational tool in our grad seminar class on LLMs. princeton-cos597r.github.io 25 min in each…
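As a rough sketch of the debate setup (not the protocol from the article or the seminar), two debaters defend opposing answers over a fixed number of rounds, and the transcript is then handed to a human judge. The ask() helper below is a hypothetical stub standing in for real model calls.

```python
# Toy two-debater protocol with a human judge reading the final transcript.
# ask() is a placeholder; in practice it would call an LLM API for each debater.
def ask(debater: str, prompt: str) -> str:
    return f"[{debater}'s argument, given context of length {len(prompt)}]"


def debate(question: str, answer_a: str, answer_b: str, rounds: int = 2) -> str:
    """Run alternating argument rounds and return the transcript for a human judge."""
    transcript = f"Question: {question}\nA defends: {answer_a}\nB defends: {answer_b}"
    for r in range(1, rounds + 1):
        for debater, answer in (("A", answer_a), ("B", answer_b)):
            prompt = (
                f"{transcript}\nRound {r}: argue that '{answer}' is correct "
                "and rebut the other side."
            )
            transcript += f"\n{debater}: {ask(debater, prompt)}"
    return transcript  # the human judge reads this and picks A or B


print(debate("Is compiling a typical LaTeX paper parallelizable?", "Yes", "No"))
```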
Frivolous question for complexity theory folk: in practice, is compiling LaTeX in NC (polynomial work, polylogarithmic depth)? Obviously it isn't in general, but I'm curious if it is for typical papers.