Geoffrey Irving

@geoffreyirving

Chief Scientist at the UK AI Safety Institute (AISI). Previously DeepMind, OpenAI, Google Brain, etc. @[email protected]

Similar Users

Jan Leike (@janleike)
Chris Olah (@ch402)
Rohin Shah (@rohinmshah)
Ali Eslami (@arkitus)
Victoria Krakovna (@vkrakovna)
Jane Wang (@janexwang)
jasmine collins (@jazco)
Shakir Mohamed (@shakir_za)
Been Kim (@_beenkim)
Ethan Perez (@EthanJPerez)
Evan Hubinger (@EvanHub)
Shane Legg (@ShaneLegg)
Nal (@nalkalc)
Marc G. Bellemare (@marcgbellemare)
Eric Jang (@ericjang11)

Pinned

New post about safety cases at AISI! To complement our empirical evaluations of frontier AI models, AISI is planning collaborations and research projects sketching safety cases for more advanced models than exist today, focusing on risks from loss of control and autonomy.


I don’t expect to actually work fully remotely in the foreseeable future, but I’m curious to get a more detailed sense of how well-structured remote work operates (not least as it has lessons for managing hybrid work). Any good books on the topic?


Geoffrey Irving Reposted

Our new paper on safety cases, in collaboration with @GovAI_, shows how it’s possible to write safety cases for current systems using existing techniques. We hope to see organisations using templates like this for their models. arxiv.org/abs/2411.08088

Geoffrey Irving Reposted

Safety cases are a promising risk management tool for frontier AI. But what does an AI safety case look like in practice? We have an idea! In our new paper, we present a safety case template for a cyber inability argument: arxiv.org/abs/2411.08088 Summary in the thread (1/8)

In San Francisco next week! Looking forward to the trip.


Geoffrey Irving Reposted

We're adopting Inspect as our evals framework. Over the last few months, we tried to assess whether we expect Inspect to be a well-run open-source package like PyTorch or one of the many ambitious but failed open-source libraries out there. I'm fairly confident now that it will be more like PyTorch.…

Apollo is adopting Inspect as its evals framework. We will contribute features and potentially example agent evals to Inspect and look forward to working with the Inspect community. More details in our blog: apolloresearch.ai/blog/apollo-is…



Geoffrey Irving Reposted

Today, we're marking our anniversary by releasing InspectEvals – a new repo of high-quality open-source evaluations for safety research. aisi.gov.uk/work/inspect-e… 1/2

Geoffrey Irving Reposted

Scaling human oversight of AI systems is key to providing high-quality training signal as systems tackle increasingly complex tasks.

"We want to create a situation where we're empowering … human raters [of AI fact checkers] to be making better decisions than they would on their own." – Sophie Bridgers discussing scalable oversight and improving human-AI collaboration at the Vienna Alignment Workshop.



Geoffrey Irving Reposted

Quanta has an excellent article on how staging a debate between two AI models can help a human understand better. tinyurl.com/3m78k7xj This term @danqi_chen and I used debates as an educational tool in our grad seminar class on LLMs. princeton-cos597r.github.io 25 min in each…


Frivolous question for complexity theory folk: in practice, is compiling LaTeX in NC (polynomial work, polylogarithmic depth)? Obviously it isn't in general, but I'm curious whether it is for typical papers.
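For readers outside complexity theory, the class in question has the standard definition below; "polynomial work, polylogarithmic depth" corresponds to uniform circuit families of polynomial size and polylog depth.

% Standard definition of NC ("Nick's Class"); included for context only.
\[
  \mathrm{NC} \;=\; \bigcup_{k \ge 1} \mathrm{NC}^k, \qquad
  \mathrm{NC}^k \;=\; \bigl\{\, L \;:\; L \text{ is decided by a uniform family of Boolean circuits of size } n^{O(1)} \text{ and depth } O(\log^k n) \,\bigr\}.
\]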

