Geoffrey Irving

@geoffreyirving

Chief Scientist at the UK AI Safety Institute (AISI). Previously DeepMind, OpenAI, Google Brain, etc. @[email protected]

Similar Users

Jan Leike (@janleike)
Chris Olah (@ch402)
Rohin Shah (@rohinmshah)
Ali Eslami (@arkitus)
Victoria Krakovna (@vkrakovna)
Jane Wang (@janexwang)
jasmine collins (@jazco)
Shakir Mohamed (@shakir_za)
Been Kim (@_beenkim)
Ethan Perez (@EthanJPerez)
Evan Hubinger (@EvanHub)
Shane Legg (@ShaneLegg)
Nal (@nalkalc)
Marc G. Bellemare (@marcgbellemare)
Eric Jang (@ericjang11)

Pinned

New post about safety cases at AISI! To complement our empirical evaluations of frontier AI models, AISI is planning collaborations and research projects sketching safety cases for more advanced models than exist today, focusing on risks from loss of control and autonomy.


I don’t expect to actually work fully remotely in the foreseeable future, but I’m curious to get a more detailed sense of how well-structured remote work operates (not least as it has lessons for managing hybrid work). Any good books on the topic?


Geoffrey Irving Reposted

Our new paper on safety cases, in collaboration with @GovAI_, shows how it’s possible to write safety cases for current systems using existing techniques. We hope to see organisations using templates like this for their models. arxiv.org/abs/2411.08088

Geoffrey Irving Reposted

Safety cases are a promising risk management tool for frontier AI. But what does an AI safety case look like in practice? We have an idea! In our new paper, we present a safety case template for a cyber inability argument: arxiv.org/abs/2411.08088 Summary in the thread (1/8)

In San Francisco next week! Looking forward to the trip.


Geoffrey Irving Reposted

We're adopting Inspect as our evals framework. Over the last few months, we tried to assess whether we expect Inspect to be a well-run open-source package like PyTorch or one of the many ambitious but failed open-source libraries out there. I'm fairly confident now that it will be more like PyTorch.…

Apollo is adopting Inspect as its evals framework. We will contribute features and potentially example agent evals to Inspect and look forward to working with the Inspect community. More details in our blog: apolloresearch.ai/blog/apollo-is…



Geoffrey Irving Reposted

Today, we're marking our anniversary by releasing InspectEvals – a new repo of high-quality open-source evaluations for safety research. aisi.gov.uk/work/inspect-e… 1/2

Geoffrey Irving Reposted

Scaling human oversight of AI systems is key to providing high-quality training signal as systems tackle increasingly complex tasks.

"We want to create a situation where we're empowering … human raters [of AI fact checkers] to be making better decisions than they would on their own." – Sophie Bridgers discussing scalable oversight and improving human-AI collaboration at the Vienna Alignment Workshop.



Geoffrey Irving Reposted

Quanta has an excellent article on how staging a debate between two AI models can help a human understand better. tinyurl.com/3m78k7xj This term @danqi_chen and I used debates as an educational tool in our grad seminar class on LLMs. princeton-cos597r.github.io 25 min in each…


Frivolous question for complexity theory folk: in practice, is compiling LaTeX in NC (polynomial work, polylogarithmic depth)? Obviously it isn't in general, but I'm curious whether it is for typical papers.
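For readers outside complexity theory, the class in question has the standard definition below; "polynomial work, polylogarithmic depth" corresponds to uniform circuit families of polynomial size and polylog depth.

% Standard definition of NC ("Nick's Class"); included for context only.
\[
  \mathrm{NC} \;=\; \bigcup_{k \ge 1} \mathrm{NC}^k, \qquad
  \mathrm{NC}^k \;=\; \bigl\{\, L \;:\; L \text{ is decided by a uniform family of Boolean circuits of size } n^{O(1)} \text{ and depth } O(\log^k n) \,\bigr\}.
\]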

