4 Comments
Max Räuker

Great work, thanks for sharing!

I was recently thinking about capability restrictions, similar to the unlearning paradigm, but going beyond specific dangerous domains. It might become desirable at some future point to restrict the reasoning capabilities of AI systems more generally, in order to make them less able to behave in unintended and catastrophic ways. This plausibly falls more into the scope of technical AI governance, but I suppose it would also require some further technical safety research to be implementable.

Oscar Delaney

Thanks Max, yes, restricting certain types of reasoning could be useful, but I wonder how feasible it will be to do so in a very surgical manner without significantly harming the general usefulness of AI models. Given that training models on maths and coding seems to make them better reasoners in other domains as well, I am tentatively pessimistic about making models bad at reasoning only in specific ways or domains. But that doesn't mean no one should try.

Chris L

Are safety evaluations neglected by frontier labs, or is it just that they tend to be released as model cards rather than as papers?

Oscar Delaney

Good point: if safety content is included in a publication that is mainly about capabilities, we would not have counted it, because being 'safety-focused' was part of our inclusion criteria.

But more importantly, safety evaluations are important to have from independent third parties, given the conflict of interest when companies evaluate their own models. Though of course having companies do safety checks is far better than nothing.
