Discussion about this post

Max Räuker:

Great work, thanks for sharing!

I was recently thinking about capability restrictions, similar to the unlearning paradigm but going beyond specific dangerous domains. It might become desirable at some future point to restrict the reasoning capabilities of AI systems more generally, in order to make them less able to behave in unintended and catastrophic ways. This plausibly falls more within the scope of technical AI governance, but I suppose it would also require some further technical safety research to be implementable.

Chris L:

Are safety evaluations neglected by frontier labs, or is it just that the work tends to be released as a model card rather than as a paper?

2 more comments...