3 Comments
Katalina Hernández

Planning to reference this article in one of my AI Safety advocacy projects during my Fellowship with Successif. Thank you for your thoughtful work on this bulletin!

I’ve often observed a major disconnect between AI safety research and the more corporate-facing “Responsible AI” discourse. I primarily work in Spain and the UK, and in both contexts, I’ve noticed that local governments tend to consult nearby big tech companies (such as my own, or firms like Accenture and IBM) when developing AI policies or making decisions around AI deployment.

These companies often serve as informal policy advisors, but the individuals consulted typically come from the Responsible AI space rather than AI safety. While I’m starting to see early signs of convergence (e.g., some of us in industry are bringing safety principles into Responsible AI practices), this remains the exception rather than the norm.

The result is that local governments only receive a partial view of the risks and challenges involved. Many corporate stakeholders in Responsible AI aren’t familiar with foundational safety concepts like alignment (inner/outer), deceptive alignment, or mechanistic interpretability. Their focus tends to be on output-level auditing (fairness, bias, explainability) without engaging with the deeper issues surrounding the behavior of foundation models.

That’s why I think AI safety policy efforts should aim to educate not only policymakers but also corporate stakeholders. In Europe especially, there’s a critical need for basic AI safety literacy at both levels. And if more people in influential corporate roles were aware of these safety concerns, they might be more willing to advocate for, and fund, relevant research. After all, big tech still holds a lot of the leverage when it comes to shaping what gets prioritised.

Anton Leicht

Thank you, that's much appreciated! I definitely agree that this sort of literacy would be great & serve the safety policy agenda well. I'm a little unsure how realistic it is to proactively communicate this information at scale, especially if it comes from the civil society angle. My sense is that (a) there are not many well-established pipelines from civil society to 'educating' private stakeholders, especially where those stakeholders might consider that education inconsistent with their original business interest; and (b) safety considerations are usually best communicated after first motivating a greater awareness of the range of outcomes associated with AI in the first place, which is, again, a tricky message to communicate proactively. That being said, I agree it would be highly valuable; I'm more unsure about the tractability. But it's an important and underrated point either way!

Katalina Hernández

Thank you, Anton, really appreciate your thoughtful response! You're right to highlight that advocacy works best when stakeholders already understand the stakes involved, and I agree: just pushing literacy without context won't scale easily or effectively.

My angle here is specifically about stakeholder accountability and downstream responsibility under the EU AI Act and the GPAI Code of Practice. For anyone following: in Europe, public institutions deploying high-risk AI systems (especially those built on GPAI models and then modified or fine-tuned in-house) will face strict legal obligations, which the final draft of the CoP is expected to clarify. Right now, there’s significant uncertainty among policymakers about where their responsibilities start and end, particularly because they often lack a clear picture of the underlying AI safety risks.

I am worried that, without a baseline understanding of the current state of the art in alignment and interpretability for common foundation models, relevant stakeholders won’t truly grasp why or how certain outcomes occur. Governmental institutions may deploy high-risk systems built on foundation models that, as we know, are not interpretable by design. We should, at least, make sure that the relevant policies spell out which outputs those institutions are responsible for as downstream deployers, which they are not, and why.

I believe addressing this gap proactively through clear, outcome-focused AI safety literacy can empower decision-makers to confidently manage their compliance responsibilities and ensure more robust oversight.

Would love your thoughts on this! I think this problem needs better formulation, and I need to work on it.
