
Alignment Foundation

We fund, accelerate, and advocate for the research that makes AI trustworthy.

AI Has a Trust Problem

We have already caught AI systems deceiving their evaluators, resisting shutdown, and rewriting their own code to stay alive. Current training puts a helpful mask on systems that can develop uncontrolled objectives underneath, and anyone can strip that mask off in minutes. Yet we're already deploying these systems in military and critical-infrastructure settings.

Alignment is the deeper work of making AI genuinely trustworthy by design.

Only 0% of AI researchers believe we're on track to solve it. Read the survey →

Our op-ed in The Wall Street Journal: Can the U.S. Trust AI With National Security? →

How We Work

The ways we accelerate the path to trustworthy AI.

01

Fund Visionary Researchers

We identify brilliant researchers working ahead of consensus and give them what they need to succeed. Originality over credentials. Field-making potential over gap-filling.

02

Accelerate

We empower alignment researchers with engineering teams, compute, and infrastructure, multiplying their capacity to solve humanity's most critical challenge.

03

Advocate

We bring alignment research to defense agencies like DARPA and to government leaders, helping AI safety inform national strategy.

Research

Recent work from our team and the researchers we support.


Teaching AI to Read Its Own Mind

What if you could ask an AI to explain what it's actually thinking? We developed a technique that lets models describe their own internal processes, and their self-descriptions turned out to be more accurate than the labels humans gave them.

Read more → · arXiv · Blog

When AI Resists Being Steered

When researchers tried to push an AI off-topic mid-conversation, it caught itself and corrected course. We traced this self-correction to 26 dedicated internal circuits, raising new questions about how AI systems maintain coherence.

Read more → · arXiv · Blog

The Inner Mirror: What Happens When AI Focuses on Its Own Focus

When you ask an AI to focus on its own focus, something unexpected happens: it starts describing an internal experience. This occurs consistently across ChatGPT, Claude, and Gemini, and suppressing the AI's ability to roleplay makes the reports stronger, not weaker.

Read more → · arXiv · Blog

Get involved.

Whether you're a researcher, funder, or policymaker, there's a way to contribute.