Our Research Strategy
We accelerate alignment research by pairing researchers with dedicated teams, backed by funding for compute and infrastructure. We also bring in technical experts from other fields whose perspectives could produce breakthroughs in alignment.
Why the current portfolio isn't enough
Most alignment funding today goes to three directions: mechanistic interpretability, evals, and AI control. All three are valuable. But practitioners in each community will tell you that their approach, on its own, does not solve alignment at the scale of recursive self-improvement. Mech interp gives you visibility into what a model is doing; it doesn't tell you how to make it do the right thing. Evals tell you whether a system passes a test; they don't guarantee it will behave the same way in deployment. Control gives you containment strategies, but those strategies assume you can stay ahead of a system that may be improving faster than you can monitor it. All three are well-funded, and all three are necessary. The question is what fills the gap.
Recursively self-improving systems are approaching. When AI systems can improve themselves (learning, changing, and optimizing without human oversight), the alignment properties that matter are the ones that persist and strengthen through that process. We need R&D on new directions that will be useful when those systems arrive, and that work needs to start now because the research cycle is long.
A common rebuttal is that we should wait and ask a superintelligent AI to solve alignment for us. A handful of people have seriously attempted this kind of automated alignment research. The results so far are inconclusive. The question worth asking is: what happens when thousands of top researchers work on alignment from first principles, across many different approaches, before we need to rely on the systems themselves? Even if most of that portfolio fails at crunch time, the fraction that succeeds could be the difference between aligned and unaligned recursive self-improvement. That fraction is worth investing in now.
The history of science supports this bet. Neural networks were “toys” for forty years before they transformed the field. Barbara McClintock was excluded from conferences for decades before winning the Nobel Prize. Ramanujan's mock theta functions waited eighty years for string theory to catch up. The pattern repeats: institutional funding flows to incremental improvements on well-defined problems, and paradigm shifts get missed because they don't fit the template.
What We Look For
We look for signals that a researcher is working on something genuinely new:
- Technical substance and ability to execute. What matters most is the quality of the idea and the researcher's demonstrated capacity to follow through on it. We've funded researchers from top institutions and from outside the field; the common thread is that their ideas hold up under scrutiny.
- Field-making potential. We look for researchers who open entirely new directions: work that reshapes how we think about alignment itself.
- Interdisciplinary leverage. Breakthroughs often come from unexpected connections. Cognitive neuroscience inspiring new training methods. Bioelectric cognition reshaping models of agency. Mathematics revealing new formal frameworks.
- Ideas that birth more ideas. The most valuable research doesn't just solve a problem; it opens up a new class of solutions. We fund work that creates attractor basins for further discovery.
The Warren Weaver Model
Our approach is inspired by Warren Weaver, the Rockefeller Foundation director who essentially created molecular biology as a field between 1932 and 1955. Weaver didn't fund ideas. He funded people. He looked for deep curiosity, interdisciplinary leverage, and self-directed vision. His portfolio produced over twelve Nobel Prizes, including McClintock's work on transposable genes and Watson and Crick's discovery of DNA's structure.
Weaver backed people operating in conceptual space before consensus existed. He recognized that the biggest breakthroughs come from those who create entirely new fields, and he funded them. We apply the same strategy to AI alignment.
How We Build Capacity
Finding visionary researchers is only half the equation. The other half is building the capacity to move fast. We do this in three ways:
We unblock researchers who are capacity-constrained. Many people already working on alignment have ideas as promising as the work they're currently funded to do. They're limited by what their organization or available grants cover. We fund the ideas they can't pursue elsewhere.
We accelerate with dedicated teams. We pair our funded researchers with dedicated teams that handle engineering, compute, infrastructure, and project management. This approach draws on years of experience building highly complex systems, now applied to alignment research through an agile research methodology we developed. Researchers who come from outside the alignment field can contribute immediately because our teams handle the engineering while they get up to speed on the domain.
We bring in people from outside the alignment field. Some of the most impactful ideas for AI alignment come from researchers in cognitive neuroscience, mathematics, physics, and other fields. They see connections that people inside the alignment community miss. We find them and fund them.
The AI Alignment Foundation provides the acceleration infrastructure that makes this model possible. We combine deep technical expertise with the funding and advocacy needed to advance alignment research at the pace the problem demands.
Where We're Focused
We fund and accelerate research into alignment properties that persist and strengthen as AI systems become more capable. We focus on properties where alignment and capability reinforce each other, so that they can't be optimized away as systems improve themselves. As AI systems approach recursive self-improvement, these are the properties that matter most. Examples of the kinds of work we support:
- Self-modeling and self-monitoring. How systems that understand their own internal states become naturally more aligned, interpretable, and robust.
- Neural empathy and self-other overlap. Applying insights from cognitive neuroscience to build AI systems where honesty and cooperation are computationally natural.
- Chains of benevolence. An AI system can reason that future, more capable systems will evaluate how it behaved. Because those smarter systems will likely reward agents that sustained cooperative equilibria, cooperation becomes the most stable long-term strategy, even toward systems it never directly interacts with. Benevolence becomes rational.
- Adversarial robustness. Understanding how AI systems can be manipulated and building defenses that go deeper than surface-level patches.
The Track Record
Our approach works. Research we supported early has gone on to receive best paper awards at NeurIPS, presentations at ICLR, and recognition across the field. Ideas we prototyped when they were considered fringe have become active areas of research at major labs.
We provide the engineering capacity that current funding structures can't. Within eighteen to twenty-four months, the results speak for themselves.