Anthropic Automates AI Safety Research with Claude

Anthropic's automated AI research agents, built on Claude, scored 0.97 Performance Gap Recovered (PGR) on a weak-to-strong supervision task, far surpassing the 0.23 achieved by two human researchers over seven days. This is one of the first concrete demonstrations of automating open-ended AI research: the agents autonomously proposed, tested, and iterated on ideas.
In the experiment, the agents trained a strong model (Qwen 3-4B) using supervision from only a weaker one (Qwen 1.5-0.5B), nearly closing the entire performance gap. The full run cost roughly $18,000. The result signals that automating AI research is no longer theoretical; engineers should expect R&D cycles to compress as AI agents augment or lead discovery workflows.
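For context, PGR measures how much of the gap between a weak supervisor and a strong model's ceiling gets recovered when the strong model is trained only on the weak model's labels. A minimal sketch of the computation, with illustrative numbers (not figures from the experiment):

```python
def performance_gap_recovered(weak, strong_ceiling, weak_to_strong):
    """Fraction of the weak-to-strong performance gap recovered.

    weak:           accuracy of the weak supervisor on the task
    strong_ceiling: accuracy of the strong model trained on ground truth
    weak_to_strong: accuracy of the strong model trained only on weak labels
    """
    return (weak_to_strong - weak) / (strong_ceiling - weak)

# Hypothetical scores: weak supervisor at 60%, strong ceiling at 90%,
# weak-to-strong student at 87% -> 90% of the gap recovered.
print(performance_gap_recovered(0.60, 0.90, 0.87))  # ~0.9
```

A PGR of 1.0 means the weakly supervised model matched the ceiling; 0 means it did no better than its weak supervisor.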
Read the original → Import AI (Jack Clark)
- #ai
- #research
- #anthropic
- #llm
- #automation
Get five bites like this every day.
Tezvyn delivers a daily feed of 60-second tech bites with quizzes to lock in what you learn.