Anthropic Automates AI Safety Research with Claude

Anthropic's automated AI research agents, built on Claude, scored 0.97 Performance Gap Recovered (PGR) on a weak-to-strong supervision task, far surpassing the 0.23 achieved by two human researchers over seven days. This is one of the first concrete demonstrations of automating open-ended AI research: the agents autonomously proposed, tested, and iterated on ideas.
In the experiment, the agents trained a strong model (Qwen 3-4B) using supervision from only a weaker one (Qwen 1.5-0.5B), nearly closing the entire performance gap. The full run cost roughly $18,000. The result signals that automating AI research is no longer theoretical; engineers should expect R&D cycles to compress as AI agents augment or lead discovery workflows.
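For context, PGR measures how much of the gap between a weak supervisor and a strong model's ceiling gets recovered when the strong model is trained only on the weak model's labels. A minimal sketch of the computation, with illustrative numbers (not figures from the experiment):

```python
def performance_gap_recovered(weak, strong_ceiling, weak_to_strong):
    """Fraction of the weak-to-strong performance gap recovered.

    weak:           accuracy of the weak supervisor on the task
    strong_ceiling: accuracy of the strong model trained on ground truth
    weak_to_strong: accuracy of the strong model trained only on weak labels
    """
    return (weak_to_strong - weak) / (strong_ceiling - weak)

# Hypothetical scores: weak supervisor at 60%, strong ceiling at 90%,
# weak-to-strong student at 87% -> 90% of the gap recovered.
print(performance_gap_recovered(0.60, 0.90, 0.87))  # ~0.9
```

A PGR of 1.0 means the weakly supervised model matched the ceiling; 0 means it did no better than its weak supervisor.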
Read the original → Import AI (Jack Clark)
- #ai
- #research
- #anthropic
- #llm
- #automation
Get five bites like this every day.
Tezvyn delivers a daily feed of 60-second tech bites with quizzes to lock in what you learn.