Every Benchmark Launched 2023-2024 Has Fallen: The METR / SWE-Bench / CORE-Bench / MLE-Bench / PostTrainBench Sequence (ThorstenMeyerAI.com)
TL;DR
Six major AI research benchmarks introduced in 2023-2024 have all saturated or are on track to do so within months, according to an analysis by Thorsten Meyer. He argues that the pattern shows AI capability advancing faster than previously expected, with implications for research, deployment, and policy.
Meyer reports that every major benchmark launched in 2023-2024 to measure AI research and engineering capability has either been declared saturated or is approaching saturation on a timeline of months rather than years. The six benchmarks are SWE-Bench, METR Time Horizons, CORE-Bench, MLE-Bench, PostTrainBench, and CPU Speedup. Each one evaluates a different facet of AI progress.
SWE-Bench measures performance on real-world software engineering tasks. Its top score rose from 2% in late 2023 to 93.9%, reaching saturation within 30 months. METR Time Horizons measures the length of tasks AI systems can reliably complete. It grew from 30 seconds to 12 hours over four years, a growth rate that points to near-complete saturation by 2026. CORE-Bench tests research reproduction. Its authors declared it solved in December 2025 after scores reached 95.5%, just 15 months after starting at 21.5%. Meyer reports the same pattern across all six benchmarks: rapid convergence toward capabilities that match or exceed human performance on the specific tasks each benchmark measures.
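The growth rates quoted above can be made concrete with simple arithmetic. The sketch below uses only the start values, end values, and time spans stated in the preceding paragraph. It assumes smooth exponential growth for the METR time horizon and linear progress for the two percentage scores. These are simplifying assumptions, not the methodology used by METR, the benchmark authors, or Meyer. The helper names `doubling_time_months` and `points_per_month` are hypothetical.

```python
import math

def doubling_time_months(start: float, end: float, months: float) -> float:
    """Months per doubling, ASSUMING smooth exponential growth from start to end."""
    doublings = math.log2(end / start)  # how many times the value doubled
    return months / doublings

def points_per_month(start_pct: float, end_pct: float, months: float) -> float:
    """Average percentage-point gain per month, ASSUMING linear progress."""
    return (end_pct - start_pct) / months

# METR Time Horizons: 30 seconds -> 12 hours (in seconds) over four years (48 months).
metr = doubling_time_months(30, 12 * 60 * 60, 48)

# SWE-Bench: 2% -> 93.9% over 30 months.
swe = points_per_month(2.0, 93.9, 30)

# CORE-Bench: 21.5% -> 95.5% over 15 months.
core = points_per_month(21.5, 95.5, 15)

print(f"METR horizon doubling time: {metr:.1f} months")  # prints 4.6
print(f"SWE-Bench gain: {swe:.1f} points/month")          # prints 3.1
print(f"CORE-Bench gain: {core:.1f} points/month")        # prints 4.9
```

Under these assumptions, the METR figures imply a doubling roughly every 4.6 months. SWE-Bench and CORE-Bench gained on average about 3.1 and 4.9 percentage points per month. Real benchmark curves are rarely this smooth, so treat these numbers as rough averages rather than forecasts.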
Implications of Rapid Benchmark Saturation for AI Development
The saturation of all major benchmarks within a short timeframe suggests that AI systems are reaching a level of capability that could significantly affect research, industry applications, and regulation. This pace challenges earlier assumptions about AI development timelines and raises questions about deployment, safety, and governance. Stakeholders should revisit current strategies and prepare for AI systems that become highly capable across multiple dimensions within months.
Background on AI Benchmark Progress and Recent Developments
Throughout 2023 and 2024, AI research organizations launched several benchmarks to measure progress in AI engineering and research capabilities. These benchmarks were explicitly designed to be challenging. They tested AI systems on tasks such as software engineering, research reproduction, and compute efficiency. Initial results showed rapid improvements, but it was unclear whether those gains would be sustained or would plateau.
Recent data compiled by Thorsten Meyer indicates that all six benchmarks have now saturated or are close to it, with gains arriving over months rather than years. In his reading, AI capability is growing along a steeper trajectory than previously anticipated, which could bring near-term advances in AI performance and autonomy.
“Every benchmark launched in 2023-2024 to measure AI R&D capability has either saturated or is tracking toward saturation on a timeline of months, not years.”
— Thorsten Meyer
Remaining Questions on Long-Term AI Progress and Impact
While the data shows rapid saturation of current benchmarks, it remains unclear whether this trend will continue as AI systems evolve beyond these specific tasks. It is also uncertain how these capabilities translate into real-world deployment, safety, and regulatory challenges. The potential for new benchmarks to emerge or for current ones to become obsolete is still under discussion among researchers.
Next Steps for Monitoring AI Capability Trajectories
Researchers and industry stakeholders should closely monitor upcoming benchmark results and real-world AI deployments to assess whether the saturation trend persists. Further analysis is needed to understand how these capabilities scale into broader AI systems and what implications this has for safety, governance, and workforce adaptation. Policy discussions are likely to intensify as AI systems approach or surpass human-level performance across multiple domains.
Key Questions
What does the saturation of these benchmarks mean for AI safety?
Saturation indicates that AI systems are reaching high levels of performance in specific tasks, which could accelerate deployment but also raises safety and control concerns. Ongoing research is needed to ensure these capabilities are aligned with safety standards.
Are these benchmarks representative of real-world AI capabilities?
While these benchmarks measure specific skills, they may not fully capture the complexity of real-world AI deployment. However, their rapid saturation suggests a trend toward more capable AI systems overall.
Will new benchmarks emerge as AI progresses?
It is possible. As current benchmarks saturate, researchers may develop new, more challenging metrics to push AI capabilities further, but the current pattern indicates rapid progress across existing measures.
How soon could AI systems surpass human performance in broader tasks?
Given the rapid saturation of current benchmarks, some experts suggest that AI could reach or exceed human-level performance in various domains within the next few years, though this depends on many factors including safety and deployment considerations.
Source: ThorstenMeyerAI.com