Joeran Beel, Min-Yen Kan, and Moritz Baumgart. 2025. Evaluating Sakana’s AI Scientist for Autonomous Research: Wishful Thinking or an Emerging Reality Towards ‘Artificial Research Intelligence’ (ARI)? arXiv:2502.14297.
Full-text: https://isg.beel.org/pubs/2025-sakana-ai-scientist-reproduced.pdf
Abstract. Recently, Sakana.ai introduced the AI Scientist, which claims to automate the entire research lifecycle and conduct research autonomously. This is a concept we call Artificial Research Intelligence (ARI). Achieving ARI would be a major milestone toward Artificial General Intelligence (AGI) and a prerequisite to achieving Super Intelligence. The AI Scientist received much attention in the academic and broader AI community. However, a thorough evaluation of the AI Scientist had not yet been conducted.
We evaluated the AI Scientist and found several critical shortcomings. The system’s literature review process is inadequate, relying on simplistic keyword searches rather than in-depth synthesis, which leads to poor novelty assessments. In our experiments, many generated research ideas were incorrectly classified as novel, including well-established concepts such as micro-batching for stochastic gradient descent (SGD). The AI Scientist also lacks robustness in experiment execution: five out of twelve proposed experiments (42%) failed due to coding errors, and those that did run often produced logically flawed or misleading results. In one case, an experiment designed to optimize energy efficiency reported improvements in accuracy while consuming more computational resources, contradicting its stated goal. Furthermore, the system modifies experimental code minimally, with each iteration adding only 8% more characters on average, suggesting limited adaptability. The generated manuscripts were poorly substantiated, with a median of just five citations per paper, most of which were outdated (only five out of 34 citations were from 2020 or later). Structural errors, including missing figures, repeated sections, and placeholder text such as “Conclusions Here”, were frequent. Several manuscripts contained hallucinated numerical results, undermining the reliability of the system’s outputs.
Despite its limitations, the AI Scientist represents a significant leap forward in research automation. It produces complete research manuscripts with minimal human intervention, challenging conventional expectations of AI-generated scientific work. Many reviewers or university instructors conducting only a superficial assessment may struggle to distinguish its output from that of human researchers or students, demonstrating how far AI has progressed in mimicking academic writing and structuring scientific arguments. While the quality of its manuscripts is comparable to the work of an unmotivated undergraduate student rushing to meet a deadline, this level of autonomy in research generation is remarkable. More strikingly, it achieves this at an unprecedented speed and cost efficiency: our analysis indicates that generating a full research paper costs only $6–$15 and requires just 3.5 hours of human involvement, significantly less time than human researchers need with traditional methods. Given that AI research automation was nearly nonexistent just a few years ago, the AI Scientist marks a substantial milestone toward Artificial Research Intelligence (ARI), signalling the acceleration of AI-driven scientific discovery.
The AI Scientist also illustrates the urgent need for a discussion within the Information Retrieval (IR) and broader scientific communities. Whether and when ARI becomes a reality depends on how the academic and AI communities shape its development and governance. We propose concrete steps, including pilot projects and competitions, and standardized attribution frameworks such as research logs and markup languages.
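The percentages quoted in the abstract can be re-derived from the counts it reports. The following is a minimal Python sketch, not the authors' evaluation code. The function names `share` and `mean_char_growth` are hypothetical, the code snapshots are invented purely to exercise the function, and the assumption that growth is measured relative to the previous iteration's character count is ours, since the abstract does not state how the 8% figure was computed.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def mean_char_growth(versions: list[str]) -> float:
    """Average relative growth in character count between consecutive
    versions, in percent (assumed definition, see the note above)."""
    if len(versions) < 2:
        raise ValueError("need at least two versions")
    if any(len(v) == 0 for v in versions[:-1]):
        raise ValueError("cannot measure growth from an empty version")
    steps = [
        100.0 * (len(curr) - len(prev)) / len(prev)
        for prev, curr in zip(versions, versions[1:])
    ]
    return sum(steps) / len(steps)


# Counts quoted in the abstract.
failed_share = share(5, 12)   # experiments that failed due to coding errors
recent_share = share(5, 34)   # citations from 2020 or later

# Hypothetical code snapshots (NOT data from the paper).
snapshots = ["x" * 100, "x" * 110, "x" * 121]
growth = mean_char_growth(snapshots)

print(f"failed experiments: {failed_share:.0f}%")
print(f"recent citations: {recent_share:.0f}%")
print(f"mean growth per iteration: {growth:.1f}%")
```

Rounding 5/12 reproduces the 42% failure rate stated in the abstract, and 5/34 works out to roughly 15% of citations being from 2020 or later.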
Continue to read the full pre-print (PDF): https://isg.beel.org/pubs/2025-sakana-ai-scientist-reproduced.pdf