Following my recent visit to Prof. Min-Yen Kan at the National University of Singapore, Moritz Baumgart, Min-Yen Kan, and I co-authored a paper critically evaluating Sakana’s AI Scientist (to appear in SIGIR Forum). Our review attracted the attention of Nature, which was preparing an editorial feature on the opportunities and risks of AI-driven research. As part of this, we had a one-hour conversation with Nature journalist Ananya, and I am pleased that some of our reflections were included in the final article.
In our recent evaluation of Sakana’s AI Scientist, Min-Yen Kan, Moritz Baumgart, and I found a system that is both flawed and remarkable. On the one hand, its novelty detection relies on superficial keyword searches and often misclassifies well-established techniques, such as micro-batching, as original contributions. On the other hand, the AI Scientist can generate complete manuscripts at unprecedented speed and cost: around $6–15 per paper, with only a few hours of human oversight. As we argued in Nature, this is more than hype; the system represents a milestone toward what we termed Artificial Research Intelligence (ARI).
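To make the first point concrete, here is a deliberately naive novelty check of the kind we criticise. This is my own toy sketch, not Sakana’s actual implementation; the corpus titles and query strings are hypothetical. It declares an idea "novel" whenever the exact query words do not all appear in some prior title, which is precisely how a rediscovery of micro-batching can slip through.

```python
# Toy illustration of keyword-based novelty checking (not Sakana's code).
# All titles and queries below are hypothetical examples.

PRIOR_WORK = [
    "Pipeline-parallel training with micro-batch scheduling",  # hypothetical title
    "Reducing memory usage via gradient checkpointing",        # hypothetical title
]

def keyword_novelty_check(proposed_idea: str, corpus: list[str]) -> bool:
    """Return True ('novel') when no prior title contains all query words.

    This is the failure mode: splitting a mini-batch into smaller chunks
    is an old idea, but if the query's literal words never co-occur in a
    title, the check reports the idea as original.
    """
    words = proposed_idea.lower().split()
    return not any(all(w in title.lower() for w in words) for title in corpus)

# Same idea, different wording: the lexical check finds no match and
# wrongly flags it as novel.
print(keyword_novelty_check("splitting batches into smaller chunks", PRIOR_WORK))  # True
# Only the exact vocabulary is caught.
print(keyword_novelty_check("micro-batch scheduling", PRIOR_WORK))                 # False
```

The sketch overstates nothing: any check that reduces "is this new?" to string matching inherits exactly this blind spot, regardless of how large the corpus behind it is.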
The Nature feature highlighted exactly this duality: the promise of scaling up discovery, and the risks of eroding intellectual credit. Some researchers, such as Byeongjun Park, described cases where AI-generated manuscripts closely resembled their own methods, without being identical enough to count as plagiarism. Others, including Tarun Gupta and Danish Pruthi, presented systematic analyses showing that up to a third of tested AI-generated papers recycled prior ideas without attribution. Their work even won an award at ACL, underscoring the seriousness with which the community views this issue. At the same time, the team behind Sakana’s AI Scientist pushed back, arguing that many of the uncited overlaps amounted to omissions common among human authors.
This is not only an academic question. Automated novelty checks, which rely on keyword searches and citation counts, miss the tacit knowledge and nuanced judgment that human experts bring. As I argued in Nature, reducing the concept of originality to algorithmic search is simplistic. Search engines may overlook relevant work, and semantic similarity tools remain far from adequate for assessing overlap at the level of ideas. These limitations are reminiscent of my earlier research on Google Scholar’s ranking algorithm, where we showed how strongly keyword placement and citation counts shaped visibility, often in ways that had little to do with true scholarly merit. The parallels underline how fragile algorithmic assessments of novelty and relevance can be, whether in search engines or in AI-driven science.
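The point about similarity tools is easy to demonstrate. The sketch below is my own minimal illustration, not a tool from our paper or from Nature, and it assumes two hypothetical abstracts that describe the same memory-saving technique in different vocabulary; a standard lexical similarity score barely registers the overlap.

```python
# Minimal sketch: surface-level similarity misses idea-level overlap.
# Both abstracts are hypothetical and describe the same technique.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

abstract_a = ("We split each mini-batch into smaller chunks and pipeline "
              "them through the model to reduce peak memory usage.")
abstract_b = ("Our method partitions training examples into micro-batches, "
              "overlapping their execution to lower the GPU memory footprint.")

# TF-IDF compares word overlap, not meaning: paraphrases score low.
vectors = TfidfVectorizer().fit_transform([abstract_a, abstract_b])
score = cosine_similarity(vectors[0], vectors[1])[0, 0]
print(f"lexical similarity: {score:.2f}")  # low, despite the shared idea
```

A human reviewer sees at once that both abstracts describe micro-batching; the score sees mostly disjoint vocabulary. Embedding-based models narrow this gap but, as we argue, still fall well short of judging whether two papers share an idea.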
The article also gave space to broader voices. Debora Weber-Wulff, a plagiarism researcher in Berlin, emphasized that while textual plagiarism can often be detected, “idea plagiarism” is almost impossible to prove — and AI will only worsen this. Other experts pointed out that even without AI, computer science already faces a flood of publications, making it difficult to assess novelty at scale. With AI now generating research en masse, the risk of diluted originality becomes systemic. These observations closely align with the concerns that Min-Yen Kan and I raised in our joint paper: that keyword-driven retrieval and superficial synthesis are inadequate foundations for assessing novelty.
The parallels to my own research on Academic Search Engine Optimisation (ASEO) are striking. Fifteen years ago, with Bela Gipp and Erik Wilde, I showed how Google Scholar could be gamed with invisible text, citation manipulation, and even autogenerated nonsense papers. Back then, the worry was that ASEO might slip into academic search engine spam. Today, Sakana’s AI Scientist shows us the next step: not authors manipulating search engines, but AI systems remixing existing knowledge and presenting it as new. The scale is larger and the technology more advanced, but the underlying concern remains constant: algorithmic shortcuts can distort academic credit.
Together with Min-Yen Kan and Moritz Baumgart, I concluded that the AI Scientist should not be dismissed as a gimmick. Its current quality may resemble that of an undergraduate student rushing an assignment, but its autonomy and efficiency mark genuine progress. The real question is not whether AI-generated research should exist — it will. The challenge is how to use such tools responsibly. Should we treat AI as a co-author, an assistant, or merely an inspiration? How do we ensure transparency about what is borrowed and what is genuinely new? And how do we adapt peer review, plagiarism detection, and academic norms to a world where machines write science at scale?
Looking forward, Min, Moritz, and I believe that our task is not to resist AI-driven science but to build frameworks that preserve integrity while enabling progress. Structured research logs, attribution markup, and benchmarking competitions are one path forward. If we succeed, Sakana’s AI Scientist and its successors will not only accelerate discovery but do so in a way that maintains the very values that make science credible.
As the Nature piece made clear, the community is divided: some fear plagiarism, some see only incremental overlaps, and some are optimistic about future breakthroughs. But in all cases, the debate confirms what our own research concluded: AI-driven research is here, it is powerful, and it must be guided by robust academic standards. Sakana’s AI Scientist may be imperfect, but it is undeniably a very promising step in the right direction.
