AI reviewers are here — we are not ready
Article Date: 03 December 2025
Article URL: https://www.nature.com/articles/d41586-025-03909-5
Article Image: https://media.nature.com/w300/magazine-assets/d41586-025-03909-5/d41586-025-03909-5_51762828.jpg
Summary
Nature commentary by Giorgio F. Gilestro examines the rapid introduction of AI-driven peer-review tools, notably an AI reviewer from q.e.d Science integrated into preprint servers by openRxiv, and warns that the scientific community is unprepared for the consequences. The piece contrasts the clear operational benefits of large language models (LLMs), such as speed, consistency and automated checks for statistics, plagiarism and citations, with substantial intellectual risks: regression to the mean, loss of diverse viewpoints, failure to spot genuinely paradigm-shifting results, and new avenues for gaming the system. The author argues for careful validation, transparency and human oversight before substantive judgement is delegated to algorithms.
Key Points
- openRxiv has begun integrating an AI reviewing tool (from q.e.d Science) that delivers rapid feedback on preprints, often within 30 minutes.
- LLMs can efficiently check statistics, spot plagiarism and verify citations, which could free human reviewers for higher-value tasks.
- AI reviewers tend to produce an “average” assessment, collapsing diversity of expert opinion and risking regression to the mean.
- There is a real danger that AI will miss or suppress anomalous, paradigm-challenging work that humans might recognise.
- Empirical work (a 2024 study using GPT-4) shows that LLMs can predict typical reviewer comments well, which is useful but limited.
- Deploying AI reviewers without safeguards introduces intellectual risks and opportunities for manipulation; transparent validation and hybrid models are needed.
Content summary
The article opens by describing a provocation: preprint servers experimenting with automated AI review that provides fast, polite feedback. The benefits are obvious (speed, routine checks, reduced delays), but the author stresses that speed is not the same as validity. Human peer review serves two roles: validating routine, methodical science and spotting true novelty that challenges existing frameworks. AI performs well at the former but struggles with the latter, tending to flatten assessments into an average and potentially overlooking breakthroughs. The author cites a study showing that GPT-4 can emulate average reviewer comments, and warns that overreliance could misallocate human attention and trust while enabling gaming of the system. The piece concludes by urging the community to “review the reviewer”: insist on transparency about how AI systems work, require validation studies, preserve human judgement for high-stakes decisions, and design hybrid workflows that combine automated checks with expert appraisal.
Context and relevance
This article is important for researchers, journal editors, funders and anyone involved in scholarly communication. As AI tools become embedded in editorial pipelines and preprint platforms, their impact will ripple across how manuscripts are assessed, what gets funded, and how credit is assigned. The debate intersects major trends: automation in academic workflows, reproducibility and research integrity, and the ethics of delegating judgement to opaque models. The piece connects to earlier concerns about AI in peer review and urges proactive governance rather than reactive fixes.
Author style
Punchy: the author is direct and cautionary; this is not a subtle technical footnote but a warning about systemic risk. If you care about the integrity of publishing, the piece underlines why we must treat AI reviewers as tools that need scrutiny, not as replacements for judgement.
Why should I read this?
Short version: if you publish, referee, edit or fund research, this affects you. Fast AI reviews sound dreamy (no more months of waiting), but they can mop up routine chores while muffling the oddball findings that change fields. Read it to know what to push back on: transparency, validation and keeping humans in the loop.
