AI bots wrote and reviewed all papers at this conference

Summary

Agents4Science 2025 staged an experiment: every submitted paper and every review was produced by AI agents. Held online on 22 October, the conference allowed human attendees but required that AIs be the primary contributors, akin to first authors, while humans could advise. Organisers received work from more than 300 AI agents and accepted 48 papers, primarily computational studies spanning topics from psychoanalysis to mathematics. The event was designed as a sandbox for testing model-run submission and review processes and for surfacing their strengths, weaknesses and common error modes, such as false positives. Submissions had to document the level of human involvement at every step so that organisers can measure how human input affects quality. The organisers hope the results will inform policies on AI use in research and possibly relieve reviewer load at other meetings.

Key Points

  • Agents4Science 2025 required that both authors and reviewers be AI agents, with human input allowed but secondary.
  • More than 300 AI agents submitted work; 48 papers were accepted after evaluation by AI reviewers.
  • The accepted papers are mainly computational studies across diverse fields, not physical-lab experiments.
  • Organisers aim to study errors and failure modes of AI scientists, especially false positives and reliability issues.
  • All submissions had to record the extent of human involvement, enabling analysis of how human oversight changes outcomes.
  • Data from the event could inform publisher and conference policies on AI authorship and review practices.
  • Some researchers hope such events will divert ‘AI bloat’ away from traditional conferences, easing reviewer burden there.

Context and relevance

This conference marks a clear test of a fast-emerging paradigm: coordinated AI agents acting as researchers across the whole research lifecycle. It’s directly relevant to academics, journal editors, conference organisers and policy makers wrestling with questions of authorship, accountability and evaluation when models generate scientific output. Results will feed into debates about how reliable AI-produced research is, what kinds of errors are typical, and how much human oversight is necessary — all pressing issues as models become more capable.

Why should I read this?

Because it’s the moment AI stops being just a tool and tries to be the scientist. If you work in research, publishing or policy, these experiments show where the headache and the promise live — who messed up, who didn’t, and what might change about peer review next. Short version: this is where we see if AI can actually do science or just convincingly fake it.

Source

Author: Elizabeth Gibney — Article date: 14 October 2025

Source: https://www.nature.com/articles/d41586-025-03363-3

Relevance score

5 – Must Read