A new “AI scientist” can write scientific papers without human intervention. That’s why this is a problem

Scientific discovery is one of the most demanding human activities. First, scientists must understand existing knowledge and identify a significant gap in it. Next, they must formulate a research question and design and conduct an experiment to answer it. Then, they must analyze and interpret the results, which may in turn raise a new research question.

Can such a complex process be automated? Last week, Sakana AI Labs announced the creation of an “AI scientist” – an artificial intelligence system that they say can make scientific discoveries in the field of machine learning fully automatically.

Using generative large language models (LLMs) like those that underpin ChatGPT and other AI chatbots, the system can brainstorm, select a promising idea, program new algorithms, present results, and write a paper summarizing the experiment and its findings, complete with references. Sakana claims the AI tool can run the entire lifecycle of a scientific experiment at a cost of just $15 per paper – less than the cost of a scientist’s lunch.

These are big claims. Do they hold up? And even if they do, would an army of AI scientists churning out research papers at superhuman speed really be good news for science?

How a computer can “do science”

Much of science is conducted in the public domain, and almost all scientific knowledge has been written down somewhere (otherwise we would have no way of “knowing” it). Millions of scientific papers are available for free online in repositories such as arXiv and PubMed.

LLMs trained on this data absorb the language of science and its patterns, so it is perhaps not surprising that a generative LLM can produce something that looks like a good scientific paper – it has ingested plenty of examples to imitate.

What is less clear is whether an AI system is capable of interesting scientific work. The crucial point is that good science requires novelty.

But is it interesting?

Scientists do not want to learn about things that are already known. Rather, they want to learn new things, especially new things that are significantly different from what is already known. This requires an assessment of the scope and value of a contribution.

The Sakana system attempts to determine interestingness in two ways. First, it “scores” new paper ideas based on similarities to existing research papers (which are indexed in the Semantic Scholar repository). Anything that is too similar is discarded.
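
As a rough illustration of how such a filter might work (this is not Sakana’s actual code), the sketch below compares a candidate idea against a small set of existing abstracts using off-the-shelf sentence embeddings; the embedding model, the similarity threshold, and the toy corpus are all assumptions made for the example.

    # Illustrative sketch only: reject paper ideas that are too similar to
    # existing work. The model choice, the 0.8 threshold and the tiny corpus
    # are assumptions for this example, not Sakana's implementation.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    existing_abstracts = [
        "We propose a transformer architecture for low-resource translation.",
        "A study of dropout schedules for convolutional image classifiers.",
    ]

    def is_novel(idea: str, threshold: float = 0.8) -> bool:
        """Return True if the idea is sufficiently dissimilar to known work."""
        idea_emb = model.encode(idea, convert_to_tensor=True)
        corpus_emb = model.encode(existing_abstracts, convert_to_tensor=True)
        max_similarity = util.cos_sim(idea_emb, corpus_emb).max().item()
        return max_similarity < threshold

    print(is_novel("Adaptive learning-rate warm-up guided by gradient noise"))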

Second, Sakana’s system introduces a “peer review” step – another LLM assesses the quality and novelty of the paper produced. Again, there are many examples of peer review online at sites such as openreview.net that can serve as a guide for critiquing a paper, and LLMs have internalized these as well.
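
Purely as an illustration (the prompt wording, model name, and scoring instructions below are assumptions for the example, not Sakana’s actual reviewer), an automated “peer review” step can be as simple as asking an LLM to critique and score a draft:

    # Illustrative sketch only: ask an LLM to act as a reviewer and return a
    # rough quality/novelty assessment. The prompt wording, model name and
    # scoring scale are assumptions, not Sakana's actual review pipeline.
    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    def review_paper(draft_text: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": ("You are a strict machine-learning conference reviewer. "
                             "Rate the paper's novelty and soundness from 1 to 10 "
                             "and briefly justify each score.")},
                {"role": "user", "content": draft_text},
            ],
        )
        return response.choices[0].message.content

    print(review_paper("Title: ...\nAbstract: ...\nMethod: ...\nResults: ..."))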

AI may be a poor judge of AI output

The response to Sakana AI’s results has been mixed, with some describing the work as “endless scientific nonsense.”

Even the system’s own reviews of its papers judge the results weak at best. This will likely improve as the technology matures, but the question of whether automated scientific work is valuable remains.

The ability of LLMs to assess the quality of research is also an open question. My own work (soon to be published in Research Synthesis Methods) shows that LLMs are not particularly good at assessing the risk of bias in medical research studies, although this too may improve over time.

Sakana’s system automates discoveries in computational research, which is much easier than in other branches of science that require physical experiments. Its experiments run entirely in code, which is just another form of structured text that LLMs can learn to generate.

AI tools should support scientists, not replace them

AI researchers have been developing systems to support science for decades. Given the enormous amount of published research, it can be difficult to even find publications relevant to a specific scientific question.

Specialized search tools use AI to help researchers find and summarize existing work. These include Semantic Scholar, mentioned above, as well as newer systems such as Elicit, Research Rabbit, scite, and Consensus.

Text mining tools like PubTator drill deeper into documents to identify key areas of focus, such as specific genetic mutations and diseases and their proven associations. This is particularly useful for curating and organizing scientific information.

Machine learning is also used to support the synthesis and analysis of medical evidence, in tools such as Robot Reviewer. Summaries from Scholarcy that compare and contrast the claims made in papers help with literature reviews.

All of these tools aim to help scientists do their work more efficiently, not to replace them.

AI research could exacerbate existing problems

Sakana AI says the role of human scientists will not diminish, but the company’s vision of a “fully AI-driven scientific ecosystem” would have profound implications for science.

One concern is that if AI-generated papers flood the scientific literature, future AI systems will be trained on AI output and may suffer model collapse, becoming increasingly ineffective at innovating.

However, the consequences for science go far beyond the impact on AI science systems themselves.

There are already bad actors in science, including “paper mills” that churn out fake papers. This problem will only get worse when a scientific paper can be produced for $15 and a vague starting prompt.

The need to check a mountain of automatically generated research for errors could quickly overwhelm the capacity of real scientists. The peer review system is arguably already broken, and feeding more research of dubious quality into the system will not fix it.

Science is fundamentally based on trust. Scientists value the integrity of the scientific process so that we can be confident our understanding of the world (and now, of the machines in that world) is valid and improving.

A scientific ecosystem in which AI systems play a central role raises fundamental questions about the meaning and value of this process and how much trust we should place in AI scientists. Is this the kind of scientific ecosystem we want?

