Can AI Accelerate Scientific Research?

by Corin Wagen · Apr 2, 2025

Update: This paper has been retracted. We are leaving this blog post up for historical reasons, but the data presented here should not be viewed as trustworthy or authoritative. Read the full statement from MIT: https://economics.mit.edu/news/assuring-accurate-research-record.

Despite the current ubiquity of artificial intelligence, many scientists remain skeptical. Generative AI models have made waves in image and language domains, but their relevance to real-world research often seems unclear. Can these models really help with something as complex and domain-specific as chemical discovery?

A new study by Aidan Toner-Rodgers at MIT provides a rigorous, large-scale evaluation of the impact of machine learning on scientific research. Toner-Rodgers evaluates what happened when a large U.S. industrial R&D lab introduced an AI-powered materials discovery tool to over a thousand researchers. The results are surprisingly clear: AI can dramatically accelerate innovation, but only for scientists with the domain expertise to guide it.


Aidan Toner-Rodgers, an MIT economics Ph.D. student.

The Effect of AI on Materials R&D

The AI tool studied in Toner-Rodgers's work was a graph neural network (GNN)-based diffusion model trained to generate candidate materials that were predicted to have specific properties. In this study, the researchers used it for inverse design—providing target features and receiving plausible structures in return. The company rolled the model out in waves across 1,018 scientists, allowing for a controlled and large-scale study of AI's impact over almost two years. (The exact nature of the company and the model are, sadly, confidential.)
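The exact model is confidential, but the inverse-design loop the paper describes—specify target properties, receive candidate structures predicted to match—can be sketched in miniature. Everything below is a hypothetical stand-in for illustration, not the actual system: a real GNN diffusion model would denoise a graph into a chemical structure, whereas this toy just samples in property space.

```python
import random

def generate_candidates(target_properties, n_samples=100, seed=0):
    """Hypothetical stand-in for a property-conditioned generative model:
    propose candidate materials given desired properties."""
    rng = random.Random(seed)
    candidates = []
    for i in range(n_samples):
        # A real diffusion model would emit a structure; here we just
        # jitter the target values to mimic "predicted close to target."
        predicted = {
            name: value + rng.gauss(0, 0.05 * abs(value) + 0.01)
            for name, value in target_properties.items()
        }
        candidates.append({"id": f"candidate-{i}", "predicted": predicted})
    return candidates

# Inverse design: ask for a material with a given band gap and modulus,
# get back a pool of candidates predicted to match.
target = {"band_gap_eV": 1.4, "bulk_modulus_GPa": 120.0}
proposals = generate_candidates(target)
print(len(proposals), proposals[0]["id"])
```

The key design point is the direction of the query: instead of predicting properties from a structure (forward screening), the researcher conditions generation on the properties they want.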

The results are striking. Researchers who gained access to the model discovered 44% more materials, filed 39% more patents, and produced 17% more product prototypes. Adoption of the AI model led to a clear step change in materials discovery and patent filings after about six months, while the increase in product prototypes took over a year to appear. This makes sense, as prototypes are downstream of patents and new materials.

Metrics improve over time after the AI model is introduced.

Figure 5 from Toner-Rodgers's paper, showing the impact of introducing the AI model over time.

These graphs don't just show a flood of "AI slop" overrunning the materials discovery pipeline—as far as Toner-Rodgers can quantify, the discoveries were also better than human-only discoveries. The materials were superior in quality (as assessed by similarity to the researchers' desired properties) and showed significantly greater novelty, both structurally and in downstream patents. For instance, patents filed by AI-assisted scientists used more novel technical terminology, an early marker of transformative innovation.

One concern with applying ML to scientific domains is the so-called "streetlight effect": the idea that models might just guide us toward what we already know and disfavor truly novel research. But in this case, AI-enabled teams produced more distinct materials and more new product lines, not just incremental tweaks. This suggests that the model actually helped researchers explore new territory in materials design space, although a full treatment of this question will require further research.

Although the full effect of incorporating AI took time to materialize, its impact on the organization was substantial. Overall, Toner-Rodgers estimates that introducing this single AI model improved overall R&D efficiency by 13–15%, even after model training costs are taken into account. This productivity boost would be extraordinary in any company, let alone an organization with over a thousand researchers.

AI Shifts Scientific Bottlenecks

Scientists' task logs also showed a dramatic reallocation of effort: AI automated about 57% of the idea-generation process, freeing researchers to focus on evaluating and testing candidate materials—areas where domain knowledge is essential.

Introduction of the AI model means researchers spend less time generating ideas.

Figure 8 from Toner-Rodgers's paper, showing the impact of introducing the AI model on researcher activities.

Here's how Toner-Rodgers summarizes this finding:

While [AI] replaces labor in the specific activity of designing compounds, it augments labor in the broader discovery process due to its complementarity with evaluation tasks.

This phenomenon is perhaps unsurprising: the ML model studied here was capable of generating new candidate materials, so less human time was spent generating candidates and more was allocated to evaluating them. It's interesting to imagine what might happen if a second ML model capable of candidate evaluation were added—might it be possible to produce compounding productivity increases?

Human Expertise Still Matters

Critically, introduction of the ML model didn't help all of the scientists equally. The top third of scientists nearly doubled their output, while the bottom third saw little change. This can be traced to differences in how these scientists employed the models. The best performers used their domain expertise to filter the flood of model-suggested candidates, avoiding time sinks on unstable or irrelevant compounds. In contrast, less experienced users often tested the model's suggestions at random—burning resources on dead ends.
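A toy simulation makes this filtering effect concrete (all numbers here are invented for illustration, not data from the study). Suppose a fixed experimental budget and a large pool of model-suggested candidates, only a few of which are truly good; an evaluator with even a noisy ranking signal—a stand-in for domain intuition—finds far more hits than random testing.

```python
import random

rng = random.Random(42)

# Hypothetical candidate pool: each candidate has a true quality score,
# and the expert sees a noisy-but-informative estimate of it.
pool = []
for _ in range(500):
    quality = rng.random()
    expert_estimate = quality + rng.gauss(0, 0.2)  # domain intuition: noisy signal
    pool.append((quality, expert_estimate))

BUDGET = 50   # experiments we can afford to run
HIT = 0.9     # quality threshold that counts as a "discovery"

# Novice strategy: test model suggestions at random.
random_picks = rng.sample(pool, BUDGET)
# Expert strategy: test the candidates ranked highest by intuition.
expert_picks = sorted(pool, key=lambda c: c[1], reverse=True)[:BUDGET]

random_hits = sum(q > HIT for q, _ in random_picks)
expert_hits = sum(q > HIT for q, _ in expert_picks)
print(random_hits, expert_hits)
```

Even with a fairly noisy signal, prioritized testing concentrates the budget on promising candidates, which is one plausible mechanism for the top-third/bottom-third gap the paper reports.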

This divide can be further studied by examining which forms of expertise proved most useful for judging candidate materials generated by AI. Scientific training mattered most, followed by previous in-field experience and raw intuition, while experience with other ML tools made little difference. Paradoxically, this implies that the advent of AI tools makes domain knowledge and scientific intuition more important, not less.

Domain expertise and experience predict efficacy of the AI model.

Figure 12 from Toner-Rodgers's paper, showing which forms of expertise proved useful in working with the AI model.

This finding complements earlier work suggesting that while machine prediction is improving rapidly, human evaluation and decision-making are still critical to success. One of the study's most striking findings is that "only scientists with sufficient expertise can harness the power of AI." The need for scientific thinking hasn't disappeared; it's simply shifted downstream to judgment and interpretation.

What About Real-World Impact?

Of course, increased patents and prototypes don't automatically mean real-world success. Still, there are reasons to take these results seriously: patent filings require novelty, utility, & non-obviousness, and product prototypes represent a substantial corporate expenditure and a degree of human validation & trust. We won't know the full impact of these discoveries for years—but for those trying to assess whether AI can unlock new chemical space, this study represents unusually rigorous evidence that it can.

Unfortunately, not all impacts were positive. In follow-up surveys, 82% of scientists reported reduced job satisfaction. Even those who benefited the most cited skill underutilization and a decline in creativity as top concerns. AI may accelerate discovery, but many researchers felt alienated by the new workflow. Still, scientists' belief in AI's productivity-enhancing potential nearly doubled after using the model. The vast majority reported plans to reskill, anticipating a future in which the traits needed to excel in scientific research will shift.

Conclusions

This study offers some of the clearest empirical evidence to date that AI can accelerate real-world scientific discovery, especially in chemistry and materials science. But it also highlights an important nuance. AI doesn't replace scientific experts—instead, it makes them dramatically more valuable.

At Rowan, we're building the ML-native design and simulation platform for chemistry, drug discovery, and materials science. We believe that the future of scientific discovery requires both humans and AI models working in tandem, as described in Toner-Rodgers's paper, and are working to build software to make this transition possible. If you're a scientist looking to excel in the age of AI-powered research, come check out what we're building!

