Conventional Chemical Simulation Is Too Slow, and ML Can Help

by Corin Wagen · Dec 17, 2024

Without a background in computational chemistry, it's difficult to understand just how slow conventional high-accuracy chemical computations are. Simulating molecules or materials with high accuracy entails solving the electronic-structure problem, which requires simultaneously relaxing the wavefunctions of hundreds or thousands of electrons, each of which is delocalized throughout space and interacts with every other nucleus and electron. Exact solutions to this problem become intractable for molecules with more than 5 or 10 atoms, and even dramatically approximate methods like density-functional theory (DFT) remain among the most taxing simulation tasks in science.
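To make the difficulty concrete, here is the standard textbook form of the electronic Hamiltonian (in atomic units) whose eigenvalue problem every electronic-structure method solves or approximates. This is generic background, not anything specific to the codes or clusters discussed below.

```latex
% Time-independent electronic Schrödinger equation: \hat{H}_{el}\,\Psi = E\,\Psi.
% The final electron–electron repulsion term couples every pair of electrons,
% which is what makes exact solutions scale so steeply with system size.
\hat{H}_{el} = -\frac{1}{2}\sum_{i=1}^{N}\nabla_i^2
              \;-\; \sum_{i=1}^{N}\sum_{A=1}^{M}\frac{Z_A}{\lvert \mathbf{r}_i - \mathbf{R}_A \rvert}
              \;+\; \sum_{i<j}\frac{1}{\lvert \mathbf{r}_i - \mathbf{r}_j \rvert}
```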

But what does this mean in practice? We compiled a variety of resources about high-performance computing usage to quantify exactly how much time is being spent on this simulation problem, and the results are striking.

Biowulf (NIH)

Biowulf is the NIH's high-performance-computing cluster, which has over 90,000 CPU cores. In 2022 (the most recent year for which we have data), computational chemistry was the research area that used the most compute: computational chemistry researchers ran 319 million CPU hours of calculations, accounting for 29% of all usage on the Biowulf cluster. (Unfortunately, the NIH doesn't break this data down by application, so it's difficult to know which programs are driving usage here.)

Hopper and Perlmutter (NERSC)

The National Energy Research Scientific Computing Center (NERSC), the primary high-performance-computing facility for the Department of Energy, studied the usage of their main Hopper cluster in 2012 and 2013. Although they didn't publish the underlying data, their analysis shows that materials science and chemistry used the 2nd and 4th most CPU hours of any scientific domain, respectively. Numerous electronic-structure-theory codes, including VASP and NWChem, were among the most compute-intensive applications, and DFT was specifically highlighted as one of the most compute-intensive simulation tasks.

More recent NERSC studies have reached similar conclusions: a 2018 NERSC study found that DFT consumed more compute than any other class of application, with VASP alone taking up almost 20% of all machine time. A 2022 NERSC study of the new Perlmutter computing cluster also found DFT to be one of the most compute-intensive applications.

XSEDE (NSF)

A study of NSF high-performance-computing resources from 2011 to 2017 found that computational chemistry was one of the most compute-intensive research areas. Many DFT codes, like CP2K, Quantum ESPRESSO, ABINIT, and NWChem, made the "top 40" list of most compute-hungry applications. In this survey, molecular-dynamics (MD) simulations consumed even more resources than DFT, with MD codes accounting for three of the top five applications (LAMMPS, NAMD, and GROMACS).

HPCI (Japan)

Japan's national High-Performance Computing Infrastructure (HPCI) reported that in 2022 research in “matter, material, and chemistry” accounted for 23% of all usage on their flagship Fugaku supercomputer and 33% of all usage on all other HPCI supercomputers. Materials science/chemistry also accounted for the most HPCI-related publications of any research area in 2022.

ARCHER2 (UK)

ARCHER2 is the UK's premier national supercomputing service. Andy Turner published an analysis of ARCHER2 usage in January 2022: DFT calculations accounted for the majority of all usage, with VASP alone comprising 42% of all ARCHER2 compute time. (A staggering 672,066 VASP calculations were run in January 2022!)


These data are all from government agencies; we don't have analogous usage information from industrial users or most academic clusters. Nevertheless, the conclusion is clear—computational chemistry is one of the toughest and most expensive simulation problems in all of science.

This might seem discouraging. Although there are many ways in which accurate simulation could accelerate drug discovery and materials science, today's computational methods are so expensive that they're already straining our high-performance-computing infrastructure. We hear from users and collaborators that it can take days or even weeks to get jobs to run on conventional high-performance computing clusters, which makes rapid iteration impossible.

Fortunately, a new wave of machine-learning-based approaches is making it possible to run atomistic simulations thousands or millions of times faster than traditional methods, with minimal loss in accuracy. Neural network potentials (NNPs) trained on the output of conventional quantum-mechanical simulations can recapitulate the results of DFT calculations in seconds, making it possible to run accurate workflows without spending thousands of dollars in compute time or waiting for days or weeks to get an answer.
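To illustrate what this looks like in practice, here's a minimal sketch of evaluating a molecule's energy and forces with a pre-trained NNP through the ASE calculator interface. The open-source MACE "mace_mp" foundation model is used purely as a stand-in for any NNP calculator, not as a description of Rowan's own workflows; the package and model choices here are illustrative assumptions.

```python
# Minimal sketch: single-point energy and forces from a pre-trained NNP via ASE.
# Assumes `pip install ase mace-torch`; mace_mp is used here only as an example
# NNP calculator, not as Rowan's implementation.
from ase.build import molecule
from mace.calculators import mace_mp

atoms = molecule("C6H6")                            # benzene from ASE's built-in library
atoms.calc = mace_mp(model="medium", device="cpu")  # load a pre-trained NNP

energy = atoms.get_potential_energy()               # total energy in eV, in seconds
forces = atoms.get_forces()                         # per-atom forces in eV/Å

print(f"Energy: {energy:.3f} eV")
print(f"Max force: {abs(forces).max():.3f} eV/Å")
```

The same calculator interface plugs into geometry optimizations and molecular dynamics, which is where the speed advantage over DFT really compounds.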

At Rowan, we're working to build, test, and deploy this new paradigm of atomistic simulation software. We design benchmarks for NNPs, compare them rigorously to state-of-the-art DFT methods, and deploy high-performing NNPs onto our easy-to-use cloud platform. We've already launched two NNPs—AIMNet2 and OMat24—and are always working to make sure our users have access to the fastest and best simulations possible.

If you want to try out ML-accelerated computational chemistry, make an account today! And if you want to discuss how NNPs can accelerate research workflows in your company, reach out to our team for a custom consultation.
