The Invisible Work of Computer-Assisted Drug Design

by Corin Wagen · Aug 28, 2025

The Stone Breakers, Gustave Courbet (1849)

Scientists who work in computer-assisted drug design (CADD) must command a wide variety of skills. In modern drug-design organizations, CADD scientists take on a large and ever-growing set of responsibilities.

The diversity of software tools required for state-of-the-art computational drug design means that scientists often spend a surprising fraction of their time away from actual drug design. Jim Snyder, a "world-class modeler and scientist" with "high scientific success in academia and industry" (per Ash Jogalekar), wrote a fascinating overview of the state of computer-assisted drug design in the 1980s. Here's what he wrote about this particular topic (emphasis added):

On the invisible side of the ledger—about 30-50% of the group’s time—is the effort that permits the CADD group to maintain state-of-the-art status. In the current late 1980s-early 1990s environment, major software packages often incorporating new methodology are generally purchased from commercial vendors. These are now generally second or third generation, sophisticated and expensive ($50,000–150,000). Still, no commercial house can anticipate all the needs of a given applications’ environment. It remains necessary to treat problems specific to a given research project and to locally extend known methodology. This means that new capabilities delivered in advanced versions of commercial software need careful evaluation.

Although Snyder was writing about the 1980s, his observations are no less true today. Commercial software solutions must be evaluated, benchmarked, and tested on internal data—a process which is slow and time-consuming. The problem is even worse for academic code, whose authors often have little experience with industry use cases or conventional software practices.

The rise of machine learning has made the work of benchmarking and internal validation even more important, particularly as public benchmarks become contaminated by data leakage and overfitting. A recent study by Ajay Jain, Ann Cleves, and Pat Walters benchmarking DiffDock discusses the time and effort that the CADD community collectively spends evaluating new methods (emphasis added):

Publication of studies such as the DiffDock report are not cost-free to the CADD field. Magical sounding claims generate interest and take time for groups to investigate and debunk. Many groups must independently test and understand the validity of such claims. This is because most groups, certainly those focused primarily on developing new drugs, do not have the time to publish extensive rebuttals such as this. Therefore their effort in validation/debunking is replicated many fold. The waste of time and effort is substantial, and the process of drug discovery is difficult enough without additional unnecessary challenges.
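To make the scale of this validation work concrete, here's a minimal sketch of the kind of internal check a group might run on a new pose-prediction method. The file name and column names (results.csv, rmsd, target_in_training_set) are placeholders for whatever a team's internal benchmark actually produces; the 2 Å cutoff is the conventional success threshold for pose prediction.

```python
import pandas as pd

# Hypothetical benchmark results: one row per protein-ligand complex, with the
# heavy-atom RMSD (in Å) between the predicted pose and the crystal structure.
df = pd.read_csv("results.csv")

# Conventional pose-prediction metric: fraction of top-ranked poses within 2 Å.
success_rate = (df["rmsd"] <= 2.0).mean()
print(f"Top-1 pose success rate (< 2 Å): {success_rate:.1%}")

# Slice by whatever matters internally (e.g. whether the target resembles the
# method's training data) to check for leakage-driven overoptimism.
for seen, group in df.groupby("target_in_training_set"):
    rate = (group["rmsd"] <= 2.0).mean()
    print(f"  target_in_training_set={seen}: {rate:.1%} (n={len(group)})")
```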

Even when authors report high-quality benchmarks and clearly disclose when a method will and won't work, considerable work remains before a given method can be integrated into production CADD usage. Most scientific tasks require more than a single computation or model-inference step, necessitating integration into a larger software ecosystem. (I wrote about this in the context of ML-powered workflows previously.) Building this state-of-the-art software infrastructure can still be challenging, as Snyder describes (emphasis added):

No single piece of software is ordinarily sufficient to address a routine but multistep modeling task. For example, conformation generation, optimization, and least-squares fitting can involve three separate computer programs. The XYZ coordinate output from the first is the input for the second; output from the latter is input for the third. With an evolving library of 40-50 active codes, the task of assuring comprehensive and smooth coordinate interconversion is a demanding and ongoing one.
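Snyder's three-step example maps onto tools that exist today, even if the glue code has changed. As a rough modern sketch (using the open-source RDKit library rather than any of the commercial packages Snyder had in mind, and an arbitrary example molecule), the same conformer-generation, optimization, and least-squares-fitting pipeline might look like this:

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolAlign

# Step 1: conformer generation (ETKDG), standing in for a standalone generator.
mol = Chem.AddHs(Chem.MolFromSmiles("CC(=O)Nc1ccc(O)cc1"))  # acetaminophen, as an example
AllChem.EmbedMultipleConfs(mol, numConfs=20, randomSeed=42)

# Step 2: geometry optimization with MMFF94, standing in for a separate optimizer program.
AllChem.MMFFOptimizeMoleculeConfs(mol)

# Step 3: least-squares fitting, aligning every conformer onto the first one.
rdMolAlign.AlignMolConformers(mol)

# In Snyder's description, each handoff between these steps meant exporting XYZ
# coordinates from one program and re-importing them into the next.
print(f"Generated, optimized, and aligned {mol.GetNumConformers()} conformers")
```

Even when a single library handles all three steps, a production workflow still has to pass structures onward to docking, scoring, or visualization tools, which is where the format-interconversion burden Snyder describes reappears.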

Most scientific software tools don't make integration easy. Modern packaging and code-deployment processes are rarely followed in science, forcing the CADD practitioner to go through the painful and time-consuming task of manually creating a minimal environment capable of running a given model or algorithm.
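There's no single recipe for that work, but as an illustration, the snippet below is the kind of ad hoc environment check a practitioner often ends up writing before an academic model will run. The package pins are placeholders (typically reverse-engineered from error messages and GitHub issues), not the requirements of any real tool:

```python
from importlib import metadata

# Placeholder pins for a hypothetical model's undocumented requirements.
REQUIRED = {
    "numpy": "1.26.4",
    "torch": "2.3.1",
    "rdkit": "2024.3.5",
}

missing, mismatched = [], []
for package, wanted in REQUIRED.items():
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        missing.append(f"{package}: not installed")
        continue
    if installed != wanted:
        mismatched.append(f"{package}: have {installed}, need {wanted}")

if missing or mismatched:
    raise SystemExit("Environment check failed:\n" + "\n".join(missing + mismatched))
print("Environment OK")
```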

For methods requiring specialized hardware like GPUs, things become still more complex. Some modern methods, like protein–ligand co-folding, also require external resources such as an MSA server that must be provisioned separately, creating additional opportunities for failure. Solving all these issues requires CADD scientists to essentially become "ML DevOps" experts, a skillset which most do not naturally have.
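As a sketch of what that "ML DevOps" work can look like, the snippet below runs a pre-flight check before launching a hypothetical co-folding job: is a CUDA GPU visible with enough free memory, and is the MSA server actually reachable? The server URL and the 16 GB memory floor are placeholder assumptions, not the requirements of any specific method:

```python
import requests
import torch

MSA_SERVER_URL = "http://msa.internal.example.com/health"  # placeholder endpoint

def preflight() -> None:
    # Co-folding models generally expect a CUDA-capable GPU with ample free memory.
    if not torch.cuda.is_available():
        raise RuntimeError("No CUDA GPU visible; check drivers and CUDA_VISIBLE_DEVICES")
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    if free_bytes < 16 * 1024**3:
        raise RuntimeError("Less than 16 GB of free GPU memory available")

    # The MSA server is a separate service that can fail independently of the model.
    try:
        requests.get(MSA_SERVER_URL, timeout=10).raise_for_status()
    except requests.RequestException as exc:
        raise RuntimeError(f"MSA server unreachable: {exc}") from exc

preflight()
```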

Building tools to run calculations is only half the problem. To be impactful, CADD scientists must also integrate their predictions into the experimental design–make–test–analyze cycle, which necessitates communicating results with medicinal chemists. Many large pharmaceutical companies have invested in building some sort of internal graphical platform to simplify communication and allow scientists across the organization to run and view calculations, but these platforms are often costly to maintain and accumulate technical debt quickly. (We've talked to a lot of teams that had a fantastic internal platform for running calculations until the maintainer switched roles and left the platform to die a slow and ignominious death.)

At Rowan, we're working to build a CADD platform that addresses all of these problems. Our goal is to free scientists from worrying about software so they can focus on their science, cutting down on the invisible work that goes into CADD and letting our users do what they're good at.

Building a top-tier CADD team used to mean spending millions on software licenses and developers to build a bespoke internal platform; with Rowan, we're building this platform for all our customers. If you'd like to be one of them, make an account or reach out to our team!

