The Invisible Work of Computer-Assisted Drug Design

by Corin Wagen · Aug 28, 2025

The Stone Breakers, Gustave Courbet (1849)

Scientists who work in computer-assisted drug design (CADD) must be comfortable with a wide variety of skills. In modern drug-design organizations, CADD scientists shoulder a large and ever-growing list of responsibilities.

The diversity of software tools required for state-of-the-art computational drug design means that scientists often spend a surprising fraction of their time away from actual drug design. Jim Snyder, a "world-class modeler and scientist" with "high scientific success in academia and industry" (per Ash Jogalekar), wrote a fascinating overview of the state of computer-assisted drug design in the 1980s. Here's what he wrote about this particular topic (emphasis added):

On the invisible side of the ledger—about 30-50% of the group’s time—is the effort that permits the CADD group to maintain state-of-the-art status. In the current late 1980s-early 1990s environment, major software packages often incorporating new methodology are generally purchased from commercial vendors. These are now generally second or third generation, sophisticated and expensive ($50,000–150,000). Still, no commercial house can anticipate all the needs of a given applications’ environment. It remains necessary to treat problems specific to a given research project and to locally extend known methodology. This means that new capabilities delivered in advanced versions of commercial software need careful evaluation.

Although Snyder was writing about the 1980s, his observations are no less true today. Commercial software solutions must be evaluated, benchmarked, and tested on internal data—a process which is slow and time-consuming. The problem is even worse for academic code, whose authors often have little experience with industry use cases or conventional software practices.

The rise of machine learning has made the work of benchmarking and internal validation even more important, particularly as public benchmarks become contaminated by data leakage and overfitting. A recent study benchmarking DiffDock by Ajay Jain, Ann Cleves, and Pat Walters discusses the time and effort that the CADD community collectively spends benchmarking new methods (emphasis added):

Publication of studies such as the DiffDock report are not cost-free to the CADD field. Magical sounding claims generate interest and take time for groups to investigate and debunk. Many groups must independently test and understand the validity of such claims. This is because most groups, certainly those focused primarily on developing new drugs, do not have the time to publish extensive rebuttals such as this. Therefore their effort in validation/debunking is replicated many fold. The waste of time and effort is substantial, and the process of drug discovery is difficult enough without additional unnecessary challenges.

Even when authors report high-quality benchmarks and clearly disclose when a method will and won't work, considerable work remains before a given method can be integrated into production CADD usage. Most scientific tasks require more than a single computation or model-inference step, necessitating integration into a larger software ecosystem. (I wrote about this in the context of ML-powered workflows previously.) Building this state-of-the-art software infrastructure can still be challenging, as Snyder describes (emphasis added):

No single piece of software is ordinarily sufficient to address a routine but multistep modeling task. For example, conformation generation, optimization, and least-squares fitting can involve three separate computer programs. The XYZ coordinate output from the first is the input for the second; output from the latter is input for the third. With an evolving library of 40-50 active codes, the task of assuring comprehensive and smooth coordinate interconversion is a demanding and ongoing one.
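The coordinate-interconversion burden Snyder describes is easy to underestimate. As a toy sketch (the file formats are real, but the pipeline and the specific molecule are illustrative), here is the kind of glue code a CADD group ends up writing to hand one program's XYZ output to another program that expects, say, Turbomole's `$coord` layout, which uses Bohr rather than Ångström and a different column order:

```python
# Toy illustration of inter-program "glue code": parse the XYZ output of one
# tool and re-emit it in the input format another tool expects.

def parse_xyz(text: str) -> list[tuple[str, float, float, float]]:
    """Parse a standard XYZ block: atom count, comment line, then one
    'symbol x y z' line per atom (coordinates in Angstrom)."""
    lines = text.strip().splitlines()
    natoms = int(lines[0])
    atoms = []
    for line in lines[2 : 2 + natoms]:
        symbol, x, y, z = line.split()
        atoms.append((symbol, float(x), float(y), float(z)))
    return atoms

def to_turbomole_coord(atoms: list[tuple[str, float, float, float]]) -> str:
    """Emit the same geometry in Turbomole's $coord format: coordinates
    first (in Bohr), element symbol last, lowercase."""
    ANGSTROM_TO_BOHR = 1.8897259886
    body = "\n".join(
        f"{x * ANGSTROM_TO_BOHR:16.8f} {y * ANGSTROM_TO_BOHR:16.8f} "
        f"{z * ANGSTROM_TO_BOHR:16.8f} {s.lower()}"
        for s, x, y, z in atoms
    )
    return f"$coord\n{body}\n$end\n"

water_xyz = """3
water, optimized elsewhere
O 0.0000 0.0000 0.1173
H 0.0000 0.7572 -0.4692
H 0.0000 -0.7572 -0.4692
"""

print(to_turbomole_coord(parse_xyz(water_xyz)))
```

Multiply this by the 40-50 codes Snyder mentions, each with its own conventions for units, atom ordering, and metadata, and the maintenance cost becomes clear.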

Most scientific software tools don't make integration easy. Modern packaging and code-deployment processes are rarely followed in science, forcing the CADD practitioner to go through the painful and time-consuming task of manually creating a minimal environment capable of running a given model or algorithm.
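In practice, "creating a minimal environment" often means writing and pinning something like the following by hand (a hypothetical `environment.yml`; the package set and versions are illustrative, not a recipe for any particular model):

```
# environment.yml — hypothetical pinned environment for one academic model.
# Exact pins are usually required because research codes rarely declare
# compatible version ranges themselves.
name: cadd-model-env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy=1.26
  - pytorch=2.2
  - pip
  - pip:
      - rdkit==2024.3.1
```

Getting a file like this right typically takes several rounds of trial and error, and it must be redone whenever the upstream code or its dependencies change.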

For methods requiring specialized hardware like GPUs, things become still more complex—and some modern methods, like protein–ligand co-folding, require external resources like an MSA server which must be provisioned, creating additional opportunities for failure. Solving all these issues requires CADD scientists to essentially become "ML DevOps" experts, a skillset which most do not naturally have.

Building tools to run calculations is only half the problem. To be impactful, CADD scientists must also integrate their predictions into the experimental design–make–test–analyze cycle, which necessitates communicating results with medicinal chemists. Many large pharmaceutical companies have invested in building some sort of internal graphical platform to simplify communication and allow scientists across the organization to run and view calculations, but these platforms are often costly to maintain and accumulate technical debt quickly. (We've talked to a lot of teams that had a fantastic internal platform for running calculations until the maintainer switched roles and left the platform to die a slow and ignominious death.)

At Rowan, we're working to build a CADD platform that addresses all these issues. Our goal is to cut down on the invisible work that goes into CADD, helping scientists stop worrying about software issues and letting them do what they're good at: science.

Building a top-tier CADD team used to mean spending millions on software licenses and developers to build a bespoke internal platform; with Rowan, we're building this platform for all our customers. If you'd like to be one of them, make an account or reach out to our team!
