Lyrebird: Molecular Conformer Ensemble Generation

by Eli Mann · Nov 5, 2025

This work was conducted in large part by Vedant Nilabh, a summer intern from Northeastern University. Thanks Vedant!

Most molecules can exist in different 3D shapes, called conformers. Each conformer is a local minimum on the potential-energy surface and has an associated energy, which determines its population at a given temperature. The observed macroscopic behavior of a molecule typically arises in part from all relevant conformations, making proper conformer search and ranking an important part of almost all chemical simulation problems.

Unfortunately, finding all the conformers of a given molecule is very difficult. There are a variety of commonly used methods, each with its strengths and limitations. At Rowan, we've generally relied on two methods to date: ETKDG, a stochastic distance-geometry-based approach incorporating experimental torsional heuristics, and CREST, an iterative metadynamics-based approach that also incorporates a genetic-structure-crossing algorithm to increase diversity. While we've had great success with both of these methods (like many other groups), both have their problems—ETKDG is somewhat inaccurate, particularly for large and flexible molecules, and can fail in particularly complex cases, while CREST is extremely slow and often struggles to explore enough space in a reasonable time. As such, we've been on the lookout for alternative conformer-generation methods.
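
As a concrete example of the distance-geometry side of this toolbox, RDKit exposes ETKDG (in its v3 variant) through `EmbedMultipleConfs`. A minimal sketch follows; the molecule, conformer count, and pruning threshold are arbitrary illustrative choices:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Embed a conformer ensemble for an example molecule (ibuprofen) with ETKDGv3.
mol = Chem.AddHs(Chem.MolFromSmiles("CC(C)Cc1ccc(cc1)C(C)C(=O)O"))

params = AllChem.ETKDGv3()
params.randomSeed = 42        # make the stochastic embedding reproducible
params.pruneRmsThresh = 0.5   # discard near-duplicate conformers (Å)

conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=20, params=params)
print(f"generated {len(conf_ids)} conformers")
```

In practice the embedded ensemble would then be refined with a force field or semiempirical method before energy ranking.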

Lyrebird is our first foray into machine-learning-based conformer-generation algorithms. The Lyrebird architecture is based on the ET-Flow equivariant flow-matching architecture from Hassan et al. (preprint, GitHub). The model learns a vector field that transports samples from a harmonic prior (conditioned on the input SMILES) to the data distribution of 3D molecular conformers. In practice, it learns to map randomly initialized "noise" coordinates into realistic conformations. This is diffusion-like, but the dynamics are deterministic (an ODE) rather than stochastic (no Brownian noise term). The addition of equivariance makes the model respect the physical symmetries of molecules.
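
To make the flow-matching picture concrete, here is a toy numerical sketch (not the actual ET-Flow model): for a straight-line probability path to a known target geometry, the conditional vector field has a closed form, and Euler-integrating the resulting ODE transports prior "noise" coordinates onto the target. In a real model, a neural network approximates this field without access to the target.

```python
import numpy as np

# Toy flow-matching sketch: for the straight-line path x_t = (1-t)*x0 + t*x1,
# the conditional vector field is u_t(x) = (x1 - x) / (1 - t). Integrating
# dx/dt = u_t(x) deterministically carries a prior sample onto the data sample.

rng = np.random.default_rng(0)
target = rng.normal(size=(5, 3))   # hypothetical 5-atom "conformer" (data sample)
x = rng.normal(size=(5, 3))        # "noise" coordinates drawn from the prior

def vector_field(x, t, x1=target):
    return (x1 - x) / (1.0 - t)

# Euler integration from t = 0 toward t = 1
n_steps = 100
dt = 1.0 / n_steps
for i in range(n_steps - 1):       # stop one step short: the field diverges at t = 1
    x = x + dt * vector_field(x, i * dt)

print(np.abs(x - target).max())    # the integrated sample lands near the target
```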

[Figure 1 from the ET-Flow paper, showing the ET-Flow architecture.]

The original ET-Flow model was trained on a split of the GEOM-DRUGS subset of the GEOM dataset from Axelrod and Gómez-Bombarelli, which contains over 317,000 ensembles of mid-sized drug-like organic molecules. Their studies show that the model performs well for molecules sampled from within its training distribution, but poorly for molecules outside of it. For Lyrebird, we increased the in-distribution samples by training on three datasets: GEOM-DRUGS; GEOM-QM9, a dataset with 133,258 small organic molecules limited to 9 heavy atoms; and CREMP, a dataset with 36,198 unique macrocyclic peptides. We hypothesized that increasing the diversity of the training dataset might improve the model's generalizability, as well as its robustness for routine chemical modeling tasks.

To test this hypothesis, we evaluated Lyrebird on Butina splits of GEOM-QM9, GEOM-DRUGS, and CREMP, as well as several challenging external sets: MPCONF196GEN, a small dataset containing conformer ensembles for the structures from MPCONF196, and GEOM-XL, a set of flexible organic compounds with up to 91 heavy atoms.

We evaluated our models against a variety of ML methods, as well as ETKDGv3, with metrics that evaluate both the diversity and geometric accuracy of a generated conformer ensemble. (We didn't benchmark against CREST because CREST was used to generate the training-data ensembles.) Since comparing two ensembles is not straightforward, the metrics merit specific explanation. Recall-based metrics ask how well the generated ensemble reproduces the reference ensemble, while precision-based metrics ask how many generated conformers lie close to some reference conformer. For each direction, coverage reports the percentage of conformers with a match below an RMSD threshold δ, and AMR (average minimum RMSD) reports the mean distance to the closest match:
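
Concretely, both coverage and AMR can be computed from a matrix of pairwise RMSDs between reference and generated conformers. A minimal sketch, assuming the RMSD matrix has already been computed (the example values are hypothetical):

```python
import numpy as np

# Ensemble-comparison metrics from a pairwise RMSD matrix D (Å),
# with rows = reference conformers and columns = generated conformers.
def coverage_and_amr(D, delta=0.5):
    recall_min = D.min(axis=1)      # best generated match for each reference conformer
    precision_min = D.min(axis=0)   # best reference match for each generated conformer
    return {
        "recall_coverage": float((recall_min < delta).mean() * 100),
        "recall_amr": float(recall_min.mean()),
        "precision_coverage": float((precision_min < delta).mean() * 100),
        "precision_amr": float(precision_min.mean()),
    }

# Hypothetical 3 reference x 4 generated RMSD matrix
D = np.array([[0.2, 1.1, 0.9, 2.0],
              [0.8, 0.3, 1.5, 0.7],
              [1.9, 2.2, 2.4, 1.8]])
print(coverage_and_amr(D))
```

Note that production benchmarks use symmetry-aware, aligned RMSDs rather than raw coordinate differences; the sketch only shows how the four summary numbers fall out of the matrix.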

| Method | Recall Coverage ↑ (Mean) | Recall Coverage ↑ (Median) | Recall AMR ↓ (Mean) | Recall AMR ↓ (Median) | Precision Coverage ↑ (Mean) | Precision Coverage ↑ (Median) | Precision AMR ↓ (Mean) | Precision AMR ↓ (Median) |
|---|---|---|---|---|---|---|---|---|
| Torsional Diffusion | 86.91 | **100.00** | 0.20 | 0.16 | 82.64 | **100.00** | 0.24 | 0.22 |
| ET-Flow | 87.02 | **100.00** | 0.21 | 0.14 | 71.75 | 87.50 | 0.33 | 0.28 |
| RDKit ETKDG | 87.99 | **100.00** | 0.23 | 0.18 | **90.82** | **100.00** | 0.22 | 0.18 |
| Lyrebird | **92.99** | **100.00** | **0.10** | **0.03** | 86.99 | **100.00** | **0.16** | **0.05** |

Table 1: GEOM-QM9 test set results (threshold δ = 0.5 Å). Coverage in %, AMR in Å. Best results in bold.

| Method | Recall AMR ↓ (Mean) | Recall AMR ↓ (Median) | Precision AMR ↓ (Mean) | Precision AMR ↓ (Median) |
|---|---|---|---|---|
| RDKit ETKDG | 4.69 | 4.68 | 4.73 | 4.71 |
| ET-Flow | 4.13 | 4.07 | >6 | >6 |
| Lyrebird | **2.34** | **2.33** | **2.82** | **2.81** |

Table 2: CREMP test set results. AMR in Å; lower is better (↓). Best results in bold. Coverage not reported because all methods have very low ensemble coverage.

| Method | Recall AMR ↓ (Mean) | Recall AMR ↓ (Median) | Precision AMR ↓ (Mean) | Precision AMR ↓ (Median) |
|---|---|---|---|---|
| RDKit ETKDG | 2.92 | 2.62 | 3.35 | 3.15 |
| Torsional Diffusion | **2.05** | **1.86** | **2.94** | **2.78** |
| ET-Flow | 2.31 | 1.93 | 3.31 | 2.84 |
| Lyrebird | 2.42 | 2.07 | 3.27 | 2.87 |

Table 3: GEOM-XL test set results. AMR in Å; lower is better (↓). Best results in bold. Coverage not reported because all methods have very low ensemble coverage.

| Method | Recall AMR ↓ (Mean) | Recall AMR ↓ (Median) | Precision AMR ↓ (Mean) | Precision AMR ↓ (Median) |
|---|---|---|---|---|
| RDKit ETKDG | 3.79 | 3.71 | 4.01 | 3.91 |
| Torsional Diffusion | 2.71 | **2.58** | 3.13 | **2.95** |
| ET-Flow | 2.60 | 3.33 | 2.83 | 3.59 |
| Lyrebird | **2.54** | 2.96 | **2.80** | 3.56 |

Table 4: MPCONF196GEN test set results. AMR in Å; lower is better (↓). Best results in bold. Coverage not reported because all methods have very low ensemble coverage.

We found that Lyrebird outperforms ETKDG on nearly every precision and recall metric we studied (the one exception is mean precision coverage on GEOM-QM9, where ETKDG comes out ahead). Versus other ML methods like Torsional Diffusion and ET-Flow, the results are more mixed: Lyrebird performs better when there's more relevant training data (e.g. Tables 1 and 2), but doesn't seem to generalize significantly better on "difficult" benchmark sets like GEOM-XL (Table 3) or MPCONF196GEN (Table 4). In general, all methods perform quite poorly on these sets (an average minimum RMSD of 2.5 Å hardly inspires confidence).

We're excited to list the Lyrebird model on Rowan today for all users. While it's not a massive improvement over the previous ET-Flow method in areas similar to the core GEOM-DRUGS dataset, we anticipate that the increased diversity of the training data will make Lyrebird more robust and generalizable across the variety of scientific areas that our users study. As people use this model more, we look forward to seeing how well it performs on real-life use cases, particularly in comparison to existing methods like ETKDG and CREST. We note that Lyrebird is a newly released model, and that results should be carefully checked for production use cases before being relied upon—we don't expect that Lyrebird will be as reliable as ETKDG or CREST yet.

In parallel with this launch, we're releasing the Lyrebird weights on GitHub under an MIT license, making it easy for users to run Lyrebird locally or as a part of different workflows. We're also releasing our new MPCONF196GEN benchmark set under an MIT license for other groups to use when benchmarking conformer-generation methods.
