Lyrebird: Molecular Conformer Ensemble Generation

by Eli Mann · Nov 5, 2025

This work was conducted by Vedant Nilabh, a summer intern from Northeastern University. Thanks Vedant!

Lyrebird illustration from Brehms Tierleben.

Most molecules can exist in different 3D shapes, called conformers. Each conformer is a local minimum on the potential-energy surface and has an associated energy, which determines its population at a given temperature. The observed macroscopic behavior of a molecule typically arises in part from all relevant conformations, making proper conformer search and ranking an important part of almost all chemical simulation problems.
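To make the link between conformer energy and population concrete, here is a minimal sketch of Boltzmann weighting. It assumes relative energies in kcal/mol, treats every conformer as non-degenerate, and uses those energies directly in place of free energies; the function name and the example energies are illustrative only.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_populations(energies_kcal, temperature=298.15):
    """Return the fractional population of each conformer.

    Assumes one energy per conformer (kcal/mol) and no degeneracy factors.
    """
    e_min = min(energies_kcal)  # shift by the minimum for numerical stability
    weights = [
        math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies_kcal
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-conformer ensemble with relative energies in kcal/mol.
pops = boltzmann_populations([0.0, 0.5, 2.0])
```

Because the weights fall off exponentially with energy, a conformer only a couple of kcal/mol above the minimum already contributes little at room temperature, which is why the ranking step matters as much as the search step.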

Unfortunately, finding all the conformers of a given molecule is very difficult. There are a variety of commonly used methods, each with its strengths and limitations. At Rowan, we've generally relied on two methods to date: ETKDG, a stochastic distance-geometry-based approach incorporating experimental torsional heuristics, and CREST, an iterative metadynamics-based approach that also incorporates a genetic-structure-crossing algorithm to increase diversity. While we've had great success with both of these methods (like many other groups), both have their problems—ETKDG is somewhat inaccurate, particularly for large and flexible molecules, and can fail in particularly complex cases, while CREST is extremely slow and often struggles to explore enough space in a reasonable time. As such, we've been on the lookout for alternative conformer-generation methods.

Lyrebird is our first foray into machine-learning-based conformer-generation algorithms. The Lyrebird architecture is based on the ET-Flow equivariant flow-matching architecture from Hassan et al. (preprint, GitHub). The model learns a conditional vector field that transports samples from a harmonic prior, conditioned on a covalent-bond graph, to the true distribution of 3D molecular conformers. The flow model then integrates a deterministic ODE to continuously transform these prior samples into realistic conformations. Because the network is SE(3)-equivariant, the learned vector field respects rotational and translational symmetries of molecules.
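The sampling step described above can be illustrated with a toy sketch. This is not the ET-Flow or Lyrebird network: the vector field below is a hypothetical hand-written stand-in that simply flows toward one fixed geometry, and an isotropic Gaussian stands in for the harmonic prior. It only shows the mechanics of drawing a prior sample and integrating a deterministic ODE from t = 0 to t = 1 with fixed Euler steps.

```python
import random


def toy_vector_field(x, t, target):
    """Hypothetical stand-in for a learned vector field v(x, t).

    Straight-line flow that carries any point to `target` at t = 1.
    """
    return [(xt - xi) / (1.0 - t) for xi, xt in zip(x, target)]


def integrate_flow(x0, target, n_steps=50):
    """Euler-integrate dx/dt = v(x, t) from t = 0 to t = 1."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = toy_vector_field(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
# Flattened xyz coordinates for a made-up three-atom geometry.
target = [0.0, 0.0, 0.0, 1.5, 0.0, 0.0, 1.5, 1.5, 0.0]
# Assumption: an isotropic Gaussian replaces the harmonic prior here.
x0 = [rng.gauss(0.0, 1.0) for _ in target]
x1 = integrate_flow(x0, target)
```

In the real model the vector field is a neural network conditioned on the bond graph, so different prior samples end up at different conformers rather than at a single fixed target.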

The ET-Flow architecture.

Figure 1 from the ET-Flow paper.

The original ET-Flow model was trained on a split of the GEOM-DRUGS subset of the GEOM dataset from Bombarelli et al., which contains over 317,000 ensembles of mid-sized drug-like organic molecules. Their studies show that the models perform well for molecules sampled from within their training distribution, but poorly for molecules outside of it. For Lyrebird, we broadened the range of in-distribution molecules by training on three datasets: GEOM-DRUGS; GEOM-QM9, a dataset with 133,258 small organic molecules limited to 9 heavy atoms; and CREMP, a dataset with 36,198 unique macrocyclic peptides. We hypothesized that increasing the diversity of the training dataset would improve both the model's generalizability and its robustness for routine chemical modeling tasks.

To test this hypothesis, we evaluated Lyrebird on Butina splits of GEOM-QM9, GEOM-DRUGS, and CREMP, as well as several challenging external sets: MPCONF196GEN, a small dataset containing conformer ensembles of the structures from MPCONF196, and GEOM-XL, a set of flexible organic compounds with up to 91 heavy atoms.

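We evaluated our models against a variety of ML methods, as well as ETKDGv3, with metrics that evaluate both the diversity and geometric accuracy of a generated conformer ensemble. (We didn't benchmark against CREST because CREST was used to generate the training-data ensembles.) Comparing two ensembles is not straightforward, so the metrics merit specific explanation. Coverage is the percentage of conformers in one ensemble that have a match within an RMSD threshold δ in the other ensemble, and average minimum RMSD (AMR) is the mean RMSD from each conformer to its closest match in the other ensemble. The recall variants loop over the reference conformers and look for matches among the generated ones, which measures diversity; the precision variants loop over the generated conformers and look for matches among the reference ones, which measures geometric accuracy. Each metric is computed per molecule, and the tables report the mean and median across molecules.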
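As a concrete sketch of the coverage and AMR metrics reported in the tables below, following their usual definitions in the conformer-generation literature, the function here takes a precomputed RMSD matrix for one molecule. The function name, the dictionary keys, and the example matrix are illustrative assumptions, and computing the aligned RMSDs themselves is left out.

```python
def coverage_and_amr(rmsd, delta):
    """Compute recall/precision coverage (%) and AMR for one molecule.

    rmsd[i][j] is the RMSD (in angstroms) between reference conformer i
    and generated conformer j; delta is the coverage threshold.
    """
    n_ref, n_gen = len(rmsd), len(rmsd[0])
    # Recall: for each reference conformer, the closest generated conformer.
    min_per_ref = [min(row) for row in rmsd]
    # Precision: for each generated conformer, the closest reference conformer.
    min_per_gen = [min(rmsd[i][j] for i in range(n_ref)) for j in range(n_gen)]
    return {
        "recall_coverage": 100.0 * sum(d < delta for d in min_per_ref) / n_ref,
        "recall_amr": sum(min_per_ref) / n_ref,
        "precision_coverage": 100.0 * sum(d < delta for d in min_per_gen) / n_gen,
        "precision_amr": sum(min_per_gen) / n_gen,
    }


# Made-up example: 2 reference conformers (rows) and 3 generated ones (columns).
example_rmsd = [
    [0.1, 0.9, 1.2],
    [0.8, 0.3, 1.1],
]
metrics = coverage_and_amr(example_rmsd, delta=0.5)
```

In this example both reference conformers are matched, so recall coverage is 100%, while the third generated conformer is far from every reference and pulls the precision metrics down. Benchmark numbers like those in the tables are then the mean and median of these per-molecule values.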

| Method | Recall Coverage ↑ (Mean) | Recall Coverage ↑ (Median) | Recall AMR ↓ (Mean) | Recall AMR ↓ (Median) | Precision Coverage ↑ (Mean) | Precision Coverage ↑ (Median) | Precision AMR ↓ (Mean) | Precision AMR ↓ (Median) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Torsional Diffusion | 86.91 | 100.00 | 0.20 | 0.16 | 82.64 | 100.00 | 0.24 | 0.22 |
| ET-Flow | 87.02 | 100.00 | 0.21 | 0.14 | 71.75 | 87.50 | 0.33 | 0.28 |
| RDKit ETKDG | 87.99 | 100.00 | 0.23 | 0.18 | 90.82 | 100.00 | 0.22 | 0.18 |
| Lyrebird | 92.99 | 100.00 | 0.10 | 0.03 | 86.99 | 100.00 | 0.16 | 0.05 |

Table 1: GEOM-QM9 test set results (threshold δ = 0.5 Å). Coverage in %, AMR in Å. Best results in bold.

| Method | Recall AMR ↓ (Mean) | Recall AMR ↓ (Median) | Precision AMR ↓ (Mean) | Precision AMR ↓ (Median) |
| --- | --- | --- | --- | --- |
| RDKit ETKDG | 4.69 | 4.68 | 4.73 | 4.71 |
| ET-Flow | 4.13 | 4.07 | >6 | >6 |
| Lyrebird | 2.34 | 2.33 | 2.82 | 2.81 |

Table 2: CREMP test set results. Lower AMR is better (↓). Best results in bold. Coverage not reported because all methods have very low ensemble coverage.

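| Method | Recall AMR ↓ (Mean) | Recall AMR ↓ (Median) | Precision AMR ↓ (Mean) | Precision AMR ↓ (Median) |
| --- | --- | --- | --- | --- |
| RDKit ETKDG | 2.92 | 2.62 | 3.35 | 3.15 |
| Torsional Diffusion* | 2.05 | 1.86 | 2.94 | 2.78 |
| ET-Flow | 2.31 | 1.93 | 3.31 | 2.84 |
| Lyrebird | 2.42 | 2.07 | 3.27 | 2.87 |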

Table 3: GEOM-XL test set results. Lower AMR is better (↓). Best results in bold. Coverage not reported because all methods have very low ensemble coverage.
*Torsional Diffusion generated only 77/102 ensembles.

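| Method | Recall AMR ↓ (Mean) | Recall AMR ↓ (Median) | Precision AMR ↓ (Mean) | Precision AMR ↓ (Median) |
| --- | --- | --- | --- | --- |
| RDKit ETKDG | 3.79 | 3.71 | 4.01 | 3.91 |
| Torsional Diffusion* | 2.71 | 2.58 | 3.13 | 2.95 |
| ET-Flow | 2.60 | 3.33 | 2.83 | 3.59 |
| Lyrebird | 2.54 | 2.96 | 2.80 | 3.56 |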

Table 4: MPCONF196GEN test set results. Lower AMR is better (↓). Best results in bold. Coverage not reported because all methods have very low ensemble coverage.
*Torsional Diffusion generated only 12/13 ensembles.

We found that Lyrebird matches or outperforms ETKDG on nearly every precision and recall metric we studied; the one exception is mean precision coverage on GEOM-QM9 (Table 1), where ETKDG's 90.82% exceeds Lyrebird's 86.99%. Versus other ML methods like Torsional Diffusion and ET-Flow, the results are more mixed: Lyrebird performs better when there's more relevant training data (e.g. Tables 1 and 2), but doesn't in general seem to generalize significantly better for "difficult" benchmark sets like GEOM-XL (Table 3) or MPCONF196GEN (Table 4). In general, all methods seem quite poor on these sets (an average minimum RMSD of 2.5 Å hardly inspires confidence).

We're excited to list the Lyrebird model on Rowan today for all users. While it's not a massive improvement over the previous ET-Flow method in areas similar to the core GEOM-DRUGS dataset, we anticipate that the increased diversity of the training data will make Lyrebird more robust and generalizable across the variety of scientific areas that our users study. As people use this model more, we look forward to seeing how well it performs on real-life use cases, particularly in comparison to existing methods like ETKDG and CREST. We note that Lyrebird is a newly released model, and that results should be carefully checked for production use cases before being relied upon—we don't expect that Lyrebird will be as reliable as ETKDG or CREST yet.

In parallel with this launch, we're releasing the Lyrebird weights on GitHub under an MIT license, making it easy for users to run Lyrebird locally or as a part of different workflows. We're also releasing our new MPCONF196GEN benchmark set under an MIT license for other groups to use when benchmarking conformer-generation methods.
