Lyrebird: Molecular Conformer Ensemble Generation

by Eli Mann · Nov 5, 2025

This work was conducted by Vedant Nilabh, a summer intern from Northeastern University. Thanks Vedant!

Lyrebird illustration from Brehms Tierleben.

Most molecules can exist in different 3D shapes, called conformers. Each conformer is a local minimum on the potential-energy surface and has an associated energy, which determines its population at a given temperature. The observed macroscopic behavior of a molecule typically reflects contributions from all relevant conformations, making proper conformer search and ranking an important part of almost all chemical simulation problems.
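To make the link between energy and population concrete, here is a minimal sketch of Boltzmann weighting. It assumes relative conformer energies in kcal/mol, ignores degeneracy, and treats the supplied energies as free energies; the function name and the example energies are illustrative and not taken from Rowan's code.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_populations(rel_energies_kcal, temperature=298.15):
    """Return the fractional population of each conformer.

    Assumes one energy per conformer, in kcal/mol, with no degeneracy factors.
    """
    rt = R_KCAL * temperature
    e_min = min(rel_energies_kcal)  # shift so the lowest energy is zero
    weights = [math.exp(-(e - e_min) / rt) for e in rel_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical ensemble: three conformers at 0.0, 0.5, and 2.0 kcal/mol.
populations = boltzmann_populations([0.0, 0.5, 2.0])
for energy, pop in zip([0.0, 0.5, 2.0], populations):
    print(f"{energy:4.1f} kcal/mol -> {100 * pop:5.1f}%")
```

Because the weights fall off exponentially, a conformer only a couple of kcal/mol above the minimum contributes just a few percent at room temperature, which is why ranking errors of that size matter.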

Unfortunately, finding all the conformers of a given molecule is very difficult. There are a variety of commonly used methods, each with its strengths and limitations. At Rowan, we've generally relied on two methods to date: ETKDG, a stochastic distance-geometry-based approach incorporating experimental torsional heuristics, and CREST, an iterative metadynamics-based approach that also incorporates a genetic-structure-crossing algorithm to increase diversity. While we (like many other groups) have had great success with both of these methods, each has its problems: ETKDG is somewhat inaccurate, especially for large and flexible molecules, and can fail in particularly complex cases, while CREST is extremely slow and often struggles to explore enough space in a reasonable time. As such, we've been on the lookout for alternative conformer-generation methods.

Lyrebird is our first foray into machine-learning-based conformer-generation algorithms. The Lyrebird architecture is based on the ET-Flow equivariant flow-matching architecture from Hassan et al. (preprint, GitHub). The model learns a conditional vector field that transports samples from a harmonic prior, conditioned on a covalent-bond graph, to the true distribution of 3D molecular conformers. The flow model then integrates a deterministic ODE to continuously transform these prior samples into realistic conformations. Because the network is SE(3)-equivariant, the learned vector field respects rotational and translational symmetries of molecules.
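The sampling step described above, integrating a deterministic ODE from a prior sample to a conformer, can be sketched with a toy example. Everything here is an assumption made for illustration: the learned SE(3)-equivariant network is replaced by a closed-form vector field that moves along a straight line toward a known target, the harmonic prior is replaced by plain Gaussian noise, and fixed-step Euler integration stands in for whatever solver the real model uses.

```python
import random


def toy_vector_field(x, t, target):
    """Closed-form stand-in for the learned network (not the real model).

    Returns the velocity that carries x to `target` along a straight line,
    arriving exactly at t = 1.
    """
    return [
        [(tj - xj) / (1.0 - t) for xj, tj in zip(x_atom, t_atom)]
        for x_atom, t_atom in zip(x, target)
    ]


def integrate_flow(x0, vector_field, n_steps=50):
    """Fixed-step Euler integration of dx/dt = v(x, t) from t = 0 to t = 1."""
    x = [list(atom) for atom in x0]
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # last step is evaluated at t = 1 - dt, so 1 - t never hits zero
        v = vector_field(x, t)
        x = [
            [xj + dt * vj for xj, vj in zip(x_atom, v_atom)]
            for x_atom, v_atom in zip(x, v)
        ]
    return x


# Hypothetical three-atom "conformer" as the target geometry (coordinates in Å).
target = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0]]

random.seed(0)
prior_sample = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in target]
final = integrate_flow(prior_sample, lambda x, t: toy_vector_field(x, t, target))
```

In the real model the vector field is a neural network conditioned on the covalent-bond graph, so different prior samples end at different conformers rather than at a single fixed target.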

The ET-Flow architecture.

Figure 1 from the ET-Flow paper.

The original ET-Flow model was trained on a split of the GEOM-DRUGS subset of the GEOM dataset from Bombarelli et al., which contains over 317,000 ensembles of mid-sized drug-like organic molecules. Their studies show that the models perform well for molecules sampled within their training distributions, but poorly for molecules outside of their distribution. For Lyrebird, we broadened the training distribution by training on three datasets: GEOM-DRUGS; GEOM-QM9, a dataset with 133,258 small organic molecules limited to 9 heavy atoms; and CREMP, a dataset with 36,198 unique macrocyclic peptides. We hypothesized that increasing the diversity of the training dataset might improve model generalizability and make the model more robust for routine chemical modeling tasks.

To test this hypothesis, we evaluated Lyrebird on Butina splits of GEOM-QM9, GEOM-DRUGS, and CREMP, as well as several challenging external sets: MPCONF196GEN, a small dataset containing conformer ensembles of the structures from MPCONF196, and GEOM-XL, a set of flexible organic compounds with up to 91 heavy atoms.
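A Butina split clusters molecules by structural similarity and assigns whole clusters to either the training or the test set, so that test molecules are not near-duplicates of training molecules. The sketch below is a simplified version of that idea operating on a precomputed distance matrix. The cutoff, the test fraction, and the toy 1D "molecules" are assumptions chosen for illustration; the post does not state which fingerprints, cutoff, or implementation were used.

```python
def butina_clusters(dist, cutoff):
    """Simplified Taylor-Butina clustering on a square distance matrix.

    Items with the most neighbors within `cutoff` become centroids first;
    each centroid claims its still-unassigned neighbors.
    """
    n = len(dist)
    neighbors = [
        {j for j in range(n) if j != i and dist[i][j] <= cutoff} for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue
        members = [i] + sorted(neighbors[i] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


def split_by_cluster(clusters, test_fraction):
    """Assign whole clusters to the test set, smallest first, up to the target size."""
    target = test_fraction * sum(len(c) for c in clusters)
    train, test = [], []
    for cluster in sorted(clusters, key=len):
        if len(test) + len(cluster) <= target:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return sorted(train), sorted(test)


# Toy data: six points on a line stand in for molecules; distance = |a - b|.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
dist = [[abs(a - b) for b in points] for a in points]
clusters = butina_clusters(dist, cutoff=0.15)
train_idx, test_idx = split_by_cluster(clusters, test_fraction=0.5)
```

For real molecules the distance would typically be one minus the Tanimoto similarity between fingerprints, and a library routine such as RDKit's Butina module would replace the hand-written clustering.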

We evaluated our models against a variety of ML methods, as well as ETKDGv3, with metrics that evaluate both the diversity and geometric accuracy of a generated conformer ensemble. (We didn't benchmark against CREST because CREST was used to generate the training-data ensembles.) Comparing two ensembles is tricky, so the metrics merit specific explanation. The tables below report Coverage and AMR (average minimum RMSD), each in a recall and a precision variant.
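As a rough guide to how the recall and precision variants of Coverage and AMR are usually defined in the conformer-generation literature, here is a minimal sketch for a single molecule. It assumes a precomputed matrix of RMSD values between reference and generated conformers; the function name and the example numbers are illustrative, and the exact alignment and aggregation choices behind the tables may differ.

```python
def coverage_and_amr(rmsd, delta):
    """Compute recall/precision Coverage (%) and AMR (Å) for one molecule.

    rmsd[i][j] is the RMSD between reference conformer i and generated conformer j.
    """
    n_ref, n_gen = len(rmsd), len(rmsd[0])
    # Recall: how well is each reference conformer matched by some generated one?
    ref_min = [min(row) for row in rmsd]
    # Precision: how close is each generated conformer to some reference one?
    gen_min = [min(rmsd[i][j] for i in range(n_ref)) for j in range(n_gen)]
    return {
        "recall_coverage": 100.0 * sum(d < delta for d in ref_min) / n_ref,
        "recall_amr": sum(ref_min) / n_ref,
        "precision_coverage": 100.0 * sum(d < delta for d in gen_min) / n_gen,
        "precision_amr": sum(gen_min) / n_gen,
    }


# Hypothetical molecule: 2 reference conformers, 3 generated conformers.
rmsd_matrix = [
    [0.1, 0.8, 1.2],
    [0.9, 0.3, 1.1],
]
metrics = coverage_and_amr(rmsd_matrix, delta=0.5)
```

Recall rewards ensembles that find every reference conformer (diversity), while precision penalizes generated conformers that sit far from any reference (accuracy). Per-molecule values like these are then summarized across the test set, which is where the mean and median columns come from.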

| Method | Recall Coverage ↑ (Mean) | Recall Coverage ↑ (Median) | Recall AMR ↓ (Mean) | Recall AMR ↓ (Median) | Precision Coverage ↑ (Mean) | Precision Coverage ↑ (Median) | Precision AMR ↓ (Mean) | Precision AMR ↓ (Median) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Torsional Diffusion | 86.91 | 100.00 | 0.20 | 0.16 | 82.64 | 100.00 | 0.24 | 0.22 |
| ET-Flow | 87.02 | 100.00 | 0.21 | 0.14 | 71.75 | 87.50 | 0.33 | 0.28 |
| RDKit ETKDG | 87.99 | 100.00 | 0.23 | 0.18 | 90.82 | 100.00 | 0.22 | 0.18 |
| Lyrebird | 92.99 | 100.00 | 0.10 | 0.03 | 86.99 | 100.00 | 0.16 | 0.05 |

Table 1: GEOM-QM9 test set results (threshold δ = 0.5 Å). Coverage in %, AMR in Å. Best results in bold.

| Method | Recall AMR ↓ (Mean) | Recall AMR ↓ (Median) | Precision AMR ↓ (Mean) | Precision AMR ↓ (Median) |
| --- | --- | --- | --- | --- |
| RDKit ETKDG | 4.69 | 4.68 | 4.73 | 4.71 |
| ET-Flow | 4.13 | 4.07 | >6 | >6 |
| Lyrebird | 2.34 | 2.33 | 2.82 | 2.81 |

Table 2: CREMP test set results. Lower AMR is better (↓). Best results in bold. Coverage not reported because all methods have very low ensemble coverage.

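| Method | Recall AMR ↓ (Mean) | Recall AMR ↓ (Median) | Precision AMR ↓ (Mean) | Precision AMR ↓ (Median) |
| --- | --- | --- | --- | --- |
| RDKit ETKDG | 2.92 | 2.62 | 3.35 | 3.15 |
| Torsional Diffusion* | 2.05 | 1.86 | 2.94 | 2.78 |
| ET-Flow | 2.31 | 1.93 | 3.31 | 2.84 |
| Lyrebird | 2.42 | 2.07 | 3.27 | 2.87 |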

Table 3: GEOM-XL test set results. Lower AMR is better (↓). Best results in bold. Coverage not reported because all methods have very low ensemble coverage.
*Torsional Diffusion generated only 77/102 ensembles.

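| Method | Recall AMR ↓ (Mean) | Recall AMR ↓ (Median) | Precision AMR ↓ (Mean) | Precision AMR ↓ (Median) |
| --- | --- | --- | --- | --- |
| RDKit ETKDG | 3.79 | 3.71 | 4.01 | 3.91 |
| Torsional Diffusion* | 2.71 | 2.58 | 3.13 | 2.95 |
| ET-Flow | 2.60 | 3.33 | 2.83 | 3.59 |
| Lyrebird | 2.54 | 2.96 | 2.80 | 3.56 |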

Table 4: MPCONF196GEN test set results. Lower AMR is better (↓). Best results in bold. Coverage not reported because all methods have very low ensemble coverage. *Torsional Diffusion generated only 12/13 ensembles.

We found that Lyrebird matches or outperforms ETKDG on every precision and recall metric we studied except mean precision coverage on GEOM-QM9 (Table 1), where ETKDG scores higher (90.82% vs. 86.99%). Versus other ML methods like Torsional Diffusion and ET-Flow, the results are more mixed: Lyrebird performs better when there's more relevant training data (e.g. Tables 1 and 2), but doesn't in general seem to generalize significantly better for "difficult" benchmark sets like GEOM-XL (Table 3) or MPCONF196GEN (Table 4). In general, all methods seem quite poor on these sets (an RMSD of 2.5 Å hardly inspires confidence).

We're excited to list the Lyrebird model on Rowan today for all users. While it's not a massive improvement over the previous ET-Flow method in areas similar to the core GEOM-DRUGS dataset, we anticipate that the increased diversity of the training data will make Lyrebird more robust and generalizable across the variety of scientific areas that our users study. As people use this model more, we look forward to seeing how well it performs on real-life use cases, particularly in comparison to existing methods like ETKDG and CREST. We note that Lyrebird is a newly released model, and that results should be carefully checked for production use cases before being relied upon—we don't expect that Lyrebird will be as reliable as ETKDG or CREST yet.

In parallel with this launch, we're releasing the Lyrebird weights on GitHub under an MIT license, making it easy for users to run Lyrebird locally or as a part of different workflows. We're also releasing our new MPCONF196GEN benchmark set under an MIT license for other groups to use when benchmarking conformer-generation methods.
