Screening Conformer Ensembles with PRISM Pruner

by Nicolò Tampellini · Nov 25, 2025

This is a guest post by Nicolò Tampellini, the author of the PRISM Pruner conformer screening package we use here at Rowan. Nicolò is currently a Ph.D. student in Scott Miller's lab at Yale University, and has worked on the computational modeling of multiple conformationally complex reactions in the context of asymmetric catalysis.

Many properties in computational chemistry are obtained from conformational ensembles: sets of many spatial arrangements of the same molecule (or aggregate) that are processed as a whole to accurately model the desired property. Working with ensembles is essential when targeting the lowest-energy conformations (which often influence reactivity), calculating conformational entropies, or modeling any observable property that is modulated by conformation, e.g., the shielding tensors from which NMR chemical shifts are obtained.

In many instances, the generation of such ensembles and their refinement occur in separate steps and at different levels of theory. For example, a conformational search might be carried out with an inexpensive force field or semiempirical method, while further refinement of the ensemble needs high-level DFT to achieve chemical accuracy. In these multi-level workflows, multiple geometries often converge to the same local minimum, and a pruning step is needed to remove duplicates so that only the minimal number of structures is carried forward, keeping the computational cost as low as possible.

The best-known and most widely used metric for comparing conformations is the root-mean-square deviation (RMSD) of atomic positions. This strategy often works well, but it has a tricky caveat: different rotamers of the same structure (for example, a methyl group rotated by 120°, which simply relabels its three hydrogens) will have artificially high RMSD values despite being chemically identical! A more ingenious solution is to compare the moments of inertia along the principal axes. This metric is invariant to atom indexing and should therefore circumvent the degenerate-rotamer issue; it is also faster to compute.
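To make the rotamer problem concrete, here is a minimal NumPy sketch of both metrics (our own illustration, not PRISM Pruner's code): a Kabsch-aligned RMSD and the principal moments of inertia. The function names, the random five-atom toy "molecule", and its masses are assumptions chosen for the example. Swapping two equivalent hydrogens in the atom list leaves the geometry unchanged, yet the index-matched RMSD becomes nonzero while the principal moments stay the same.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (n, 3) arrays after optimal superposition (Kabsch).
    Atoms are matched by index: p[i] is always compared with q[i]."""
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # forbid reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # optimal proper rotation
    diff = p @ r.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def principal_moments(coords, masses):
    """Principal moments of inertia (ascending); independent of atom ordering."""
    com = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    x = coords - com
    inertia = np.eye(3) * (masses * (x ** 2).sum(axis=1)).sum() - (masses[:, None] * x).T @ x
    return np.linalg.eigvalsh(inertia)


rng = np.random.default_rng(0)
masses = np.array([12.011, 1.008, 1.008, 1.008, 15.999])  # hypothetical C, H, H, H, O
coords = rng.normal(size=(5, 3))

# A rigidly rotated copy: same conformer, same atom indexing.
rot, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(rot) < 0:
    rot[:, 0] *= -1.0  # make it a proper rotation
rotated = coords @ rot.T

# A "rotamer": identical geometry, but two equivalent hydrogens swap indices.
swapped = coords[[0, 2, 1, 3, 4]]

rmsd_rotated = kabsch_rmsd(coords, rotated)  # ~0: alignment absorbs the rotation
rmsd_rotamer = kabsch_rmsd(coords, swapped)  # > 0 although the structure is the same
moi_match = np.allclose(principal_moments(coords, masses), principal_moments(swapped, masses))
print(rmsd_rotated, rmsd_rotamer, moi_match)
```

The RMSD only fails here because of the relabeling; the moments of inertia never see atom indices at all, which is what makes them a cheap first filter.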

The popular conformational search engine CREST includes an ensemble sorting routine called CREGEN that implements both of these metrics. Although it can be used as a standalone program, CREGEN focuses on removing duplicate structures while retaining all rotamers. This is necessary for some tasks, like calculating conformational entropy, but can enormously inflate the size of ensembles if you are not interested in rotamers. Imagine modeling an organocatalyst with a dozen tert-butyl groups! CREGEN is also written in Fortran, which can make it difficult to integrate into existing Python pipelines.

Out of necessity, after working with large conformational ensembles, I started writing a conformational pruning implementation in Python years ago (initially as part of FIRECODE, a modular ensemble optimization driver). Rowan pointed out the need for an open-source, standalone conformer screening tool in their recent Open-Source Projects We Wish Existed blog post, and I volunteered to turn my existing code into a standalone package. Working with Jonathon from Rowan, I extracted and polished the code into PRISM Pruner.

The code implements a cached, iterative, divide-and-conquer approach on increasingly large subsets of the ensemble, removing duplicates as assessed by the two metrics above: RMSD and moments of inertia along the principal axes. On top of that, a third mode uses a rotamer-corrected RMSD metric for cases where the moment of inertia alone is not sufficient to weed out redundant conformations. Comparing every structure to every other requires a lot of costly evaluations and scales as O(N²), where N is the number of structures. If there are many similar conformers, a divide-and-conquer strategy that splits the ensemble into smaller chunks can drastically reduce the number of calls: each chunk is small, so the quadratic cost only applies to a small N, and duplicates removed early never need to be compared again.
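A quick count shows why chunking helps. Assume, purely for illustration, N = 1000 structures split into chunks of k = 10 (hypothetical values, not necessarily the chunk sizes PRISM Pruner uses). An all-to-all comparison and a single pass over the chunks cost:

```latex
\begin{aligned}
C_{\text{all}} &= \frac{N(N-1)}{2} = \frac{1000 \times 999}{2} = 499\,500 \\
C_{\text{first pass}} &= \frac{N}{k} \cdot \frac{k(k-1)}{2} = \frac{N(k-1)}{2} = \frac{1000 \times 9}{2} = 4\,500
\end{aligned}
```

The first pass needs under 1% of the all-to-all evaluations. Later passes only see the structures that survived, so when duplicates are common the work stays small; when nothing is removed, the larger chunks eventually have to cover the remaining pairs, which is why the worst case stays quadratic.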

If energies are available, we sort the ensemble before dividing it into chunks: duplicate structures have nearly identical energies, so sorting places them next to each other and gives them the best chance of being grouped together early. After all chunks are evaluated, the remaining structures are processed again with larger chunks, until all active structures are included in the final evaluation. In many instances this results in significantly fewer comparisons, and a faster, more scalable algorithm. Even in the worst case, where no structures are similar, a cache ensures that we never perform more calls than a simple all-to-all algorithm would.
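The loop described above can be sketched in a few dozen lines. This is a simplified illustration under our own assumptions, not PRISM Pruner's implementation: the function name `prune_divide_and_conquer`, the starting chunk size of 2, the doubling schedule, and the toy RMSD threshold are all hypothetical. Within each chunk a structure is kept only if it is not similar to one already kept, and a dictionary caches every compared pair so that no pair is ever evaluated twice across passes.

```python
import numpy as np


def prune_divide_and_conquer(structures, is_similar, energies=None, chunk_size=2):
    """Hypothetical sketch of iterative, cached, chunked duplicate removal.
    Returns the retained indices and the number of distinct evaluations."""
    active = list(range(len(structures)))
    if energies is not None:
        active.sort(key=lambda i: energies[i])  # lowest energy first
    cache = {}

    def similar(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:  # each pair is evaluated at most once, ever
            cache[key] = is_similar(structures[i], structures[j])
        return cache[key]

    while True:
        survivors = []
        for start in range(0, len(active), chunk_size):
            kept = []
            for i in active[start:start + chunk_size]:
                if not any(similar(i, j) for j in kept):
                    kept.append(i)
            survivors.extend(kept)
        if chunk_size >= len(active):  # this pass compared all active structures together
            return survivors, len(cache)
        active = survivors
        chunk_size *= 2  # repeat on the survivors with larger chunks


# Toy ensemble: 200 noisy copies of 5 distinct "conformers" (10 atoms each),
# already in a common frame, so a plain unaligned RMSD suffices here.
rng = np.random.default_rng(42)
bases = rng.normal(scale=2.0, size=(5, 10, 3))
structures = [bases[i % 5] + rng.normal(scale=0.01, size=(10, 3)) for i in range(200)]
energies = rng.random(200)


def rmsd_below(a, b, threshold=0.1):
    return np.sqrt(((a - b) ** 2).sum(axis=1).mean()) < threshold


kept, n_calls = prune_divide_and_conquer(structures, rmsd_below, energies)
print(len(kept), n_calls, 200 * 199 // 2)  # 5 survivors; n_calls is below 19,900
```

Because the list is sorted by energy, whenever two structures are judged similar in this sketch, the lower-energy one is the one kept; and since every comparison goes through the cache, the number of distinct evaluations can never exceed N(N−1)/2.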

Our initial comparisons against CREGEN are very positive, particularly for larger ensembles with many identical rotamers, where the divide-and-conquer approach really shines. Worst-case scaling is still O(N²) when all conformers are different, but in practice it is much lower for most conformational ensembles.

Figures: Conformer pruning via MOI; Conformer pruning via RMSD

We have also added a convenience function to perform sequential pruning with reasonable default values for each step: it starts with the fast moment-of-inertia mode, follows with RMSD-based pruning, and ends with an optional, final rotamer-corrected RMSD pruning. With these settings, processing ensembles of ≈1,000 structures with ≈150 atoms takes seconds and removes many rotamers from ensembles obtained from CREST. Here are two examples from my Ph.D. work showing how much a conformational ensemble can be inflated by undesired rotamers; the second one is really pathological! The DFT time saved by processing these ensembles before the next step is significant.
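The optional rotamer-corrected RMSD step can be illustrated with a brute-force sketch: take the smallest aligned RMSD over all relabelings within groups of symmetry-equivalent atoms (for example, the three hydrogens of a methyl group). This is one simple way to define such a metric, assumed here for illustration; PRISM Pruner may do it differently, and the hypothetical `equivalent_groups` argument would have to come from a symmetry analysis that is not shown. The cost grows factorially with group size, so a practical implementation needs something smarter.

```python
import itertools

import numpy as np


def aligned_rmsd(p, q):
    """Kabsch RMSD computed from singular values (no explicit rotation matrix)."""
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, s, vt = np.linalg.svd(p.T @ q)
    d = 1.0 if np.linalg.det(u) * np.linalg.det(vt) >= 0 else -1.0  # forbid reflections
    e = (p ** 2).sum() + (q ** 2).sum() - 2.0 * (s[0] + s[1] + d * s[2])
    return float(np.sqrt(max(e, 0.0) / len(p)))


def rotamer_corrected_rmsd(p, q, equivalent_groups):
    """Smallest aligned RMSD over all relabelings of q within each equivalent-atom group."""
    best = np.inf
    per_group = [itertools.permutations(group) for group in equivalent_groups]
    for choice in itertools.product(*per_group):
        order = np.arange(len(q))
        for group, perm in zip(equivalent_groups, choice):
            order[list(group)] = perm  # position group[k] now takes atom perm[k] of q
        best = min(best, aligned_rmsd(p, q[order]))
    return best


rng = np.random.default_rng(1)
coords = rng.normal(size=(6, 3))
# Same geometry with the labels of atoms 1-3 cycled, as when a methyl rotates by 120 degrees.
rotamer = coords[[0, 2, 3, 1, 4, 5]]

plain = aligned_rmsd(coords, rotamer)  # > 0: index-matched RMSD sees a "different" structure
corrected = rotamer_corrected_rmsd(coords, rotamer, [(1, 2, 3)])  # ~0: relabeling undone
print(plain, corrected)
```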

Conformer ensemble pruning

The future of this project is also in your hands: if you are interested in contributing new features, feel free to reach out to me or open a request on GitHub! For example, more similarity metrics could be implemented to screen for specific conformational attributes.

The performance of some sections could also be improved if needed: while the MOI-based similarity evaluation is really fast, the RMSD evaluation, which relies on NumPy alone, could be faster. The original FIRECODE implementation of the RMSD metric relies on Numba, which just-in-time compiles Python functions to machine code and achieves a ≈7× speedup on the numerically intensive RMSD calculation. While very performant, Numba is a heavy dependency and can complicate integration into packages that already have a large number of dependencies, so we decided not to include it. If you see further room for improvement in the code's performance, we'd love to hear from you!
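One direction that stays within NumPy is batching: instead of calling an RMSD routine once per pair from Python, align one reference against a whole stack of structures at once, since `np.linalg.svd` and `np.linalg.det` both operate on stacks of matrices. This is a sketch of the idea only; we have not benchmarked it against the current code or against Numba, and the function name `batched_rmsd` is hypothetical.

```python
import numpy as np


def batched_rmsd(ref, stack):
    """Kabsch RMSD of one (n, 3) reference against a (B, n, 3) stack, with no Python loop."""
    p = ref - ref.mean(axis=0)
    q = stack - stack.mean(axis=1, keepdims=True)
    h = np.einsum("ni,bnj->bij", p, q)  # (B, 3, 3) covariance matrices, one per structure
    u, s, vt = np.linalg.svd(h)  # batched SVD over the leading axis
    d = np.where(np.linalg.det(u) * np.linalg.det(vt) < 0, -1.0, 1.0)  # forbid reflections
    e = (p ** 2).sum() + (q ** 2).sum(axis=(1, 2)) - 2.0 * (s[:, 0] + s[:, 1] + d * s[:, 2])
    return np.sqrt(np.clip(e, 0.0, None) / len(ref))


def random_rotation(rng):
    m, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return m if np.linalg.det(m) > 0 else -m  # flip to a proper rotation if needed


rng = np.random.default_rng(7)
ref = rng.normal(size=(20, 3))
stack = np.array([ref @ random_rotation(rng).T for _ in range(50)])
stack[25:] += rng.normal(scale=0.5, size=stack[25:].shape)  # perturb the second half

r = batched_rmsd(ref, stack)
print(r[:25].max(), r[25:].min())  # rotated copies ~0; perturbed copies clearly nonzero
```

Whether this beats a per-pair loop in practice depends on ensemble size and memory; it mainly removes Python-level overhead rather than reducing the arithmetic.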
