Preparing SMILES Strings For Downstream Applications

by Corin Wagen · Oct 3, 2025

The SMILES format is one of the most common ways to represent molecules in large chemical libraries. One challenge with storing molecules as SMILES is that each tautomer and protonation state of a molecule has its own SMILES string, even though all of these strings describe the same underlying compound. For modeling solution-phase properties like solubility, toxicity, or binding affinity, scientists typically prefer to use the SMILES that best represents the actual protonation state and tautomer of the compound in solution.

Unfortunately, identifying the precise protonation state of a molecule is difficult to do at scale. While SMILES strings can easily be converted into the "canonical" SMILES form in the RDKit, the canonical SMILES doesn't necessarily represent what will actually be in solution. For instance, the canonical SMILES for trimethylamine is CN(C)C, while the species that predominates at pH 7.4 is the protonated microstate C[NH+](C)C. Correctly identifying this SMILES string in a black-box fashion requires determining the relative acidity and basicity of every ionizable site, a non-trivial problem that cheminformatics packages like the RDKit don't attempt to solve.
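For a molecule with a single basic site, the reason the protonated form wins at pH 7.4 can be checked by hand with the Henderson-Hasselbalch relation. The snippet below is a minimal sketch, not part of Rowan's workflow: the function name is hypothetical, and the pKa of about 9.8 for the trimethylammonium ion is an assumed approximate literature value.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of a single basic site present as its conjugate acid.

    Uses the Henderson-Hasselbalch relation for one ionizable site;
    it ignores tautomers and any coupling between multiple sites.
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))


# Assumed value: roughly 9.8 for the trimethylammonium ion in water
TRIMETHYLAMMONIUM_PKA = 9.8

fraction = fraction_protonated(TRIMETHYLAMMONIUM_PKA, 7.4)
print(f"fraction protonated at pH 7.4: {fraction:.3f}")
```

Under that assumption, well over 99% of trimethylamine is protonated at pH 7.4. Molecules with several ionizable sites or tautomers don't reduce to a single-site formula like this, which is where a macroscopic pKa calculation becomes necessary.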

Rowan's macroscopic pKa workflow provides a simple and robust way to automatically convert molecules to the protonation state and tautomer predicted to predominate at a given pH. Here's a simple Python script that uses Rowan's API to convert a SMILES string into the preferred protonation state:

import rowan

# Set ROWAN_API_KEY environment variable to your API key or set rowan.api_key directly
# rowan.api_key = "rowan-sk..."

def get_best_microstate(smiles: str, target_ph: float = 7.4) -> str:
    """
    Converts a given input SMILES string to the most populated microstate at a given pH.

    :param smiles: the input SMILES string
    :param target_ph: the pH at which to assess microstate distribution
    :returns: the SMILES of the microstate
    """

    result = rowan.submit_macropka_workflow(
        initial_smiles=smiles,
        name="example macropka",
    )

    result.wait_for_result().fetch_latest(in_place=True)

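    # Find the pH grid point matching the target and return its most populated microstate
    for ph, microstate_weights in result.data["microstate_weights_by_pH"]:
        if abs(target_ph - ph) < 0.01:
            ms = result.data["microstates"][microstate_weights.index(max(microstate_weights))]
            return ms["smiles"]

    # Fail loudly instead of silently returning None when no pH point matches
    raise ValueError(f"no microstate weights found within 0.01 of pH {target_ph}")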


print(f"best microstate is {get_best_microstate('CN(C)C')}")

As expected, running the above Python script quickly returns:

best microstate is C[NH+](C)C
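The selection step can also be sketched on its own, without calling the API. The version below is a minimal sketch that assumes the same data layout the script reads (a list of (pH, weights) pairs plus a parallel list of microstates) and picks the nearest pH grid point rather than requiring a match within 0.01. The function name and the sample numbers are hypothetical and are not Rowan output.

```python
def most_populated_microstate(microstates, weights_by_ph, target_ph):
    """Return the SMILES of the highest-weight microstate at the grid pH nearest target_ph."""
    if not weights_by_ph:
        raise ValueError("no pH points available")

    # Choose the grid point closest to the requested pH
    _, weights = min(weights_by_ph, key=lambda row: abs(row[0] - target_ph))

    # Index of the largest weight identifies the predominant microstate
    best_index = max(range(len(weights)), key=weights.__getitem__)
    return microstates[best_index]["smiles"]


# Hypothetical data for illustration only (not real Rowan output)
microstates = [{"smiles": "C[NH+](C)C"}, {"smiles": "CN(C)C"}]
weights_by_ph = [
    (7.0, [0.998, 0.002]),
    (7.5, [0.995, 0.005]),
    (12.0, [0.01, 0.99]),
]

print(most_populated_microstate(microstates, weights_by_ph, 7.4))
print(most_populated_microstate(microstates, weights_by_ph, 12.0))
```

Choosing the nearest grid point means a target pH that falls between grid values still returns an answer, at the cost of silently rounding to the grid. Which behavior is preferable depends on how finely the pH grid is sampled.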

For small to medium-sized molecules, each macro-pKa calculation takes approximately 20 seconds (or 0.3 credits on Rowan). This is fast enough that this workflow can be run on thousands or tens of thousands of molecules, letting scientists quickly run these calculations before initializing a docking screen or training an ML model.
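To make the scale concrete, the per-molecule figures above can be multiplied out for a whole library. This is a rough sketch only: the function name is hypothetical, the defaults are the approximate numbers quoted above, and the wall-clock figure assumes jobs run strictly one after another, which is a pessimistic assumption if calculations are submitted concurrently.

```python
def estimate_batch_cost(n_molecules, seconds_per_molecule=20.0, credits_per_molecule=0.3):
    """Return (total credits, hours if run serially) for a batch of macro-pKa jobs."""
    total_credits = n_molecules * credits_per_molecule
    serial_hours = n_molecules * seconds_per_molecule / 3600
    return total_credits, serial_hours


for n in (1_000, 10_000):
    credits, hours = estimate_batch_cost(n)
    print(f"{n} molecules: ~{credits:.0f} credits, ~{hours:.1f} h if run one at a time")
```

With these numbers, a 10,000-molecule library comes to roughly 3,000 credits and about 56 hours of serial compute.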

Rowan's macroscopic pKa workflow is available to all subscribers. If your work requires studying large numbers of molecules under physiological conditions, consider subscribing to Rowan or reaching out about a plan for your organization! We'd love to partner with you and support your science.
