Preparing SMILES Strings For Downstream Applications

by Corin Wagen · Oct 3, 2025

The SMILES format is one of the most common ways to represent molecules in large chemical libraries. One challenge with storing molecules as SMILES is that each tautomer and protonation state has its own SMILES string, even though all of these strings describe the same underlying chemical species. For modeling solution-phase properties like solubility, toxicity, or binding affinity, scientists typically prefer to use the SMILES that best represents the actual protonation state and tautomer of the compound in solution.

Unfortunately, identifying the precise protonation state of a molecule is difficult to do at scale. While SMILES strings can easily be converted into the "canonical" SMILES form in the RDKit, the canonical SMILES doesn't necessarily represent what will actually be in solution. For instance, the canonical SMILES for trimethylamine is CN(C)C, while the species that predominates at pH 7.4 is the protonated microstate C[NH+](C)C. Identifying this SMILES string in a black-box fashion requires determining the relative acidity and basicity of every ionizable site, a non-trivial problem which cheminformatics packages like the RDKit don't attempt to solve.
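To make the trimethylamine example concrete, here is a minimal sketch of the Henderson-Hasselbalch arithmetic for a single basic site. The pKa value of 9.8 for trimethylammonium is an assumed, approximate literature number, and the function name is illustrative; real molecules with several interacting sites need the full treatment described in this post.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of a single basic site present as its conjugate acid.

    Follows from Henderson-Hasselbalch: [B]/[BH+] = 10 ** (pH - pKa).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumption: approximate aqueous pKa of the trimethylammonium ion.
TRIMETHYLAMMONIUM_PKA = 9.8

if __name__ == "__main__":
    frac = fraction_protonated(TRIMETHYLAMMONIUM_PKA, ph=7.4)
    print(f"fraction protonated at pH 7.4: {frac:.3f}")
```

Under that assumed pKa, well over 99% of the amine is protonated at pH 7.4, which is why C[NH+](C)C is the better representation for solution-phase modeling.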

Rowan's macroscopic pKa workflow provides a simple and robust way to automatically convert molecules to the protonation state and tautomer predicted to predominate at a given pH. Here's a simple Python script that uses Rowan's API to convert a SMILES string into the preferred protonation state:

import rowan

# Set ROWAN_API_KEY environment variable to your API key or set rowan.api_key directly
# rowan.api_key = "rowan-sk..."

def get_best_microstate(smiles: str, target_ph: float = 7.4) -> str:
    """
    Converts a given input SMILES string to the most populated microstate at a given pH.

    :param smiles: the input SMILES string
    :param target_ph: the pH at which to assess microstate distribution
    :returns: the SMILES of the microstate
    """

    result = rowan.submit_macropka_workflow(
        initial_smiles=smiles,
        name="example macropka",
    )

    result.wait_for_result().fetch_latest(in_place=True)

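    for ph, microstate_weights in result.data["microstate_weights_by_pH"]:
        if abs(target_ph - ph) < 0.01:
            # pick the microstate with the largest weight at this pH
            ms = result.data["microstates"][microstate_weights.index(max(microstate_weights))]
            return ms["smiles"]

    # no tabulated pH matched the target, so fail loudly instead of returning None
    raise ValueError(f"no microstate weights found at pH {target_ph}")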


print(f"best microstate is {get_best_microstate('CN(C)C')}")

As expected, running the script above prints:

best microstate is C[NH+](C)C
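The script only finds a microstate when the target pH falls within 0.01 of a tabulated pH value. A possible variation, sketched below, picks the nearest tabulated pH instead and can be tested offline. The function name, the `tolerance` parameter, and the mock data are all hypothetical; the dictionary layout simply mirrors how the script above reads `result.data`, and the weights are invented for illustration rather than taken from a real calculation.

```python
from typing import Any, Mapping


def pick_microstate(
    data: Mapping[str, Any], target_ph: float = 7.4, tolerance: float = 0.5
) -> str:
    """Return the SMILES of the highest-weight microstate at the pH nearest target_ph."""
    rows = list(data["microstate_weights_by_pH"])
    if not rows:
        raise ValueError("no pH points in workflow output")

    # choose the tabulated pH closest to the requested one
    ph, weights = min(rows, key=lambda row: abs(row[0] - target_ph))
    if abs(ph - target_ph) > tolerance:
        raise ValueError(f"nearest tabulated pH {ph} is too far from {target_ph}")

    # index of the largest weight identifies the dominant microstate
    best_index = max(range(len(weights)), key=weights.__getitem__)
    return data["microstates"][best_index]["smiles"]


# Made-up data in the same shape the script reads; not real workflow output.
mock_data = {
    "microstates": [{"smiles": "CN(C)C"}, {"smiles": "C[NH+](C)C"}],
    "microstate_weights_by_pH": [
        (7.0, [0.002, 0.998]),
        (7.5, [0.005, 0.995]),
        (12.0, [0.99, 0.01]),
    ],
}

if __name__ == "__main__":
    print(pick_microstate(mock_data, target_ph=7.4))
```

Keeping the selection logic separate from the API call also makes it easy to reuse one finished calculation for several pH values.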

For small to medium-sized molecules, each macro-pKa calculation takes approximately 20 seconds (or 0.3 credits on Rowan). This is fast enough that this workflow can be run on thousands or tens of thousands of molecules, letting scientists quickly run these calculations before initializing a docking screen or training an ML model.
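For library-scale runs, a rough budget and a small batching wrapper can help. The sketch below uses the approximate per-molecule figures quoted above and assumes each conversion is an independent, blocking call, so a thread pool is a reasonable fit. `estimate_cost`, `prepare_library`, and the `convert` argument are hypothetical names; `convert` stands in for any function that maps a SMILES string to its preferred microstate.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple

SECONDS_PER_MOLECULE = 20   # approximate figure quoted in the text
CREDITS_PER_MOLECULE = 0.3  # approximate figure quoted in the text


def estimate_cost(n_molecules: int) -> Tuple[float, float]:
    """Return (hours if run one at a time, credits) for n_molecules."""
    hours = n_molecules * SECONDS_PER_MOLECULE / 3600
    return hours, n_molecules * CREDITS_PER_MOLECULE


def prepare_library(
    smiles_list: Iterable[str],
    convert: Callable[[str], str],
    max_workers: int = 8,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Convert each unique SMILES; return (prepared, failed) dictionaries."""
    unique = list(dict.fromkeys(smiles_list))  # drop duplicates, keep order
    prepared: Dict[str, str] = {}
    failed: Dict[str, str] = {}

    def safe_convert(smiles: str):
        # capture errors so one bad molecule does not stop the batch
        try:
            return smiles, convert(smiles), None
        except Exception as exc:
            return smiles, None, str(exc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for smiles, result, error in pool.map(safe_convert, unique):
            if error is None:
                prepared[smiles] = result
            else:
                failed[smiles] = error
    return prepared, failed


if __name__ == "__main__":
    hours, credits = estimate_cost(10_000)
    print(f"10,000 molecules: ~{hours:.0f} h sequential, ~{credits:.0f} credits")
```

With these figures, 10,000 molecules come to roughly 56 hours of sequential compute and about 3,000 credits, so deduplicating the library before submission is worthwhile.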

Rowan's macroscopic pKa workflow is available to all subscribers. If your work requires studying large numbers of molecules under physiological conditions, consider subscribing to Rowan or reaching out about a plan for your organization! We'd love to partner with you and support your science.
