Preparing SMILES Strings For Downstream Applications

by Corin Wagen · Oct 3, 2025

The SMILES format is one of the most common ways to represent molecules in large chemical libraries. One challenge associated with storing molecules in SMILES format is that different tautomers and protonation states all have different SMILES representations, although these strings all represent the same fundamental chemical species. For modeling solution-phase properties like solubility, toxicity, or binding affinity, scientists typically prefer to use the SMILES that best represents the actual protonation state and tautomer of the compound in solution.

Unfortunately, identifying the precise protonation state of a molecule can be challenging to accomplish at scale. While SMILES strings can easily be converted into the "canonical" SMILES form in the RDKit, the canonical SMILES doesn't necessarily represent what will actually be in solution. For instance, the canonical SMILES for trimethylamine is CN(C)C, while the species which will predominate at pH 7.4 is the protonated microstate C[NH+](C)C. Correctly identifying this SMILES string in a black-box fashion requires determining the relative acidity and basicity of various sites, a complex and non-trivial challenge which cheminformatics packages like the RDKit don't attempt to solve.

Rowan's macroscopic pKa workflow provides a simple and robust way to automatically convert molecules to the protonation state and tautomer predicted to predominate at a given pH. Here's a simple Python script that uses Rowan's API to convert a SMILES string into the preferred protonation state:

import rowan

# Set ROWAN_API_KEY environment variable to your API key or set rowan.api_key directly
# rowan.api_key = "rowan-sk..."

def get_best_microstate(smiles: str, target_ph: float=7.4) -> str:
    """
    Converts a given input SMILES string to the most populated microstate at a given pH.

    :param smiles: the input SMILES string
    :param target_ph: the pH at which to assess microstate distribution
    :returns: the SMILES of the microstate
    """

    result = rowan.submit_macropka_workflow(
        initial_smiles=smiles,
        name="example macropka",
    )

    result.wait_for_result().fetch_latest(in_place=True)

    for ph, microstate_weights in result.data["microstate_weights_by_pH"]:
        if abs(target_ph - ph) < 0.01:
            ms = result.data["microstates"][microstate_weights.index(max(microstate_weights))]
            return ms["smiles"]


print(f"best microstate is {get_best_microstate('CN(C)C')}")

As expected, running the above Python script quickly returns:

best microstate is C[NH+](C)C

For small to medium-sized molecules, each macro-pKa calculation takes approximately 20 seconds (or 0.3 credits on Rowan). This is fast enough that this workflow can be run on thousands or tens of thousands of molecules, letting scientists quickly run these calculations before initializing a docking screen or training an ML model.

Rowan's macroscopic pKa workflow is available to all subscribers. If your work requires studying large numbers of molecules under physiological conditions, consider subscribing to Rowan or reaching out about a plan for your organization! We'd love to partner with you and support your science.

Banner background image

Subscribe to Rowan

To run this workflow and other advanced workflows on your own molecules, subscribe to Rowan today.

Start computing →

What to read next

Improving Rowan's Performance on the OpenBind EV-A71 Release

Improving Rowan's Performance on the OpenBind EV-A71 Release

How we recovered useful RBFE accuracy on a challenging real-world dataset.
May 20, 2026 · Corin Wagen
New Protein Visualizations

New Protein Visualizations

distilling insight from complexity; two-dimensional protein–ligand interaction diagrams; protein blob surfaces; space-filling molecule representations
May 19, 2026 · Ari Wagen
Notes on Rowan Engineering; Or How to Vibe-Refactor a Codebase

Notes on Rowan Engineering; Or How to Vibe-Refactor a Codebase

stuck in Rowan's dependency slough of despond; fleeing the complexity of microservices & partial refactors; multiplying packages to reduce complexity; using agents to vibe-refactor our whole codebase
May 13, 2026 · Jonathon Vandezande
Testing Rowan on the OpenBind EV-A71 Release

Testing Rowan on the OpenBind EV-A71 Release

How Rowan's analogue-docking and RBFE workflows fare on this dataset.
May 6, 2026 · Corin Wagen
Benchmarking Membrane-Permeability Predictors

Benchmarking Membrane-Permeability Predictors

Testing GNN-MTL and PyPermm on datasets of small molecules, macrocycles, and PROTACs
Apr 28, 2026 · Ari Wagen
Smarter Analogue Docking, Pocket Detection, and g-xTB Analytical Gradients

Smarter Analogue Docking, Pocket Detection, and g-xTB Analytical Gradients

more robust MCS detection; conformer sampling with torsional Monte Carlo; better alignment and RBFE results; a new pocket-detection workflow; analytical gradients now available for g-xTB
Apr 23, 2026 · Zachary Fried, Corin Wagen, Ari Wagen, and Jonathon Vandezande
g-xTB pKa and Website Redesign

g-xTB pKa and Website Redesign

the flaws with Rowan's AIMNet2-based pKa method; our new g-xTB-based approach; benchmarking and availability; a logo and new website for Rowan
Apr 15, 2026 · Corin Wagen and Ari Wagen
Easter Updates to Rowan

Easter Updates to Rowan

webhooks, draft workflows, and usage estimates for Rowan's Python API; tautomers in non-aqueous solvents; COSMO-based descriptors; overage-based billing; an FEP speed test; welcome Zach
Apr 9, 2026 · Eli Mann, Ari Wagen, Spencer Schneider, Jonathon Vandezande, and Corin Wagen
How Fast Can FEP Run?

How Fast Can FEP Run?

Pushing the speed limit for RBFE calculations run through TMD.
Apr 8, 2026 · Corin Wagen
Improving Rowan's API

Improving Rowan's API

API as a coequal interface to Rowan's product; what we're changing in v3.0.0 of rowan-python; typed outputs; new workflow API; more agent-friendly features; acknowledging our early partners here
Mar 19, 2026 · Eli Mann, Corin Wagen, Jonathon Vandezande, and Spencer Schneider