Preparing SMILES Strings For Downstream Applications

by Corin Wagen · Oct 3, 2025

The SMILES format is one of the most common ways to represent molecules in large chemical libraries. One challenge of storing molecules as SMILES is that each tautomer and protonation state of a compound has its own SMILES representation, even though all of these strings describe the same underlying chemical species. For modeling solution-phase properties like solubility, toxicity, or binding affinity, scientists typically prefer the SMILES that best represents the protonation state and tautomer actually present in solution.

Unfortunately, identifying the correct protonation state of a molecule is difficult to do at scale. While SMILES strings can easily be converted into the "canonical" SMILES form in the RDKit, the canonical SMILES doesn't necessarily represent what will actually be in solution. For instance, the canonical SMILES for trimethylamine is CN(C)C, while the species that predominates at pH 7.4 is the protonated microstate C[NH+](C)C. Identifying this SMILES string in a black-box fashion requires determining the relative acidity and basicity of every ionizable site, a non-trivial problem that cheminformatics packages like the RDKit don't attempt to solve.
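To see why trimethylamine ends up protonated at pH 7.4, consider the simplest possible case: a single basic site with a known pKa, where the Henderson-Hasselbalch equation gives the protonated fraction directly. The sketch below assumes a pKa of about 9.8 for trimethylammonium (an approximate literature value supplied here, not a number from this post). Real drug-like molecules usually have several interacting sites whose pKa values are unknown, which is exactly what a predictive workflow has to work out.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of a single-site base present as its conjugate acid (BH+) at a given pH.

    Henderson-Hasselbalch: [B]/[BH+] = 10**(pH - pKa), so
    fraction(BH+) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumed, approximate literature pKa of trimethylammonium (conjugate acid of trimethylamine)
TRIMETHYLAMMONIUM_PKA = 9.8

f = fraction_protonated(TRIMETHYLAMMONIUM_PKA, 7.4)
print(f"fraction protonated at pH 7.4: {f:.3f}")  # fraction protonated at pH 7.4: 0.996
```

With the pKa about 2.4 units above the pH, roughly 99.6% of the amine is protonated, which is why C[NH+](C)C rather than CN(C)C is the better input for solution-phase modeling.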

Rowan's macroscopic pKa workflow provides a simple and robust way to automatically convert molecules to the protonation state and tautomer predicted to predominate at a given pH. Here's a short Python script that uses Rowan's API to convert a SMILES string into its preferred protonation state:

import rowan

# Set the ROWAN_API_KEY environment variable to your API key, or set rowan.api_key directly
# rowan.api_key = "rowan-sk..."

def get_best_microstate(smiles: str, target_ph: float = 7.4) -> str:
    """
    Converts a given input SMILES string to the most populated microstate at a given pH.

    :param smiles: the input SMILES string
    :param target_ph: the pH at which to assess microstate distribution
    :returns: the SMILES of the microstate
    """

    # submit a macroscopic pKa workflow for the input molecule
    result = rowan.submit_macropka_workflow(
        initial_smiles=smiles,
        name="example macropka",
    )

    # block until the calculation finishes, then refresh the object with its results
    result.wait_for_result().fetch_latest(in_place=True)

    # each entry pairs a pH value with the weight of every microstate at that pH
    for ph, microstate_weights in result.data["microstate_weights_by_pH"]:
        if abs(target_ph - ph) < 0.01:
            # pick the microstate with the largest population at this pH
            ms = result.data["microstates"][microstate_weights.index(max(microstate_weights))]
            return ms["smiles"]

    # no pH in the computed grid was close enough to target_ph
    raise ValueError(f"no microstate weights found near pH {target_ph}")


print(f"best microstate is {get_best_microstate('CN(C)C')}")

As expected, running the above Python script quickly returns:

best microstate is C[NH+](C)C
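One fragility in the script above is that it only finds a match when the computed pH grid contains a value within 0.01 of target_ph. A more forgiving variant selects the grid point nearest the requested pH. The helper below is a hypothetical sketch: pick_best_microstate is not part of Rowan's API, and it assumes the same result.data layout the script reads (a sequence of (pH, weights) pairs plus a list of microstate dicts with a "smiles" key). The mock data is invented purely for illustration and is not Rowan output.

```python
from typing import Any


def pick_best_microstate(data: dict[str, Any], target_ph: float = 7.4) -> str:
    """Return the SMILES of the most populated microstate at the grid pH nearest target_ph.

    Assumes data["microstate_weights_by_pH"] is a sequence of (pH, weights) pairs and
    data["microstates"] is a list of dicts with a "smiles" key, aligned index-for-index
    with each weights list (the layout used by the script above).
    """
    grid = data["microstate_weights_by_pH"]
    if not grid:
        raise ValueError("no microstate weights in result data")

    # choose the grid point whose pH is closest to the requested pH
    _, weights = min(grid, key=lambda entry: abs(entry[0] - target_ph))

    # the index of the largest weight is the most populated microstate at that pH
    best_index = max(range(len(weights)), key=lambda i: weights[i])
    return data["microstates"][best_index]["smiles"]


# Hypothetical toy data, invented for illustration only
mock_data = {
    "microstates": [{"smiles": "CN(C)C"}, {"smiles": "C[NH+](C)C"}],
    "microstate_weights_by_pH": [
        (7.0, [0.1, 0.9]),
        (8.0, [0.2, 0.8]),
        (13.0, [0.9, 0.1]),
    ],
}

print(pick_best_microstate(mock_data, 7.4))  # C[NH+](C)C
```

In get_best_microstate, the loop over result.data could be replaced by a call such as pick_best_microstate(result.data, target_ph), provided the real data layout matches these assumptions.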

For small to medium-sized molecules, each macro-pKa calculation takes approximately 20 seconds and costs 0.3 credits on Rowan. This is fast enough to run the workflow on thousands or tens of thousands of molecules, letting scientists quickly prepare their inputs before setting up a docking screen or training an ML model.
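The per-molecule figures above translate into a batch budget with simple arithmetic. The sketch below uses the post's numbers (20 seconds and 0.3 credits per small to medium-sized molecule); estimate_batch and its concurrent_jobs parameter are hypothetical, since the post doesn't say how many jobs run at once, and the model assumes every job takes the same time.

```python
def estimate_batch(
    n_molecules: int,
    seconds_per_molecule: float = 20.0,
    credits_per_molecule: float = 0.3,
    concurrent_jobs: int = 1,
) -> tuple[float, float]:
    """Rough (credits, wall-clock hours) for a batch of macro-pKa jobs.

    concurrent_jobs is a hypothetical knob: how many jobs are assumed to run at once.
    """
    credits = n_molecules * credits_per_molecule
    # jobs run in ceil(n / concurrent_jobs) back-to-back "waves" of equal length
    waves = -(-n_molecules // concurrent_jobs)
    hours = waves * seconds_per_molecule / 3600.0
    return credits, hours


for n in (1_000, 10_000):
    credits, hours = estimate_batch(n)
    print(f"{n} molecules: {credits:.0f} credits, {hours:.1f} h if run one at a time")
# 1000 molecules: 300 credits, 5.6 h if run one at a time
# 10000 molecules: 3000 credits, 55.6 h if run one at a time
```

Credits scale linearly with the number of molecules regardless of scheduling, while wall-clock time shrinks with however many jobs actually run in parallel.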

Rowan's macroscopic pKa workflow is available to all subscribers. If your work requires studying large numbers of molecules under physiological conditions, consider subscribing to Rowan or reaching out about a plan for your organization! We'd love to partner with you and support your science.
