Predicting Solubility

Solubility describes the ability of one substance, the solute, to be dissolved in another, the solvent, and form a solution. Most solubility-related problems center around a solid-state solute and a liquid solvent.

Solubility matters in practice, and tuning it affects many real-world outcomes.

Conventional Approaches to Solubility Prediction

Solubility, at a high level, is governed by the principle "like dissolves like." In some cases, this simple rule is all you need. Polar solute? Try a polar solvent. Non-polar solute? Try a non-polar solvent.

However, not every case is so simple. More complex traditional approaches apply the "like dissolves like" principle in a more data-driven fashion by experimentally measuring one or a few parameters for each new compound under study. These approaches include the Hildebrand solubility parameter and its successor, the Hansen solubility parameters. (You can read more about how solubility-prediction methods have changed over time on our blog.)
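As a rough illustration of how Hansen parameters turn "like dissolves like" into a number, the sketch below computes the standard Hansen distance, Ra² = 4(δD₁ − δD₂)² + (δP₁ − δP₂)² + (δH₁ − δH₂)², between a solute and several solvents; a smaller distance suggests better compatibility. The solute parameters are hypothetical, and the solvent values are approximate figures quoted from commonly tabulated data (in MPa^0.5) that should be checked against a reference table before any real use.

```python
import math

def hansen_distance(a, b):
    """Hansen distance Ra between two (dispersion, polar, h-bond) triples in MPa^0.5."""
    d_disp = a[0] - b[0]
    d_polar = a[1] - b[1]
    d_hbond = a[2] - b[2]
    # The dispersion term carries the conventional factor of 4.
    return math.sqrt(4 * d_disp**2 + d_polar**2 + d_hbond**2)

# Hypothetical solute parameters, chosen only for illustration.
solute = (18.0, 8.0, 7.0)

# Approximate literature values; verify against a reference table.
solvents = {
    "toluene": (18.0, 1.4, 2.0),
    "acetone": (15.5, 10.4, 7.0),
    "ethanol": (15.8, 8.8, 19.4),
    "water": (15.5, 16.0, 42.3),
}

# Rank solvents from most to least similar to the solute.
ranked = sorted(solvents.items(), key=lambda item: hansen_distance(solute, item[1]))
for name, params in ranked:
    print(f"{name:8s} Ra = {hansen_distance(solute, params):.1f}")
```

In practice the distance is compared against an experimentally fitted interaction radius for the solute, which is exactly the kind of per-compound measurement these methods require.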

Machine Learning–Based Approaches to Solubility Prediction

The combination of machine-learning (ML) techniques, molecular descriptors, and chemical fingerprinting has made it possible to search through molecular space in a more parameter-agnostic way and predict solubility completely in silico. Once solubility data has been gathered for a suitably large set of compounds, solvents, and temperatures, ML models can learn the rules of solubility from that data and predict the solubility of new compounds without relying on additional costly real-world measurements.
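To make the idea of a chemical fingerprint more concrete, here is a deliberately simplified sketch. It hashes short SMILES substrings into a fixed number of bits and compares two molecules by Tanimoto similarity. This is a toy stand-in: real fingerprints are computed from the molecular graph (atom environments) by cheminformatics toolkits, and `toy_fingerprint` and `tanimoto` are names invented for this illustration, not part of any Rowan or fastsolv API.

```python
import zlib
from typing import Set

def toy_fingerprint(smiles: str, n_bits: int = 64, max_len: int = 3) -> Set[int]:
    """Hash every SMILES substring of length 1..max_len to a bit index (toy only)."""
    bits = set()
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # crc32 is deterministic across runs, unlike Python's built-in hash().
            bits.add(zlib.crc32(fragment.encode()) % n_bits)
    return bits

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Shared bits divided by total distinct bits; 1.0 means identical bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

ethanol = toy_fingerprint("CCO")
propanol = toy_fingerprint("CCCO")
benzene = toy_fingerprint("c1ccccc1")

print("ethanol vs propanol:", round(tanimoto(ethanol, propanol), 2))
print("ethanol vs benzene: ", round(tanimoto(ethanol, benzene), 2))
```

A model then takes vectors like these, together with solvent and temperature information, as input features and learns a mapping to measured solubility.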

Rowan's Solubility Prediction Tools

Rowan has built a solubility prediction workflow focused on helping experimental chemists choose the right solvent and temperature for any given solute. Rowan's workflow allows users to select different solubility-prediction methods, making it easy to compare different models and achieve maximum accuracy for the task at hand.

fastsolv - Organic Solubility Prediction

To predict the solubility of unseen molecules, Rowan uses fastsolv, a machine-learned model trained by Lucas Attia, Jackson Burns, and co-workers at MIT.

Solute structures can be input via PubChem or via SMILES; Rowan has a long list of pre-defined solvents and allows for custom solvent entry via SMILES as well.

When a solubility prediction job is submitted, Rowan will predict the solubility of each input solute across the range of input solvents and input temperatures. This typically takes under one minute to complete, thanks to our optimized backend infrastructure. For each solute, Rowan produces a temperature-dependent solubility graph that communicates all the information needed to choose an appropriate solvent.
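The set of predictions in a job is simply every solvent paired with every temperature. The short sketch below builds such a grid for two solvents given as SMILES and four temperatures, converting from Celsius to kelvin on the assumption that temperatures are supplied in kelvin (as a value like 293.15 suggests). The helper `celsius_to_kelvin` and the variable names are illustrative only and do not reflect how Rowan's backend is implemented.

```python
from itertools import product

def celsius_to_kelvin(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return temp_c + 273.15

solvents = ["CS(=O)C", "CO"]  # DMSO and methanol as SMILES
temperatures_k = [celsius_to_kelvin(t) for t in range(0, 61, 20)]  # 0, 20, 40, 60 °C

# One prediction per (solvent, temperature) pair.
grid = list(product(solvents, temperatures_k))
for solvent, temp_k in grid:
    print(f"{solvent:10s} {temp_k:.2f} K")
```

Plotting predicted solubility against temperature, one curve per solvent, gives the kind of graph described above.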

Solubility predictions can also be submitted using Rowan's free Python API. Here's a simple script to submit a solubility prediction for a SMILES string:

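import rowan

# Replace with your own API key.
rowan.api_key = "rowan-sk-your-key-here"

# Submit the job and keep the returned handle so the result can be fetched.
result = rowan.submit_solubility_workflow(
    initial_smiles="c1cccnc1CCOC",
    solubility_method="fastsolv",
    solvents=["CS(=O)C", "CO"],  # DMSO and methanol
    temperatures=[293.15],
    name="solubility prediction",
)

# Wait for the workflow to finish, then print the fetched result.
print(result.wait_for_result().fetch_latest(in_place=True))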

For more details, refer to our API documentation.

ESOL and Kingfisher - Aqueous Solubility Prediction

For many use cases in drug discovery and agrochemistry, only solubility under physiological conditions (water at room temperature) matters. Rather than use the general-purpose fastsolv model for these cases, Rowan offers two specialized models designed to predict aqueous solubility with maximum accuracy: Kingfisher and ESOL. Kingfisher is a graph neural network, while ESOL is a multilinear cheminformatic model. These methods have been trained on a curated high-accuracy dataset and validated against a variety of external test sets. Our full technical report can be found here.
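To show what a multilinear model of this kind looks like, the sketch below evaluates a linear equation in four descriptors: calculated logP, molecular weight, number of rotatable bonds, and aromatic proportion. The coefficients are those commonly quoted from Delaney's original 2004 ESOL fit; they are included only to illustrate the functional form and are not the retrained coefficients used in Rowan's ESOL model. The descriptor values in the example are hypothetical.

```python
def esol_log_s(clogp: float, mol_weight: float, rotatable_bonds: int,
               aromatic_proportion: float) -> float:
    """Estimate log10 of aqueous solubility (mol/L) from four descriptors.

    Coefficients as commonly quoted from the original ESOL publication;
    treat them as illustrative rather than as Rowan's fitted values.
    """
    return (0.16
            - 0.63 * clogp
            - 0.0062 * mol_weight
            + 0.066 * rotatable_bonds
            - 0.74 * aromatic_proportion)

def log_s_to_mg_per_ml(log_s: float, mol_weight: float) -> float:
    """Convert log10(mol/L) to mg/mL; mol/L times g/mol gives g/L, which equals mg/mL."""
    return (10 ** log_s) * mol_weight

# Hypothetical descriptor values for a drug-like molecule.
log_s = esol_log_s(clogp=2.0, mol_weight=200.0, rotatable_bonds=3, aromatic_proportion=0.5)
print(f"log S = {log_s:.3f}, about {log_s_to_mg_per_ml(log_s, 200.0):.2f} mg/mL")
```

Because every term is linear, each coefficient can be read directly as the effect of its descriptor, which is the main appeal of this model family next to a graph neural network.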

Kingfisher and ESOL predictions don't produce a temperature-dependent solubility graph, since only a single value is returned. However, these models can be used to produce pH-dependent solubility, which is often of interest in process chemistry and related fields. For more details, refer to our dedicated aqueous solubility page.
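One common way to turn a single intrinsic solubility value into a pH-dependent profile is the Henderson-Hasselbalch relationship for a monoprotic compound: S = S₀(1 + 10^(pH − pKa)) for an acid and S = S₀(1 + 10^(pKa − pH)) for a base. The sketch below applies it; this is a textbook approximation and not necessarily how Rowan computes pH-dependent solubility, and the function name and example values are hypothetical.

```python
import math

def ph_adjusted_log_s(log_s0: float, pka: float, ph: float, kind: str) -> float:
    """Return log10 solubility at a given pH for a monoprotic acid or base.

    log_s0 is the intrinsic (neutral-form) log solubility. The ionized form
    is assumed to be freely soluble, so salt solubility limits are ignored.
    """
    if kind == "acid":
        ionization = 10 ** (ph - pka)   # acids ionize above their pKa
    elif kind == "base":
        ionization = 10 ** (pka - ph)   # bases ionize below their pKa
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_s0 + math.log10(1 + ionization)

# Hypothetical weak acid: intrinsic log S of -3.0 and pKa of 4.5.
for ph in (2.0, 4.5, 7.4):
    print(f"pH {ph:>4}: log S = {ph_adjusted_log_s(-3.0, 4.5, ph, 'acid'):.2f}")
```

In real systems the increase levels off once the salt form reaches its own solubility limit, so values far from the pKa should be treated as upper bounds.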
