The Evolution of Solubility Prediction Methods

by Jonathon Vandezande · Feb 25, 2025

Solubility is one of the first concepts introduced in chemistry classes, yet its apparent simplicity belies its profound importance. Solubility governs how solutes interact with solvents, a principle that is critical for a long list of things, including:

There are several methods to tackle this challenge. These include traditional solubility parameter theories and, more recently, data-driven machine learning (ML) approaches. Traditional models, such as Hansen and Hildebrand solubility parameters, derive a small number of empirical parameters to determine the similarity of solute and solvent. Data-driven machine-learning methods have gained traction more recently, offering new ways to capture complex solute-solvent interactions.

Traditional Solubility Methods

Traditional methods for solubility prediction work by measuring a small set of parameters for the solute and the solvent. Following the common adage of "like dissolves like," molecules/polymers with similar values for the solubility parameters are likely to be soluble in one another, while those with very different values are likely to be insoluble.

Hildebrand Solubility Parameter

Hildebrand solubility prediction uses a single-parameter model (δ) wherein molecules with similar values of δ will likely be miscible. δ is the square root of the cohesive energy density, which is derived from the energy needed to vaporize the molecule, and thus can easily be obtained for many molecules. It is calculated as:

$$\delta = \sqrt{\frac{\Delta H_\text{v} - RT}{V_\text{m}}}$$

Figure 1: Hildebrand solubility parameters for common molecules.

Hildebrand solubility prediction can be useful for non-polar and slightly polar molecules and polymers. However, it cannot account for deviations from Raoult's law due to hydrogen bonding or dipolar interactions, such as in ethanol or acetone, because these interactions cannot be captured by a single number.
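As a rough illustration of the formula above, here is a minimal Python sketch that computes δ from an enthalpy of vaporization and a molar volume. The function name and the n-hexane inputs (approximate literature values near 25 °C) are assumptions made for this example, not values taken from this post.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)


def hildebrand_delta(dh_vap_j_per_mol: float,
                     molar_volume_cm3_per_mol: float,
                     temperature_k: float = 298.15) -> float:
    """Return the Hildebrand parameter in MPa^0.5.

    (J/mol) / (cm^3/mol) = J/cm^3 = MPa, so the ratio below is already
    a cohesive energy density in MPa.
    """
    cohesive_energy_density = (
        (dh_vap_j_per_mol - R * temperature_k) / molar_volume_cm3_per_mol
    )
    return math.sqrt(cohesive_energy_density)


# Assumed, approximate inputs for n-hexane near 25 °C:
# enthalpy of vaporization ~31.56 kJ/mol, molar volume ~131.6 cm^3/mol.
delta_hexane = hildebrand_delta(31_560.0, 131.6)
print(f"n-hexane: {delta_hexane:.1f} MPa^0.5")
```

With these inputs the result is about 14.9 MPa^0.5, in line with commonly tabulated values for n-hexane.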

Hansen Solubility Parameters (HSP)

Hansen solubility parameters (HSP) attempt to correct the single-parameter Hildebrand model by partitioning the solubility into dispersion ($\delta_\text{d}$), dipolar interaction ($\delta_\text{p}$), and hydrogen bonding ($\delta_\text{h}$) components. Each solute also has a solubility radius ($R_0$), where solutes with a larger $R_0$ are soluble in a greater range of solvents. Each of these parameters is carefully measured and reported in $\sqrt{\text{MPa}}$. (If you think this unit is confusing, you're not alone! Why is hydrogen bonding being reported as the square root of pressure?)

Each molecule is assigned its own set of parameters ($\delta_\text{d}$, $\delta_\text{p}$, $\delta_\text{h}$), and a "Hansen sphere" of radius $R_0$ is plotted around the point (the sphere is scaled down by a factor of 2 in $\delta_\text{d}$). Solvents inside this sphere are likely to dissolve the molecule, and solvents outside of it are likely unable to dissolve it. Parameters for molecules/polymers that have yet to be measured can be estimated by a series of solubility experiments to triangulate the values.

Figure 2: Hansen sphere showing the extent of solvents that can dissolve a molecule. The "sphere" is scaled by a factor of 2 in the dimension of $\delta_\text{d}$, as differences in dispersion have a greater effect on solubility.
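The sphere test can be sketched in a few lines of Python. The distance used here is the standard Hansen distance, with the factor of 2 on the dispersion axis appearing as a 4 under the square root, and the ratio of that distance to $R_0$ is conventionally called the relative energy difference (RED). The solute parameters and radius are hypothetical, and the solvent values are approximate tabulated numbers included only for illustration.

```python
import math
from typing import NamedTuple


class HSP(NamedTuple):
    d: float  # dispersion, MPa^0.5
    p: float  # dipolar interaction, MPa^0.5
    h: float  # hydrogen bonding, MPa^0.5


def hansen_distance(a: HSP, b: HSP) -> float:
    """Distance in Hansen space; the dispersion axis is stretched by 2."""
    return math.sqrt(
        4.0 * (a.d - b.d) ** 2 + (a.p - b.p) ** 2 + (a.h - b.h) ** 2
    )


def relative_energy_difference(solute: HSP, solvent: HSP, r0: float) -> float:
    """RED < 1 means the solvent lies inside the solute's Hansen sphere."""
    return hansen_distance(solute, solvent) / r0


# Hypothetical solute and radius, chosen only for illustration.
solute = HSP(18.0, 6.0, 6.0)
r0 = 8.0

# Approximate tabulated solvent parameters (assumed for this example).
solvents = {
    "toluene": HSP(18.0, 1.4, 2.0),
    "acetone": HSP(15.5, 10.4, 7.0),
    "water": HSP(15.5, 16.0, 42.3),
}

for name, params in solvents.items():
    red = relative_energy_difference(solute, params, r0)
    verdict = "inside" if red < 1.0 else "outside"
    print(f"{name}: RED = {red:.2f} ({verdict} the sphere)")
```

For this made-up solute, toluene and acetone fall inside the sphere while water lies far outside it, mostly because of its large hydrogen-bonding parameter.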

While only a limited number of solvents may solvate a given molecule, HSP can predict mixtures of miscible solvents that can together dissolve the molecule. The HSP of a mixture is just the mean of the parameters, weighted by the volume fraction of each component.
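The volume-weighted mean described above can be written as a short Python sketch. The function name and the toluene/acetone values (approximate tabulated parameters) are assumptions for illustration only.

```python
from typing import Sequence, Tuple

Params = Tuple[float, float, float]  # (delta_d, delta_p, delta_h) in MPa^0.5


def mixture_hsp(components: Sequence[Tuple[Params, float]]) -> Params:
    """Volume-fraction-weighted mean of each Hansen parameter.

    `components` is a sequence of (parameters, volume_fraction) pairs.
    Fractions are normalized, so they do not need to sum to exactly 1.
    """
    total = sum(fraction for _, fraction in components)
    if total <= 0:
        raise ValueError("volume fractions must sum to a positive number")
    d, p, h = (
        sum(params[i] * fraction for params, fraction in components) / total
        for i in range(3)
    )
    return (d, p, h)


# Approximate tabulated values (assumed for this example).
toluene = (18.0, 1.4, 2.0)
acetone = (15.5, 10.4, 7.0)

blend = mixture_hsp([(toluene, 0.5), (acetone, 0.5)])
print(blend)
```

A 50:50 blend by volume lands halfway between the two solvents on every axis, at roughly (16.75, 5.9, 4.5).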

Figure 3: A mixture of solvents can solvate a molecule that is not miscible in either of the solvents individually. The optimal mixture can be found by drawing a line connecting the solvents and finding the nearest point to the solute. (Note: third dimension is removed for clarity.)
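The "nearest point on the line" construction can be sketched as a projection onto the segment joining the two solvents. This sketch assumes the projection is done in the scaled Hansen space (dispersion axis doubled), so that "nearest" matches the Hansen distance, and that mixture parameters vary linearly with volume fraction. All parameter values below are hypothetical and chosen so the arithmetic is easy to follow.

```python
from typing import Tuple

Params = Tuple[float, float, float]  # (delta_d, delta_p, delta_h) in MPa^0.5


def _scaled(p: Params) -> Params:
    """Double the dispersion axis so Euclidean distance equals Hansen distance."""
    return (2.0 * p[0], p[1], p[2])


def optimal_fraction_of_b(solvent_a: Params, solvent_b: Params,
                          solute: Params) -> float:
    """Volume fraction of solvent B that brings the A/B blend closest to the solute.

    Projects the solute onto the line from A to B and clamps the result
    to [0, 1] so the answer is always a physically valid fraction.
    """
    a, b, s = _scaled(solvent_a), _scaled(solvent_b), _scaled(solute)
    ab = [bi - ai for ai, bi in zip(a, b)]
    a_to_s = [si - ai for ai, si in zip(a, s)]
    length_sq = sum(x * x for x in ab)
    if length_sq == 0.0:
        return 0.0  # A and B coincide; any fraction gives the same blend
    t = sum(x * y for x, y in zip(a_to_s, ab)) / length_sq
    return min(1.0, max(0.0, t))


# Hypothetical values: each pure solvent is ~7.1 MPa^0.5 from the solute,
# but an even blend sits exactly on it.
solvent_a = (17.0, 0.0, 10.0)
solvent_b = (17.0, 10.0, 0.0)
solute = (17.0, 5.0, 5.0)

fraction_b = optimal_fraction_of_b(solvent_a, solvent_b, solute)
print(f"optimal volume fraction of B: {fraction_b:.2f}")
```

In this constructed case, a solute with a radius of 5 would be outside the reach of either pure solvent yet well inside the sphere for the 50:50 blend.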

Hansen solubility parameters are particularly popular in polymer chemistry, where numerous measurements have been made of common solvents and polymers. They are often used to predict:

Extensions of Hansen solubility parameters have been made that include additional parameters and explicit temperature dependence, including the MOSCED 6-parameter model. However, these models require significantly more individual measurements. Attempts have been made to derive these parameters computationally via molecular dynamics (MD), but doing so can be expensive and has not achieved widespread adoption.

Additionally, like most solubility models, HSP struggles with very small molecules that have strong hydrogen bonds, such as water and methanol. Water has a very strong hydrogen-bonding parameter ($\delta_\text{h}$ around 42 $\sqrt{\text{MPa}}$), causing it to be very far from most organic molecules. However, it is actually an excellent solvent for many substances due to its ability to both donate and accept hydrogen bonds. Meanwhile, methanol's tendency to self-associate effectively hides some of its hydrogen-bonding character and alters its measured $\delta_\text{p}$ and $\delta_\text{h}$ values. Modified values are often used to account for this behavior, with ($\delta_\text{d}$, $\delta_\text{p}$, $\delta_\text{h}$) = (14.7, 5, 10) instead of the standard (14.5, 12.3, 22.3), but the accumulation of corrections can move the model away from its theoretical roots.

Machine-Learned Methods

The large number of corrections needed to precisely fit traditional solvation models can become unwieldy, and it makes adding each new solvent time-consuming. Machine learning models forgo the exact semi-physical parameters and instead fit a model to a large amount of data. While these models often lose the explainability provided by traditional methods like HSP, they allow more accurate prediction of the actual solubility (as opposed to just the categorical soluble vs. insoluble), straightforward prediction of temperature effects, and simple extension to previously unparameterized molecules.

Most ML methods start by engineering features from the target molecules. This can include fingerprinting (converting the functional groups of a molecule into a simple vector), explicitly calculating properties of a molecule (e.g. pKa, conformational flexibility, and aromaticity), or using the electron density (e.g. COSMOtherm/COSMO-RS). These features are then input into the model, which outputs a prediction for the solubility. Models can also be trained to output an uncertainty estimation for their results.

Thanks to feature engineering, it is possible to predict solubilities for previously unseen solutes and solvents, as long as molecules with similar properties were used in training the model. For example, if the model were trained on data containing n-pentane, n-hexane, and 1-aminopentane, it will likely perform well for 1-aminohexane, despite never having seen it.

Fastsolv

The fastsolv model from Lucas Attia, Jackson Burns, et al. is a deep-learning model that predicts solubility across a wide range of temperatures and a variety of organic solvents. It uses a data-driven approach, training on the large experimental solubility dataset BigSolDB, which contains 54,273 solubility measurements, 830 molecules, and 138 solvents. It leverages the fastprop library and mordred descriptors to engineer features for both the solute and the solvent, which, along with the temperature, are then passed into a neural network that predicts $\log_{10}(\text{Solubility})$.
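To make the data flow concrete, here is a deliberately simplified Python sketch of the two bookkeeping steps around such a model: assembling one input row from solute descriptors, solvent descriptors, and temperature, and converting a predicted $\log_{10}$ value back to a linear solubility. The function names and the short placeholder vectors are hypothetical; this is not the fastsolv or fastprop API, and real descriptor vectors are far longer.

```python
from typing import List, Sequence


def build_model_input(solute_features: Sequence[float],
                      solvent_features: Sequence[float],
                      temperature_k: float) -> List[float]:
    """Concatenate solute descriptors, solvent descriptors, and temperature."""
    return [*solute_features, *solvent_features, temperature_k]


def solubility_from_log10(log10_solubility: float) -> float:
    """Undo the log transform; units are whatever the training data used."""
    return 10.0 ** log10_solubility


# Placeholder descriptor values, purely illustrative.
row = build_model_input([0.12, 3.4], [1.7], 298.15)
print(row)

# A hypothetical model output of -2.0 corresponds to a solubility of 0.01.
print(solubility_from_log10(-2.0))
```

Predicting the logarithm rather than the raw value is a common choice when the target spans several orders of magnitude, as solubility does.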

While HSP and many other empirical models merely classify whether a molecule is likely to be soluble in a solvent, fastsolv can predict the actual solubility along with non-linear temperature effects and report the uncertainty in its predictions. Many experimental hours are devoted to determining the solubility curves of drug-like molecules, such as this paper on fenofibrate by Watterson et al., while fastsolv can provide such predictions across a variety of molecules and temperatures in less than a minute.

Figure 4: fastsolv-predicted solubility of fenofibrate in common solvents, showing increased solubility in aprotic solvents.

As seen in the experimental results, fenofibrate shows significantly higher solubility in polar aprotic solvents than in polar protic solvents, and a greater temperature dependence in acetonitrile than other aprotic solvents.

Solubility Prediction on Rowan

You can now run fastsolv solubility predictions on the Rowan platform. Rowan's solubility prediction tool built around fastsolv includes the following features.

Default and Custom Solvents

A default set of commonly used non-polar, polar aprotic, and polar protic solvents is pre-populated on Rowan's solvation GUI. This GUI also supports arbitrary solvent selection, making it easy to predict solubility in whatever solvents you care about.

Solvent selection component in Rowan's solubility workflow

Temperature Selection

To predict the temperature dependence of solubility, Rowan's solubility predictor automatically predicts solubility across a range of temperatures. On the GUI, you can select the start point and end point of the range as well as the number of points to sample along it.

Temperature selection component in Rowan's solubility workflow
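A start point, an end point, and a number of points define an evenly spaced temperature grid. The following Python sketch shows one way to build such a grid; it assumes even spacing with both endpoints included and is not Rowan's implementation.

```python
from typing import List


def temperature_grid(start_k: float, end_k: float, n_points: int) -> List[float]:
    """Return n_points evenly spaced temperatures from start_k to end_k inclusive."""
    if n_points < 2:
        raise ValueError("need at least two points to span a range")
    step = (end_k - start_k) / (n_points - 1)
    return [start_k + i * step for i in range(n_points)]


# Six points from 0 °C to 50 °C, expressed in kelvin.
grid = temperature_grid(273.15, 323.15, 6)
print([round(t, 2) for t in grid])
```

Each temperature in the grid would then be paired with every selected solvent to produce one solubility curve per solvent.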

Responsive Graphs of Temperature-Dependent Solubility

The results of Rowan's solubility prediction are displayed on our GUI using Plotly.js, a powerful and responsive client-side graphing library. You can view the uncertainty of each prediction, show or hide solvents by clicking on the legend, and download a PNG of the graph to share or reference later.

Predicted temperature-dependent solubility of fenofibrate in different solvents on the Rowan platform

API Access

For high-throughput computational needs or library-scale screening, Rowan provides access to the fastsolv model through our Python API.

To try Rowan's solubility prediction tool built around fastsolv, you can make a free account on our web-based computational platform. If you are interested in other solubility models or solubility-related features, we'd love to hear from you! You can reach us at contact@rowansci.com—we'd be happy to help you find the best solubility prediction method for your work.
