Theory of aqueous solubility prediction

This page summarizes the theoretical background of ChemAxon's Aqueous Solubility (logS) Predictor. For technical and usage information about the predictor, see the following page.


Aqueous solubility is one of the most important physico-chemical properties in modern drug discovery. It affects ADME-related properties such as drug uptake, distribution and even oral bioavailability. Solubility can also be a relevant descriptor for property-based computational screening methods in the drug discovery process. Hence there is significant interest in fast, reliable, structure-based methods for predicting the water solubility of promising drug candidates.

ChemAxon's Solubility Predictor is able to calculate two types of solubility: intrinsic and pH-dependent solubility.

On the logS unit

logS is a common unit for expressing solubility. It is the base-10 logarithm of the solubility measured in mol/L, that is, logS = log10(solubility measured in mol/L).

On the temperature of the solubility prediction

The Solubility Predictor predicts solubility values at 25 °C.

On the result of the prediction

The predictor can provide quantitative results, giving the solubility in logS, mg/mL or mol/L units. The predictive accuracy of the plugin is considered to be 1 logS unit. If only a rough estimate of how soluble the compound is needed, the plugin can give a solubility category as a qualitative measure.
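
The relationship between the three quantitative units can be illustrated with a short Python sketch. This is not part of the predictor; the function names are made up for this illustration, and the molar mass has to be supplied by the caller.

```python
import math


def logs_to_mol_per_l(log_s: float) -> float:
    """Convert a logS value to solubility in mol/L (logS = log10 of mol/L)."""
    return 10.0 ** log_s


def mol_per_l_to_logs(solubility_mol_per_l: float) -> float:
    """Convert solubility in mol/L back to logS."""
    if solubility_mol_per_l <= 0.0:
        raise ValueError("solubility must be positive to take its logarithm")
    return math.log10(solubility_mol_per_l)


def logs_to_mg_per_ml(log_s: float, molar_mass_g_per_mol: float) -> float:
    """Convert logS to mg/mL.

    mol/L multiplied by g/mol gives g/L, and 1 g/L is the same as 1 mg/mL.
    """
    return logs_to_mol_per_l(log_s) * molar_mass_g_per_mol
```

For instance, a logS of -2 corresponds to 0.01 mol/L; for a compound with a molar mass of 100 g/mol this is 1 mg/mL.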

Intrinsic solubility

The intrinsic solubility (usually denoted as logS0) of an ionizable compound is the solubility that can be measured after the equilibrium of solvation between the dissolved and the solid state is reached at a pH where the compound is fully neutral.


The intrinsic solubility of phenol can be measured at pH 6. Phenol is a weak acid with a pKa value of 10.02, which means that at pH 6 the molecule will be present in its neutral form, and the equilibrium can be measured only between the solid and the dissolved neutral form.


Fig. 1. Solvation equilibrium of phenol at pH 6

Our predictor uses a fragment-based method that identifies structural fragments in the molecule and assigns an intrinsic solubility contribution to each of them. The contributions are then summed to give the intrinsic solubility value. The implementation is based on the article by Hou et al. [1]
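
The summation step can be sketched in Python as follows. This is only an illustration of the additive idea: the fragment names and contribution values are hypothetical placeholders, not the fragments or parameters used by the predictor or published by Hou et al., and the fragment identification itself is not shown.

```python
# Hypothetical per-fragment contributions in logS units.
# These numbers are placeholders chosen for illustration only.
HYPOTHETICAL_CONTRIBUTIONS = {
    "aromatic_carbon": -0.25,
    "hydroxyl_oxygen": 1.0,
}


def sum_fragment_contributions(fragment_counts, contributions):
    """Sum contribution * occurrence count over all fragments found.

    fragment_counts maps a fragment name to how often it occurs
    in the molecule; contributions maps a fragment name to its
    (here hypothetical) logS contribution.
    """
    total = 0.0
    for name, count in fragment_counts.items():
        if name not in contributions:
            raise KeyError(f"no contribution defined for fragment {name!r}")
        total += contributions[name] * count
    return total


# A phenol-like fragment count: six aromatic carbons, one hydroxyl oxygen.
example_counts = {"aromatic_carbon": 6, "hydroxyl_oxygen": 1}
example_log_s0 = sum_fragment_contributions(
    example_counts, HYPOTHETICAL_CONTRIBUTIONS
)
```

With the placeholder values above, the sum is 6 × (-0.25) + 1.0 = -0.5; this number has no chemical meaning and only demonstrates the arithmetic.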

The figure below shows a molecule split up into fragments that are used in the intrinsic solubility prediction.


Fig. 2. A molecule split into fragments to predict its intrinsic solubility

pH-dependent solubility

The pH of the solution determines the ionization of the dissolved compound, which greatly affects the solvation equilibrium. As ionization increases, solubility increases relative to the intrinsic solubility.


The solubility of aniline in an acidic environment will be greater than its intrinsic solubility as the protonation of the compound shifts the equilibrium between the pure liquid aniline and its dissolved form to the right.


Fig. 3. Solvation equilibrium of aniline in an acidic environment

The pH-dependent solubility (usually denoted as logSpH) can be derived from the Henderson-Hasselbalch equation and the above definition of intrinsic and pH-dependent solubility.

In case of a weak acid the formula is the following:

(mathjax-inline(\log{S_{pH}} = \log{{S_0}} + \log(1 + 10^{(pH-pKa)}))mathjax-inline)
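
The weak acid formula can be evaluated directly. The Python sketch below is a plain transcription of the equation, with a function name chosen for this illustration; it does not include any cut-off and is not the predictor's own code.

```python
import math


def log_s_ph_weak_acid(log_s0: float, ph: float, pka: float) -> float:
    """logS_pH = logS0 + log10(1 + 10^(pH - pKa)) for a monoprotic weak acid."""
    return log_s0 + math.log10(1.0 + 10.0 ** (ph - pka))
```

Two limiting cases help to check the behaviour. Well below the pKa (for example phenol, pKa 10.02, at pH 6) the second term is practically zero, so logSpH is essentially equal to logS0. At pH = pKa the acid is half ionized and the solubility is twice the intrinsic solubility, that is, logSpH = logS0 + log 2, roughly logS0 + 0.30.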

For the general case (based on the derivations for mono- and diprotic acids and bases and ampholytes), this formula can be generalized to the following form:

(mathjax-inline(\log S_{pH} = \log S_0 + \log(1 + \alpha))mathjax-inline) , where (mathjax-inline(\alpha = \frac{ \sum_i \alpha_{cA_i}} { \sum_j \alpha_{nA_j}})mathjax-inline)

In this formula (mathjax-inline(\alpha_{cA_i})mathjax-inline) is the percentage distribution of the i-th charged microspecies at the given pH, while (mathjax-inline(\alpha_{nA_j})mathjax-inline) is the percentage distribution of the j-th neutral microspecies at the given pH.
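
The general formula can be sketched in Python as shown below. The sketch assumes that the microspecies distribution at the given pH is already available (for example from a pKa calculation) as two lists of percentages; the function name and the input format are assumptions made for this illustration, and no cut-off is applied.

```python
import math


def log_s_ph_from_distribution(log_s0, charged_percents, neutral_percents):
    """logS_pH = logS0 + log10(1 + alpha).

    alpha is the summed distribution of the charged microspecies
    divided by the summed distribution of the neutral microspecies,
    both taken at the same pH.
    """
    neutral_total = sum(neutral_percents)
    if neutral_total <= 0.0:
        raise ValueError("the neutral microspecies distribution must be positive")
    alpha = sum(charged_percents) / neutral_total
    return log_s0 + math.log10(1.0 + alpha)
```

If the compound is 50% charged and 50% neutral, alpha is 1 and the result is logS0 + log 2, which agrees with the weak acid formula at pH = pKa. If 99% of the compound is charged and 1% is neutral, alpha is 99 and the uncapped result is logS0 + 2.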

Cut-off of the pH-dependent solubility curve

To put practical limits on the pH-dependent solubility curve and to reflect the fact that the solution eventually reaches saturation, a cut-off is applied so that the predicted curve better matches the real (experimental) pH-dependent solubility curve.

In our logS Predictor we apply the following cut-off (all solubility values below are expressed in logS units):

  • if the predicted logS0 > -2, the applied cut-off will be +2, which means that the pH-dependent logS curve will be "cut off" at logS0 + 2: the pH-dependent logS values will not rise above logS0 + 2.

  • if the predicted logS0 < -2, the predicted pH-dependent solubility curve will not be allowed to rise above 0, so the cut-off will be at 0.

Cut-off examples

  1. The predicted intrinsic solubility (the value to which the pH-dependent logS curve converges) is -1.0. Therefore the pH-dependent curve is cut off at +1.0.


  2. The predicted intrinsic solubility (the value to which the pH-dependent logS curve converges) is -3.0. Therefore the pH-dependent curve is cut off at 0.
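
The cut-off rule and the two examples can be reproduced with a few lines of Python. This is a sketch of the rule as described on this page, with function names chosen for the illustration. The page does not state which branch applies when logS0 is exactly -2; both branches give a ceiling of 0 in that case, so the choice does not affect the result.

```python
def cutoff_ceiling(log_s0: float) -> float:
    """Return the highest logS value the pH-dependent curve may reach."""
    if log_s0 > -2.0:
        # Curve may rise at most 2 logS units above the intrinsic value.
        return log_s0 + 2.0
    # For logS0 below -2 the curve is not allowed to rise above 0.
    # (logS0 == -2 also lands here; logS0 + 2 would give 0 as well.)
    return 0.0


def apply_cutoff(log_s_ph: float, log_s0: float) -> float:
    """Cap an uncapped pH-dependent logS value at the cut-off ceiling."""
    return min(log_s_ph, cutoff_ceiling(log_s0))
```

For a predicted logS0 of -1.0 the ceiling is +1.0, and for a predicted logS0 of -3.0 the ceiling is 0, matching the two examples. Values below the ceiling are returned unchanged.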


Examples of predicted pH-dependent logS curves

In this section you will find some examples of predicted vs. experimental pH-dependent logS curves for specific drug molecules. The experimental values and the detailed analysis of the experimental logS curves can be found in the referenced literature.

Example 1

The predicted pH-dependent logS curve of the HCl salt of ticlopidine shows a good correlation with the experimental curve, meaning that the Henderson-Hasselbalch (HH) equation describes the pH-dependence of the solubility quite accurately. This is partly because ticlopidine is a monoprotic base, a case for which the HH equation works well.

The predicted logS0 is -3.47, while the experimental logS0 is -4.25. The absolute difference between the experimental and predicted intrinsic solubility is 0.78 logS units.


Example 2

The predicted pH-dependent logS curve of the hydrogen fumarate salt of the diprotic base quetiapine shows good correlation with the experimental curve. Its predicted logS0 is -4.27, while the experimental logS0 is -2.84. The absolute difference between the experimental and the predicted intrinsic solubility is 1.42 logS units.


Example 3

The predicted pH-dependent logS curve of the hydrogen fumarate salt of the ampholytic desvenlafaxine deviates from the experimental curve below pH 6, which is due to salt formation. The predicted intrinsic logS is -2.16, while the experimental value is -3.25. The absolute difference between these two values is 1.09 logS units.


Results of testing the model

The accuracy of the model was tested using fragment contributions calculated from the training set with a linear regression model. The obtained contributions were then used to calculate solubility for the training and the test set. The results are summarised in the following two charts:


Tests of pH-logS profiles were also run. The two plots below show calculated and experimental pH-logS profiles for different acidic, basic and zwitterionic compounds:


Future goals

The Solubility Predictor will be developed further. Our future goals include extending the prediction with a descriptor-based method and adding training features.


1. Hou, T. J.; Xia, K.; Zhang, W.; Xu, X. J. ADME Evaluation in Drug Discovery. 4. Prediction of Aqueous Solubility Based on Atom Contribution Approach. J. Chem. Inf. Comput. Sci. 2004, 44, 266-275

2. Völgyi, G.; Baka, E. et al. Study of pH-dependent solubility of organic bases. Revisit of the Henderson-Hasselbalch relationship, Analytica Chimica Acta, 2010, 673, 40-46

3. Avdeef, A. et al. Equilibrium solubility measurement of ionizable drugs - consensus recommendations for improving data quality, ADMET & DMPK 4(2), 2016, 117-178

4. Shoghi, E.; Fuguet, E.; Bosch, E.; Rafols, C. Solubility-pH profiles of some acidic, basic and amphoteric drugs, European Journal of Pharmaceutical Sciences 2013, 48, 291-300