Prediction of compound solubility in Dimethyl sulfoxide using machine learning methods including
graph neural networks
Abstract
In drug discovery, compounds that are insoluble in dimethyl sulfoxide (DMSO)
are unwanted and can be disregarded. To avoid wasting time and resources,
pharmaceutical companies try to predict compound solubility before selecting
compounds for further research. Compound solubility is hard to predict
with confidence, and this project focuses on prediction using machine learning
methods. The dataset consists of almost 12 thousand compounds labeled
soluble or insoluble and is heavily imbalanced towards soluble compounds. Different
ways of representing compounds are tested with the four machine learning
methods: Support Vector Machine, Random Forest, Multilayer Perceptron
and a state-of-the-art graph convolutional neural network called Directed Message
Passing Neural Network. After performing a 5-fold cross-validation, it
can be concluded that a Directed Message Passing Neural Network performs
better than the other machine learning methods when they are trained with
classical compound representations and on par when they are trained with the
latent space descriptors (Section 2.1.2). Finally, an external experiment
shows that the best Directed Message Passing Neural Network identifies
insoluble compounds at a significantly higher rate than random selection.
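
One way to quantify a comparison against random selection is an enrichment factor: the share of insoluble compounds among the top-ranked predictions divided by the share in the whole set, which is what random selection yields on average. The sketch below is illustrative only. The function name `enrichment_factor`, the toy scores and labels, and the choice of this metric are assumptions and are not taken from the experiment described above.

```python
def enrichment_factor(scores, labels, top_k):
    """Insoluble rate among the top_k highest-scored compounds divided by
    the insoluble rate in the whole set (the expected rate of a random pick).

    scores: predicted probability of being insoluble (hypothetical model output)
    labels: 1 = insoluble, 0 = soluble
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < top_k <= len(scores):
        raise ValueError("top_k must be between 1 and the number of compounds")

    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("the set contains no insoluble compounds")

    # Rank compounds from most to least likely insoluble.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hit_rate = sum(label for _, label in ranked[:top_k]) / top_k
    return hit_rate / base_rate


# Toy data (made up): 2 insoluble compounds out of 10.
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05, 0.3, 0.4, 0.15, 0.25]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

ef_top2 = enrichment_factor(scores, labels, top_k=2)
ef_all = enrichment_factor(scores, labels, top_k=len(scores))
print(ef_top2, ef_all)
```

A value above 1 means the ranked selection contains insoluble compounds more often than a random draw of the same size. Selecting the whole set always gives exactly 1.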