Machine Learning in Drug Discovery and Design

Tags: Final year project, machine learning, mathematics, python, scikit-learn, statistics

Predicting the Blood-Brain Barrier Penetration of Drugs


Drug design and discovery is a very expensive process, and new compounds
are being developed rapidly. Only roughly 2% of Central Nervous System (CNS) drugs can pass through
the blood-brain barrier, which presents a problem in CNS drug development.

This project aims to develop a solution that can predict, with high confidence, the
probability of a drug passing through the blood-brain barrier, in the hope that this
can speed up the process of developing a CNS drug.


The end product is a REST API: a client sends a POST request containing a drug's SMILES string, and the prediction results are returned. A sample curl call is shown below; the prediction result, returned as JSON, is shown in the next section.

curl -H "Content-Type: application/json" -X POST -d '{"smile":"Cn1c2CCC(Cn3ccnc3C)C(=O)c2c4ccccc14"}' http://localhost:5000/api/prediction
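
The same call can be made from Python. The sketch below is only an illustration using the standard library: the URL and the "smile" field come from the curl example above, while the helper names build_prediction_request and predict are hypothetical and not part of the project. It makes no assumption about the fields in the JSON response.

```python
import json
import urllib.request

# URL taken from the sample curl call; adjust host and port for your deployment.
API_URL = "http://localhost:5000/api/prediction"


def build_prediction_request(smiles, url=API_URL):
    """Build a POST request carrying a SMILES string as a JSON body.

    Hypothetical helper: the "smile" key matches the curl example.
    """
    body = json.dumps({"smile": smiles}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(smiles, url=API_URL, timeout=10.0):
    """Send the request and return the decoded JSON response.

    Hypothetical helper: the response schema is not assumed here.
    """
    request = build_prediction_request(smiles, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires the API to be running locally):
# print(predict("Cn1c2CCC(Cn3ccnc3C)C(=O)c2c4ccccc14"))
```

Building the request separately from sending it makes it possible to inspect the payload without a running server.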

Sample Drugs in Training data


Feature Engineering


About me

Technologies used

  • Python
  • scikit-learn
  • NumPy
  • Pandas
  • Flask