D4 - Deep Drug Discovery and Deployment (PI: Bernardete Ribeiro; co-PI: Joel P. Arrais)


The traditional drug discovery process may take up to 15 years from conceptualization to market, at a cost that can reach one thousand million, with no guarantee that the identified compounds will ever reach the market. The first three stages alone, namely target identification, lead discovery, and lead optimization, may take 4 to 7 years. The process is largely data-driven: it starts with all human proteins that can serve as putative targets, continues with the millions of lead compounds that need to be evaluated and, for the final candidates, ends with a massive number of structural variants to be tested.

D4 is a joint venture that addresses the challenge of using computational methods to improve the drug discovery and deployment pipeline. It proposes state-of-the-art Deep Learning methods to tackle the challenges identified in each of the initial stages of that pipeline. Deep networks have proven more effective than shallow architectures on complex problems such as speech and image recognition. In addition, deep architectures can amplify the key discriminative aspects of the input data while suppressing irrelevant information, thus attaining improved accuracy. D4 will explore these advantages. For instance, target identification can benefit from representation learning instead of relying on manual feature engineering. Restricted Boltzmann Machines rely on the correct evaluation of the features that best represent small variations in protein structure, and could therefore be used in the lead identification stage. The lead optimization stage can capitalize on the iterative refinement of methodologies that cope with scarce starting data, such as Zero-Shot Learning.

The project team brings together researchers from the University of Coimbra (UC) and the University of Aveiro (UA).
These researchers combine a unique set of expertise in Machine Learning, with a focus on Pattern Recognition, Deep Learning and Differential Geometry, and in Computational Biology, with emphasis on the prediction of protein and drug interactions. The project also includes one industrial partner, BSIM2, which will help shape the project requirements and explore the market viability of the research results. BSIM2 is particularly interested in a drug discovery programme that targets transthyretin-related amyloid diseases, which will serve as the case study to validate the proposed methodologies. The main contribution of the project is an improved computational pipeline that uses Deep Learning architectures to support the drug discovery process. The pipeline will be implemented within a framework that will be made available to the community. Both the final platform and the computational methods will be validated in close collaboration with the industrial partner, which will apply them to develop novel therapeutics for neurodegenerative amyloid diseases.


Funded by



Universidade de Aveiro; BSIM2

Total budget

239 796,00 €

Local budget

175 746,00 €


Deep Learning; Drug Discovery; Bioinformatics

Start Date


End Date