SONAR|HES-SO

Research report

Learning to generate molecules : using the diffusion model to drug discovery

  • Genève : Haute école de gestion de Genève

68 p.

English
We begin by discussing the context of drug development and generative models. We then set out the objectives and three research questions concerning the effectiveness of diffusion models and the constraints of real-life applications. Next, we present the organization of the research project, including its sources of bias, schedule and budget.
We then survey the state of the art in diffusion models: we define them and describe how they work and the processes they comprise. We review their different formulations: Denoising Diffusion Probabilistic Models (DDPMs), Noise Conditioned Score Networks (NCSNs) and Stochastic Differential Equations (SDEs). Conditional generation is tested in an initial experiment using a DDPM on images.
We then turn to molecular design, de novo design and problem settings such as unconstrained, property-constrained and structure-constrained molecular generation. We review molecular representations such as SMILES, SELFIES, molecular fingerprints and graph-based representations. We then discuss the generative models used for molecular generation, such as RNNs, VAEs, GANs and diffusion models, and look at molecular databases and frameworks for evaluating generative models.
We describe our thinking process, the development of our DiffGenMol application and its functionalities, and the metrics and visualizations implemented. This application is then used to carry out two experiments.
The first experiment aims to determine which molecular representation makes a DDPM most effective for unconstrained molecular generation.
The second experiment aims to determine which molecular representation makes a DDPM most effective at generating property-constrained molecules.
For each experiment, we describe the objectives, the method used, the evolution of the metrics during model training, the results and their interpretation.
We then discuss the approach taken in this work, the difficulties encountered and the choices made, and answer the three research questions using the results obtained and the literature reviewed.
Finally, we conclude with the most relevant results, possible improvements and a personal reflection on how the research went.
Keywords: generative models, diffusion models, DDPM, conditional generation, molecular generation, molecular representations, SMILES, SELFIES.
Language
  • English
Classification
Information, communication and media sciences
Notes
  • Haute école de gestion Genève
  • Information documentaire
  • hesso:hegge
Persistent URL
https://sonar.ch/hesso/documents/330945
Statistics

Document views: 11
File downloads:
  • CHARBONNIER_CLERC_projet_recherche_2024.pdf: 39