English
We begin by discussing the context of drug development and generative models. We then set out the objectives and three research questions concerning the effectiveness of diffusion models and the constraints of real-world applications. Next, we present the organization of the research project, its sources of bias, its schedule and its budget. We then survey the state of the art of diffusion models: we define them, describe how they work and detail the processes that make them up. We review their different architectures: Denoising Diffusion Probabilistic Models (DDPMs), Noise Conditioned Score Networks (NCSNs) and Stochastic Differential Equations (SDEs). Conditional generation is tested in an initial experiment using a DDPM on images. We then discuss molecular design, de novo design and problem settings such as unconstrained, property-constrained and structure-constrained molecular generation. We review different molecular representations such as SMILES, SELFIES, molecular fingerprints and graphical notations. We then discuss the generative models used for molecular generation, such as RNNs, VAEs, GANs and diffusion models, and look at molecular databases and frameworks for evaluating generative models. We describe our reasoning, the development of our DiffGenMol application and its features, and the metrics and visualizations implemented. This application is then used to carry out two experiments. The first aims to determine which molecular representation makes a DDPM most effective for unconstrained molecular generation. The second aims to determine which molecular representation makes a DDPM most effective for property-constrained molecular generation. For each experiment, we describe the objectives, the method, the evolution of the metrics during model training, the results and their interpretation. We then discuss the approach taken in this work, the difficulties encountered and the choices made.
We then answer the three research questions using the results obtained and the literature reviewed. Finally, we conclude with the most relevant results, possible improvements and a personal reflection on how this research went.

Keywords: generative models, diffusion models, DDPM, conditional generation, molecular generation, molecular representations, SMILES, SELFIES.