Program

This course will be organized into 4 blocks over 14 weeks (start: week of 26.02.2024):

  1. Basic concepts in deep learning – 4 weeks
  2. DL in single-cell genomics – 3 weeks
  3. DL in protein bioinformatics – 4 weeks
  4. DL in image analysis – 3 weeks

The course format will comprise a weekly 90-minute online lecture and a weekly hybrid (in-person/online) practical Python session. Lectures will be given by teachers from all participating universities. Lectures and practical exercises on all three application areas will be centered around one recent publication illustrating a specific application and method.

The course will end with a 2-day workshop and hackathon meeting in Heidelberg in June 2024 during which students will be able to implement a short project and listen to scientific lectures.

Prerequisites

Students attending this course are expected to have basic knowledge of statistics and machine-learning fundamentals. Recommended books include:

  1. Deep Learning by Goodfellow, Bengio, Courville
  2. The Elements of Statistical Learning by Hastie, Tibshirani, Friedman
  3. An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani (a more accessible version of the previous book)
  4. Machine Learning with PyTorch and Scikit-Learn by Raschka, Liu, Mirjalili (a great introduction to the technical aspects of DL in PyTorch)

As the practical sessions will be mostly based on Python and PyTorch, some basic knowledge of Python is required (see reference [4] for a good overview of PyTorch, for example).
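
As a self-check for the expected level, you should be able to read a snippet like the following without difficulty (a minimal PyTorch sketch; the toy model and tensor shapes are purely illustrative):

```python
import torch
import torch.nn as nn

# A tiny fully connected network (illustrative toy model)
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 2),
)

x = torch.randn(8, 10)      # batch of 8 samples, 10 features each
logits = model(x)           # forward pass -> shape (8, 2)

# one loss computation and backward pass on random toy labels
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (8,)))
loss.backward()             # fills .grad on all model parameters
```

If the notions of forward pass, loss, and backpropagation in this snippet are unfamiliar, reference [4] is a good place to start.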

Specifically, we expect that the following theoretical concepts are familiar:

basic statistics

  • accuracy
  • sensitivity/specificity
  • area under the curve (AUC)
  • probability distributions
  • random variable
  • expectation of a random variable
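
To make the expected fluency with these metrics concrete, here is a sketch in plain Python/NumPy that computes accuracy, sensitivity, specificity, and the AUC from scratch (the numbers are toy data, purely for illustration):

```python
import numpy as np

# Toy binary classification results (illustrative numbers only)
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0])                   # hard predictions
scores = np.array([0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.7, 0.3])  # predicted probabilities

# Confusion-matrix counts
tp = np.sum((y_pred == 1) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

accuracy    = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)        # true positive rate (recall)
specificity = tn / (tn + fp)        # true negative rate

# AUC = probability that a random positive scores higher than a random negative
pos, neg = scores[y_true == 1], scores[y_true == 0]
auc = np.mean([s_p > s_n for s_p in pos for s_n in neg])
```

In practice you would use scikit-learn's metrics module, but you should know what these quantities mean.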

machine-learning

  • overfitting vs. underfitting
  • cross-validation
  • usage of training, validation and testing datasets
  • classification vs. regression, supervised vs. unsupervised learning
  • binary vs. multi-class classification
  • standard ML algorithms such as Random Forest
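
As an illustration of the data-splitting concepts above, here is a sketch of a held-out test set plus k-fold cross-validation in plain Python/NumPy (toy data; in practice you would use scikit-learn's KFold, and a real model such as a Random Forest would be fit inside the loop):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))       # 20 samples, 3 features (toy data)
y = (X[:, 0] > 0).astype(int)      # labels derived from the first feature

# Hold out a test set first; the rest is used for cross-validation
X_trainval, X_test = X[:16], X[16:]
y_trainval, y_test = y[:16], y[16:]

def kfold_indices(n, k):
    """Split range(n) into k (train, validation) index pairs."""
    idx = np.arange(n)
    return [(np.concatenate([idx[:i * n // k], idx[(i + 1) * n // k:]]),  # train part
             idx[i * n // k:(i + 1) * n // k])                            # validation part
            for i in range(k)]

folds = kfold_indices(len(X_trainval), k=4)
for train_idx, val_idx in folds:
    # a real model (e.g. a Random Forest) would be fit on train_idx
    # and evaluated on val_idx here
    assert set(train_idx) & set(val_idx) == set()   # folds never overlap
```

The test set is touched only once, after model selection, which is exactly the discipline the practical sessions will assume.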

mathematical foundations

  • matrix algebra
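
The level of matrix algebra required is modest: for instance, you should recognize that a fully connected layer is just a matrix-vector product plus a bias (a minimal NumPy sketch with made-up numbers):

```python
import numpy as np

# A linear layer is matrix algebra: y = W @ x + b
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])    # weight matrix, shape (2, 2)
x = np.array([1.0, 1.0])      # input vector
b = np.array([0.5, -0.5])     # bias vector

y = W @ x + b                 # matrix-vector product plus bias
```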

Schedule of lectures

Zoom link to weekly online lectures

Week 1 - 26.02: Intro and mathematical foundations of DL
  Speaker: Bartek Wilczynski (Warsaw)
  Links: Lecture materials, Practical session, Video recording (26.03)

Week 2 - 04.03: Convolutional and recurrent neural networks
  Speaker: Marco Frasca (Milano)
  Content: Shift-invariance problem, the convolution operation, convolutional layers, distributed convolution, pooling, padding. Sketch of some well-known convolutional architectures. Temporally related inputs, recurrent neural networks, long short-term memory networks.
  Links: Lecture materials, Practical session, Video recording (04.03)

Week 3 - 11.03: Autoencoders and variational autoencoders
  Speaker: Carl Herrmann (Heidelberg)
  Content: Embeddings, architecture of autoencoders, sparse/denoising autoencoders, the principle of variational autoencoders, and some statistical notions (the reparametrization trick, etc.).
  Links: Lecture materials, Practical session, Video recording

Week 4 - 18.03: Attention mechanisms and transformers
  Speaker: Dario Malchiodi (Milano)
  Content: The lecture will analyze transformer models, considering NLP as an application field. Focusing on sequence prediction, the main ingredients of a transformer will be introduced and analyzed. In particular, the attention mechanism will be explained in all the variations used in a transformer (namely self-attention, masked attention, and multi-head attention). Subsequently, the training process of a transformer will be considered, focusing on self-supervision, fine-tuning, and task-specific training components. Although the main content of the lecture will consider the encoder-decoder architecture, the variants comprising only an encoder or only a decoder will be introduced. Finally, some extensions of the transformer architecture beyond NLP will be briefly touched upon.
  Links: Lecture materials, Practical session, Video recording

Week 5 - 08.04: Transformers and RNNs for sequence analysis
  Speaker: Dario Malchiodi (Milano)
  Links: Lecture materials, Practical session, Video recording (first 10 minutes missing)

Week 6 - 15.04: Models for multimodal data integration
  Speaker: Britta Velten (Heidelberg)
  Content: This lecture will provide an overview of the basic statistical concepts that are important for the joint analysis of multimodal data. We will discuss statistical properties of multi-omics data and their challenges, followed by an overview of different strategies for supervised and unsupervised integration of multi-omics data and a deep dive into MOFA as an example of an unsupervised method. We will discuss the underlying probabilistic model and explore different downstream analyses that can help interpret the results of the method. We will illustrate the method on case studies.
  Links: Lecture materials, Practical session, Video recording

Week 7 - 22.04: VAEs in single-cell genomics
  Speaker: Carl Herrmann (Heidelberg)
  Content: In this lecture, I will review recent applications of AEs and VAEs in the field of genomics, in particular single-cell genomics. We will see how these applications can help cluster cell populations and denoise sparse data. Finally, I will present some recent VAE models that are interpretable, i.e. in which the neurons of the model can be interpreted as biological entities. For those not familiar with genomics, I will start with a brief review of some concepts and data types.
  Links: Lecture materials, Practical session, Video recording

Week 8 - 29.04: AlphaFold and ESMFold for protein structure prediction
  Speaker: Joanna Sulkowska (Warsaw)
  Links: Lecture materials, Practical session, Video recording

Week 9 - 06.05: RNN and CNN models for topology/graph analysis in biopolymers
  Speaker: Joanna Sulkowska (Warsaw)
  Links: Lecture materials, Link to Video

Week 10 - 13.05: Deep learning models for protein-ligand binding site prediction
  Speaker: David Hoksza (Prague)
  Content: We will introduce methods for protein-small molecule binding site prediction, focusing on structure-based methods. Following the timeline, we will start with traditional non-ML methods, move on to methods using classical ML techniques such as Random Forests, continue with deep-learning approaches such as CNNs, and conclude with the new kid on the block: protein language models.
  Links: Lecture materials, notebook, Video recording

Week 11 - 22.05 (beware, unusual date!): Diffusion models for protein design
  Speaker: Elodie Laine (Paris)
  Content: Unlock the potential of diffusion-based generative models for de novo protein design. Explore their flexibility in shaping proteins according to desired shapes, functions, and active sites. This course provides a historical overview and operational insights, and showcases recent applications. Delve into the challenges of discrete vs. continuous spaces and conclude with a comparative analysis against simpler generative models.

Week 12 - 27.05: Intro to bioimage analysis and deep learning utilization
  Speaker: Martin Schatz (Prague)
  Content: Introduction to the field of bioimage analysis with a focus on deep learning. We will discuss the challenges of bioimage analysis, and you will learn about community-driven resources, not only for accessible deep learning. The hands-on part will focus on knowledge and practical experience with the Noise2Void and StarDist deep learning models.

Week 13 - 03.06: Deep architectures for sampling macromolecules
  Speaker: Grégoire Sergeant-Perthuis (Paris)

Week 14 - 10.06: Deep learning for image segmentation
  Speaker: Karl Rohr (Heidelberg)
  Content: The lecture introduces deep learning methods for image segmentation. The focus is on convolutional neural networks (CNNs) for image analysis and encoder-decoder network architectures for image segmentation. We will discuss the well-known networks U-Net and Cellpose and their application to computer-based analysis of cell microscopy image data.

14-16.06: Final meeting in Heidelberg
  The course will end with a 2-day workshop and hackathon meeting in Heidelberg during which students will be able to implement a short project and listen to scientific lectures.
Feel free to post your project ideas here!