Program

This course will be organized into 4 blocks over 14 weeks (start: week of 26.02.2024):

  1. Basic concepts in deep learning – 4 weeks
  2. DL in single-cell genomics – 3 weeks
  3. DL in protein bioinformatics – 4 weeks
  4. DL in image analysis – 3 weeks

The course format will comprise a weekly 90-minute online lecture and a weekly hybrid (in-person/online) practical Python session. Lectures will be given by teachers from all participating universities. Lectures and practical exercises on all three application areas will be centered around one recent publication illustrating a specific application and method.

The course will end with a 2-day workshop and hackathon meeting in Heidelberg in June 2024 during which students will be able to implement a short project and listen to scientific lectures.

Prerequisites

Students attending this course are expected to have basic knowledge of statistics and machine-learning fundamentals. Recommended books include:

  1. Deep Learning by Goodfellow, Bengio, Courville
  2. The Elements of Statistical Learning by Hastie, Tibshirani, Friedman
  3. An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani (a more accessible version of the previous book)
  4. Machine Learning with PyTorch and Scikit-Learn by Raschka, Liu, Mirjalili (a great introduction to the technical aspects of DL in PyTorch)

As the practical sessions will be mostly based on Python and PyTorch, some basic knowledge of Python is required (see reference [4] for a good overview of PyTorch, for example).
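
As a self-check for the expected level, you should be able to read a snippet like the following without difficulty (a minimal PyTorch sketch; the toy model and tensor shapes are purely illustrative):

```python
import torch
import torch.nn as nn

# A tiny fully connected network (illustrative toy model)
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 2),
)

x = torch.randn(8, 10)      # batch of 8 samples, 10 features each
logits = model(x)           # forward pass -> shape (8, 2)

# one loss computation and backward pass on random toy labels
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (8,)))
loss.backward()             # fills .grad on all model parameters
```

If the notions of forward pass, loss, and backpropagation in this snippet are unfamiliar, reference [4] is a good place to start.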

Specifically, we expect that the following theoretical concepts are familiar:

basic statistics

  • accuracy
  • sensitivity/specificity
  • area under the curve (AUC)
  • probability distributions
  • random variable
  • expectation of a random variable
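
To make the expected fluency with these metrics concrete, here is a sketch in plain Python/NumPy that computes accuracy, sensitivity, specificity, and the AUC from scratch (the numbers are toy data, purely for illustration):

```python
import numpy as np

# Toy binary classification results (illustrative numbers only)
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0])                   # hard predictions
scores = np.array([0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.7, 0.3])  # predicted probabilities

# Confusion-matrix counts
tp = np.sum((y_pred == 1) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

accuracy    = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)        # true positive rate (recall)
specificity = tn / (tn + fp)        # true negative rate

# AUC = probability that a random positive scores higher than a random negative
pos, neg = scores[y_true == 1], scores[y_true == 0]
auc = np.mean([s_p > s_n for s_p in pos for s_n in neg])
```

In practice you would use scikit-learn's metrics module, but you should know what these quantities mean.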

machine-learning

  • overfitting vs. underfitting
  • cross-validation
  • usage of training, validation and testing datasets
  • classification vs. regression, supervised vs. unsupervised learning
  • binary vs. multi-class classification
  • standard ML algorithms such as Random Forest
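
As an illustration of the data-splitting concepts above, here is a sketch of a held-out test set plus k-fold cross-validation in plain Python/NumPy (toy data; in practice you would use scikit-learn's KFold, and a real model such as a Random Forest would be fit inside the loop):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))       # 20 samples, 3 features (toy data)
y = (X[:, 0] > 0).astype(int)      # labels derived from the first feature

# Hold out a test set first; the rest is used for cross-validation
X_trainval, X_test = X[:16], X[16:]
y_trainval, y_test = y[:16], y[16:]

def kfold_indices(n, k):
    """Split range(n) into k (train, validation) index pairs."""
    idx = np.arange(n)
    return [(np.concatenate([idx[:i * n // k], idx[(i + 1) * n // k:]]),  # train part
             idx[i * n // k:(i + 1) * n // k])                            # validation part
            for i in range(k)]

folds = kfold_indices(len(X_trainval), k=4)
for train_idx, val_idx in folds:
    # a real model (e.g. a Random Forest) would be fit on train_idx
    # and evaluated on val_idx here
    assert set(train_idx) & set(val_idx) == set()   # folds never overlap
```

The test set is touched only once, after model selection, which is exactly the discipline the practical sessions will assume.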

mathematical foundations

  • matrix algebra
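
The level of matrix algebra required is modest: for instance, you should recognize that a fully connected layer is just a matrix-vector product plus a bias (a minimal NumPy sketch with made-up numbers):

```python
import numpy as np

# A linear layer is matrix algebra: y = W @ x + b
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])    # weight matrix, shape (2, 2)
x = np.array([1.0, 1.0])      # input vector
b = np.array([0.5, -0.5])     # bias vector

y = W @ x + b                 # matrix-vector product plus bias
```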

Schedule of lectures

Zoom link to weekly online lectures

Week 1 - 26.02: Intro and mathematical foundations of DL
  Speaker: Bartek Wilczynski (Warsaw)
  Links: Lecture materials, Practical session, Video recording (26.03)

Week 2 - 04.03: Convolutional and recurrent neural networks
  Speaker: Marco Frasca (Milano)
  Content: Shift-invariance problem, the convolution operation, convolutional layers, distributed convolution, pooling, padding. Sketch of some well-known convolutional architectures. Temporally related inputs, recurrent neural networks, long short-term memory networks.
  Links: Lecture materials, Practical session, Video recording (04.03)

Week 3 - 11.03: Autoencoders and variational autoencoders
  Speaker: Carl Herrmann (Heidelberg)
  Content: Embeddings, architecture of autoencoders, sparse/denoising autoencoders, the principle of variational autoencoders, and some statistical notions (the reparametrization trick, etc.).
  Links: Lecture materials, Practical session, Video recording

Week 4 - 18.03: Attention mechanisms and transformers
  Speaker: Dario Malchiodi (Milano)
  Content: The lecture will analyze transformer models, considering NLP as an application field. Focusing on sequence prediction, the main ingredients of a transformer will be introduced and analyzed. In particular, the attention mechanism will be explained in all the variations used in a transformer (namely self-attention, masked attention, and multi-head attention). Subsequently, the training process of a transformer will be considered, focusing on self-supervision, fine-tuning, and task-specific training components. Although the main content of the lecture will consider the encoder-decoder architecture, the variants comprising only an encoder or only a decoder will be introduced. Finally, some extensions of the transformer architecture beyond NLP will be briefly touched upon.
  Links: Lecture materials, Practical session, Video recording

Week 5 - 08.04: Transformers and RNNs for sequence analysis
  Speaker: Dario Malchiodi (Milano)
  Links: Lecture materials, Practical session, Video recording (first 10 minutes missing)

Week 6 - 15.04: Models for multimodal data integration
  Speaker: Britta Velten (Heidelberg)
  Content: This lecture will provide an overview of the basic statistical concepts that are important for the joint analysis of multimodal data. We will discuss statistical properties of multi-omics data and their challenges, followed by an overview of different strategies for supervised and unsupervised integration of multi-omics data and a deep dive into MOFA as an example of an unsupervised method. We will discuss the underlying probabilistic model and explore different downstream analyses that can help interpret the results of the method. We will illustrate the method on case studies.
  Links: Lecture materials, Practical session, Video recording

Week 7 - 22.04: VAEs in single-cell genomics
  Speaker: Carl Herrmann (Heidelberg)
  Content: In this lecture, I will review recent applications of AEs and VAEs in the field of genomics, in particular single-cell genomics. We will see how these applications can help cluster cell populations and denoise sparse data. Finally, I will present some recent VAE models that are interpretable, i.e. in which the neurons of the model can be interpreted as biological entities. For those not familiar with genomics, I will start with a brief review of some concepts and data types.
  Links: Lecture materials, Practical session, Video recording

Week 8 - 29.04: AlphaFold and ESMFold for protein structure prediction
  Speaker: Joanna Sulkowska (Warsaw)
  Links: Lecture materials, Practical session, Video recording

Week 9 - 06.05: RNN and CNN models for topology/graph analysis in biopolymers
  Speaker: Joanna Sulkowska (Warsaw)
  Links: Lecture materials, Link to Video

Week 10 - 13.05: Deep learning models for protein-ligand binding site prediction
  Speaker: David Hoksza (Prague)
  Content: We will introduce methods for protein-small molecule binding site prediction, focusing on structure-based methods. Following the timeline, we will start with traditional non-ML methods, move on to methods using classical ML techniques such as Random Forests, continue with deep-learning approaches such as CNNs, and conclude with the new kid on the block: protein language models.
  Links: Lecture materials, notebook, Video recording

Week 11 - 22.05 (beware, unusual date!): Diffusion models for protein design
  Speaker: Elodie Laine (Paris)
  Content: Unlock the potential of diffusion-based generative models for de novo protein design. Explore their flexibility in shaping proteins according to desired shapes, functions, and active sites. This course provides a historical overview and operational insights, and showcases recent applications. Delve into the challenges of discrete vs. continuous spaces and conclude with a comparative analysis against simpler generative models.

Week 12 - 27.05: Intro to bioimage analysis and deep learning utilization
  Speaker: Martin Schatz (Prague)
  Content: Introduction to the field of bioimage analysis with a focus on deep learning. We will discuss the challenges of bioimage analysis, and you will learn about community-driven resources, not only for accessible deep learning. The hands-on part will focus on knowledge and practical experience with the Noise2Void and StarDist deep learning models.

Week 13 - 03.06: Deep architectures for sampling macromolecules
  Speaker: Grégoire Sergeant-Perthuis (Paris)

Week 14 - 10.06: Deep learning for image segmentation
  Speaker: Karl Rohr (Heidelberg)
  Content: The lecture introduces deep learning methods for image segmentation. The focus is on convolutional neural networks (CNNs) for image analysis and encoder-decoder network architectures for image segmentation. We will discuss the well-known networks U-Net and Cellpose and their application to computer-based analysis of cell microscopy image data.

14-16.06: Final meeting in Heidelberg
  The course will end with a 2-day workshop and hackathon meeting in Heidelberg during which students will be able to implement a short project and listen to scientific lectures.
Feel free to post your project ideas here!