Program

This course will be organized in 3 blocks over 8 weeks (start: week of 24.02.2025)

  1. DL in single-cell genomics – 3 weeks
  2. DL in protein bioinformatics – 3 weeks
  3. DL in image analysis – 2 weeks

The course format will comprise a weekly 60-minute online lecture and a weekly hybrid (in-person/online) practical Python session. Lectures will be given by teachers from all participating universities. Lectures and practical exercises on all three application areas will be centered around one recent publication illustrating a specific application and method.

The course will end with a 2-day workshop and hackathon in Heidelberg between 30 May and 1 June 2025, during which students will implement a short project and attend scientific lectures.

Prerequisites

Students attending this course are expected to have basic knowledge of statistics and machine-learning fundamentals. To prepare, you can use the lecture material from last year’s edition, in particular the four introductory lectures:

Date | Title | Speaker | Links
Intro Lecture 1 | Intro and Mathematical foundation to DL | Bartek Wilczynski (Warsaw) | Lecture materials, Practical session, Video recording (26.03)
Intro Lecture 2 | Convolutional and Recurrent neural networks | Marco Frasca (Milano) | Lecture materials, Practical session, Video recording (4.03)
Intro Lecture 3 | Autoencoders and variational autoencoders | Carl Herrmann (Heidelberg) | Lecture materials, Practical session, Video recording
Intro Lecture 4 | Attention mechanisms and transformers | Dario Malchiodi (Milano) | Lecture materials, Practical session, Video recording

Recommended books include:

  1. Deep Learning book by Goodfellow, Bengio, Courville
  2. The Elements of Statistical Learning by Hastie, Tibshirani, Friedman
  3. An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani (a simpler version of the previous book)
  4. Machine Learning with PyTorch and Scikit-Learn by Raschka, Liu, Mirjalili (a great introduction to the technical aspects of DL in PyTorch).

As the practical sessions will be mostly based on Python and PyTorch, some basic knowledge of Python is required (book 4 in the list above gives a good overview of PyTorch, for example).

Specifically, we expect that the following theoretical concepts are familiar:

basic statistics

  • accuracy
  • sensitivity/specificity
  • area under the curve (AUC)
  • probability distributions
  • random variable
  • expectation of a random variable
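As a quick self-check on the evaluation metrics listed above, here is a minimal sketch in plain Python. The labels and scores are made-up toy values, the function names are our own, and the 0.5 decision threshold is an assumption chosen for illustration. The AUC is computed with the pairwise-ranking definition (the fraction of positive/negative pairs in which the positive example gets the higher score), which is one of several equivalent ways to obtain it.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(y_true, y_pred):
    """True positive rate: fraction of actual positives that are detected."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """True negative rate: fraction of actual negatives that are rejected."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(accuracy(y_true, y_pred))     # 0.75
print(sensitivity(y_true, y_pred))  # 0.75
print(specificity(y_true, y_pred))  # 0.75
print(auc(y_true, scores))          # 0.9375
```

Note that accuracy, sensitivity and specificity depend on the chosen threshold, whereas the AUC is computed from the scores directly.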

machine learning

  • overfitting vs. underfitting
  • cross-validation
  • usage of training, validation and testing datasets
  • supervised vs. unsupervised learning
  • classification vs. regression
  • binary vs. multi-class classification
  • standard ML algorithms such as Random Forest
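To illustrate how a held-out test set and cross-validation fit together, the following is a minimal sketch using only the Python standard library. The helper names, the dataset size of 20, the 20% test fraction and the choice of 4 folds are all assumptions made for the example. In practice, libraries such as scikit-learn provide equivalent utilities.

```python
import random


def train_test_split_indices(n, test_fraction, seed=0):
    """Shuffle the indices 0..n-1 and set aside a fraction as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]  # (train+validation, test)


def k_fold(indices, k):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, val


# Hypothetical dataset of 20 samples: 20% held out for final testing,
# the remaining 80% used for 4-fold cross-validation.
train_val_idx, test_idx = train_test_split_indices(20, test_fraction=0.2)
splits = list(k_fold(train_val_idx, k=4))

for fold_number, (train_idx, val_idx) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train_idx)} training, "
          f"{len(val_idx)} validation samples")
print(f"held-out test set: {len(test_idx)} samples")
```

The test indices are never seen during cross-validation. They are reserved for a single final evaluation, after model selection on the validation folds is finished.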

mathematical foundations

  • matrix algebra

Preliminary schedule of lectures

Date | Title | Speaker | Content | Links
Week 1 - 24.02 | Models for multimodal data integration | Britta Velten (Heidelberg) | |
Week 2 - 3.03 | VAE in single-cell genomics | Carl Herrmann (Heidelberg) | |
Week 3 - 10.03 | Deep learning for predicting non-coding DNA activity | Bartek Wilczynski (Warsaw) | |
Week 4 - 17.03 | AlphaFold, ESMFold to predict structure of proteins | Joanna Sulkowska (Warsaw) | |
Week 5 - 24.03 | Diffusion models for protein design | Elodie Laine (Paris) | |
Week 6 - 31.03 | Deep Architectures for sampling macromolecules | Grégoire Sergeant-Perthuis (Paris) | |
Week 7 - 7.04 | Intro to BioImage Analysis and Deep Learning Utilization | Martin Schatz (Prague) | |
Week 8 - 14.04 | Deep learning for image segmentation | Karl Rohr (Heidelberg) | |
14.04 - 31.05 | Project phase | | |
30.05 - 1.06 | Final meeting | Heidelberg | The course will end with a 2-day workshop and hackathon meeting in Heidelberg during which students will be able to implement a short project and listen to scientific lectures. |