Spring 2025
FYS-8604 NORA research school on multi-modal learning - 5 stp
The course is administered by
Type of course
The course can be taken as a single course.
Registration is open for UiT students and members of NORA Research School.
The course will be conducted as a concentrated, summer-school-style course on 10-14 June 2025 at Høgskolen i Østfold, Fredrikstad.
Course contents
This course will provide both a fundamental understanding of the techniques and methods used in multi-modal learning and an overview of the most recent innovations within the field. The course will cover multi-modal learning in the context of important data domains such as image, text, and time series data, and give the students a deeper understanding of the theory that underpins current multi-modal methods. The course will consist of 5 days of teaching with both lectures and practical components.
A take-home exam will be given that will be used for evaluation.
Core concepts
- History of multi-modal learning
- Motivation and fundamental concepts
- Encoders, data fusion, and loss functions
Discriminative multi-modal learning
- Classification-based methods
- Clustering-based methods
Self-supervised multi-modal learning
- Contrastive methods
- Non-contrastive methods
- Masking-based methods
- Data augmentation
Generative multi-modal learning theory
- Auto-regressive methods
- Predictive methods
Advanced multi-modal learning theory
- On the design of multi-modal loss functions
- Understanding data fusion in multi-modal learning
- Noisy and missing data in multi-modal learning
Relevance of course in program of study: Fusing information from multiple modalities is a fundamental challenge in machine learning. Recent work has nevertheless achieved impressive performance on a wide range of tasks involving modalities from a variety of sources, such as images, time series, and text. Recent multi-modal approaches rely heavily on deep neural networks to process the different data types and fuse them into a common representation that contains the complementary information found in the different sources. Performing this fusion reliably and precisely is key to achieving good performance. Furthermore, the added complexity of having multiple neural networks for different modalities exacerbates existing neural-network challenges such as a lack of explainability and a limited capability for uncertainty modeling.
Admission requirements
PhD students or holders of a Norwegian master's degree of five years (300 ECTS), or of 3 years (180 ECTS) + 2 years (120 ECTS), or equivalent may be admitted. PhD students must upload a document from their university stating that they are registered PhD students. This group of applicants does not have to prove English proficiency and is exempt from the semester fee. Holders of a master's degree must upload a master's diploma with Diploma Supplement / English.
PhD students at UiT The Arctic University of Norway register for the course through StudentWeb. External applicants apply for admission through SøknadsWeb. All external applicants have to attach a confirmation of their status as a PhD student from their home institution. Students who hold a Master of Science degree but are not yet enrolled as PhD students have to attach a copy of their master's degree diploma. These students are also required to pay the semester fee.
Recommended prerequisites: Programming skills in Python and hands-on experience with Python programming for deep learning.
Application code: 9317, application deadline: May 1st
The course is limited to 40 places. Qualified applicants are ranked on the basis of a lottery if there are more applicants than available places.
Objective of the course
Knowledge - The student is able to
- Describe advanced multi-modal techniques
- Describe the role of domain-knowledge in multi-modal learning
- Describe the development of multi-modal learning
- Discuss recent developments in the field
- Discuss advanced multi-modal learning in an applied setting.
Skills - The student is able to
- Explain the fundamental ideas behind multi-modal learning
- Apply the learned material to new applications or problem settings
- Use multi-modal learning in research and industrial settings using software libraries such as PyTorch or TensorFlow
- Make appropriate method and architecture choices for a given application or problem setting
General competence - The student is able to
- Give an interpretation of recent developments and provide an intuition of the open questions in the field of multi-modal learning
- Show an understanding of why multi-modal learning has shown great improvements over the last couple of years
Language of instruction
Teaching methods
Lectures: 20 hours
Self-study sessions: 40 hours
Project work: spread over 8 weeks - net time 50 hours
Hands-on sessions: 10 hours
Net effort: 120 hours