FYS-8604 | UiT

Type of course

The course can be taken as a single course.

Registration is open to UiT students and members of the NORA Research School.

The course is held as an intensive, summer-school-style course from 10 to 14 June 2025 at Høgskolen i Østfold, Fredrikstad.

Admission requirements

PhD students and holders of a Norwegian master's degree of five years (300 ECTS), or of 3 years (180 ECTS) + 2 years (120 ECTS), or equivalent, may be admitted.

PhD students must upload a document from their university confirming that they are registered PhD students. This group of applicants does not have to prove English proficiency and is exempt from the semester fee.

Holders of a master's degree must upload their master's diploma with a Diploma Supplement or an English translation.

PhD students at UiT The Arctic University of Norway register for the course through StudentWeb. External applicants apply for admission through SøknadsWeb. All external applicants must attach a confirmation of their status as a PhD student from their home institution. Students who hold a Master of Science degree but are not yet enrolled as PhD students must attach a copy of their master's degree diploma. These students are also required to pay the semester fee.

Recommended prerequisites: Programming skills in Python and hands-on experience with Python programming for deep learning.

Application code: 9317, application deadline: May 1st

The course is limited to 40 places. If there are more qualified applicants than available places, places are allocated by lottery.

Course content

This course will provide both a fundamental understanding of the techniques and methods used in multi-modal learning and an overview of the most recent innovations within the field. The course will cover multi-modal learning in the context of important data domains such as image, text, and time series data, and give the students a deeper understanding of the theory that underpins current multi-modal methods. The course will consist of 5 days of teaching with both lectures and practical components.

Evaluation is based on a take-home exam.

Core concepts

History of multi-modal learning
Motivation and fundamental concepts
Encoders, data fusion, and loss functions

Discriminative multi-modal learning

Classification-based methods
Clustering-based methods

Self-supervised multi-modal learning

Contrastive methods
Non-contrastive methods
Masking-based methods
Data augmentation

Generative multi-modal learning theory

Auto-regressive methods
Predictive methods

Advanced multi-modal learning theory

On the design of multi-modal loss functions
Understanding data fusion in multi-modal learning
Noisy and missing data in multi-modal learning

Relevance of course in program of study: Fusing information from multiple modalities is a fundamental challenge in machine learning. Recent works have nevertheless achieved impressive performance on a wide range of tasks involving modalities from a variety of sources, such as images, time series, and text. Recent multi-modal approaches rely heavily on deep neural networks to process different data types and fuse them into a common representation that contains the complementary information found in the different sources. Performing this fusion in a reliable and precise manner is key to achieving good performance. Furthermore, the added complexity of having multiple neural networks for different modalities exacerbates existing neural network-related challenges such as a lack of explainability and a limited capability for uncertainty modeling.

Objectives of the course

Knowledge - The student is able to

Describe advanced multi-modal techniques
Describe the role of domain-knowledge in multi-modal learning
Describe the development of multi-modal learning
Discuss recent developments in the field
Discuss advanced multi-modal learning in an applied setting

Skills - The student is able to

Explain the fundamental ideas behind multi-modal learning
Apply the learned material to new applications or problem settings
Use multi-modal learning in research and industrial settings using software libraries such as PyTorch or TensorFlow
Make appropriate method and architecture choices for a given application or problem setting

General competence - The student is able to

Give an interpretation of recent developments and provide an intuition of the open questions in the field of multi-modal learning
Show an understanding of why multi-modal learning has shown great improvements over the last couple of years

Language of instruction and examination

The language of instruction is English, and all the syllabus material is in English. The final report must be submitted in English.

Teaching methods

Lectures: 20 hours

Self-study sessions: 40 hours

Project work: 50 hours net, spread over 8 weeks

Hands-on sessions: 10 hours

Net effort: 120 hours

Final exam

The course is being discontinued. The last opportunity to take the exam after this semester is spring 2026.

More information about exams in discontinued courses is available from UiT.

Examination

Examination:	Duration:	Grade scale:
Off-campus exam	8 weeks	Passed / Not Passed


Re-sit examination

No re-sit exam will be offered for this course.

spring 2025 FYS-8604 NORA research school on multi-modal learning - 5 ECTS