Open-source tools to extract multimodal human behavioural features from audio and video data

Federico · 30/09/2025

This online module provides hands-on training in state-of-the-art open-source tools for extracting features of human behaviour. Through screen-recorded tutorials, participants will learn to apply OpenFace, OpenPose, and openSMILE for behavioural data analysis. The module also includes lectures on multimodal analysis of the extracted data, offering both theoretical background and practical guidance.

Target audience

Postgraduate students and professionals

Prerequisites

Basic programming skills: Python, Bash scripting, and Docker

Intended learning outcomes

Analysis of Multimodal Input for Emotion Recognition: Students will be able to use state-of-the-art tools to extract features of human behaviour.

Teaching and learning methods

Tutorials on the open-source tools, complemented by lectures introducing multimodal data analysis.

Assessment

Multiple-choice quizzes at the end of each session

 

Syllabus

Introduction to Multimodal Interactions and Analysis

  • Overview of multimodal emotion recognition and the role of facial, vocal, and body features. Theory on multimodal analysis and integration. Reading materials on multimodal signal processing and emotion recognition.

Session 1: OpenFace for Facial Feature Extraction

  • Introduction to OpenFace, installation, and setup. Tutorials on extracting facial landmarks, gaze direction, head pose, and Action Units. Discussion of strengths and limitations. Hands-on exercises with sample datasets (a minimal extraction workflow is sketched below).
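
To give a flavour of this session, here is a minimal sketch of batch-processing a video with OpenFace's FeatureExtraction executable and loading the resulting CSV in Python. The binary, video, and output paths are placeholders to adapt to your own installation.

    import subprocess
    from pathlib import Path

    import pandas as pd

    # Placeholders: point these at your OpenFace build and input video.
    OPENFACE_BIN = Path("OpenFace/build/bin/FeatureExtraction")
    VIDEO = Path("data/participant01.mp4")
    OUT_DIR = Path("processed")

    # Run OpenFace: writes landmarks, gaze, head pose, and Action Units to CSV.
    subprocess.run(
        [str(OPENFACE_BIN), "-f", str(VIDEO), "-out_dir", str(OUT_DIR)],
        check=True,
    )

    # Load the per-frame features; some OpenFace versions pad column names
    # with spaces, so strip them before indexing.
    df = pd.read_csv(OUT_DIR / (VIDEO.stem + ".csv"))
    df.columns = df.columns.str.strip()

    # Keep only frames where the face tracker was confident.
    df = df[(df["success"] == 1) & (df["confidence"] > 0.8)]

    # Example: mean intensity of AU12 (lip corner puller, linked to smiling).
    print(df["AU12_r"].mean())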

Session 2: openSMILE for Vocal Feature Extraction

  • Overview of acoustic features relevant to emotion (pitch, energy, spectral features). Introduction to openSMILE configuration files and feature sets (e.g., eGeMAPS). Tutorial on extracting vocal features from speech samples (see the sketch below). Lab: compare vocal features across different emotional speech datasets.
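
As a preview of the tutorial, the sketch below uses audEERING's opensmile Python package (pip install opensmile) to extract eGeMAPS functionals from a speech sample; the audio file name is a placeholder.

    import opensmile

    # Configure openSMILE for the eGeMAPS feature set, computing one vector
    # of functionals (statistics over the whole utterance) per file.
    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.eGeMAPSv02,
        feature_level=opensmile.FeatureLevel.Functionals,
    )

    # "speech_sample.wav" is a placeholder; any mono WAV file works.
    features = smile.process_file("speech_sample.wav")

    # features is a pandas DataFrame with 88 eGeMAPSv02 descriptors, covering
    # pitch (F0), loudness, jitter, shimmer, and spectral measures.
    print(features.shape)
    print(features.filter(like="F0").columns.tolist())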

Session 3: OpenPose for Body Movement Analysis 

  • Introduction to body pose estimation and gesture analysis. Hands-on tutorial with OpenPose to extract skeletal keypoints (a parsing sketch is shown below). Applications in emotion and behaviour recognition. Lab: analyse differences in body posture/movement in contrasting emotional states.
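
The hands-on part of this session works with OpenPose's per-frame JSON output. Below is a minimal parsing sketch, assuming OpenPose was run with --write_json; the directory name is a placeholder.

    import json
    from pathlib import Path

    import numpy as np

    # Placeholder: directory produced by, e.g.,
    #   openpose.bin --video input.mp4 --write_json output_json/ --display 0 --render_pose 0
    JSON_DIR = Path("output_json")

    frames = []
    for path in sorted(JSON_DIR.glob("*_keypoints.json")):
        data = json.loads(path.read_text())
        if not data["people"]:
            continue  # no person detected in this frame
        # pose_keypoints_2d is a flat [x0, y0, c0, x1, y1, c1, ...] list;
        # with the default BODY_25 model it reshapes to 25 keypoints.
        frames.append(np.array(data["people"][0]["pose_keypoints_2d"]).reshape(-1, 3))

    pose = np.stack(frames)  # shape: (n_frames, 25, 3) -> x, y, confidence
    print(pose.shape)

    # Example: frame-to-frame wrist displacement (BODY_25 indices 4 and 7)
    # as a crude measure of gesture activity.
    wrists = pose[:, [4, 7], :2]
    print(np.linalg.norm(np.diff(wrists, axis=0), axis=-1).mean())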

Session 4: Combining Multimodal Features

  • Strategies for synchronising and integrating features across modalities. Data preprocessing, alignment of timestamps, and feature selection (an alignment sketch is shown below). Tutorial: building a multimodal dataset from facial, vocal, and body data.
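
As an illustration of timestamp alignment, the sketch below resamples three hypothetical per-modality feature tables (face at 30 fps, voice every 10 ms, pose at 25 fps) onto a shared 100 ms grid with pandas; all feature names and rates are made up for the example.

    import numpy as np
    import pandas as pd

    # Hypothetical per-modality tables, each with a "time" column in seconds.
    rng = np.random.default_rng(0)
    face = pd.DataFrame({"time": np.arange(0, 10, 1 / 30), "AU12_r": rng.random(300)})
    voice = pd.DataFrame({"time": np.arange(0, 10, 0.01), "loudness": rng.random(1000)})
    pose = pd.DataFrame({"time": np.arange(0, 10, 1 / 25), "wrist_speed": rng.random(250)})

    def resample(df, step=0.1):
        """Average a modality's features into fixed 100 ms bins."""
        bins = (df["time"] // step).astype(int)
        out = df.drop(columns="time").groupby(bins).mean()
        out.index = out.index * step
        return out

    # Align all modalities on the shared grid and concatenate column-wise.
    fused = pd.concat([resample(face), resample(voice), resample(pose)], axis=1).dropna()
    print(fused.head())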

Session 5: Data Analysis and Emotion Recognition

  • Introduction to statistical analysis and machine learning approaches for multimodal data. Tutorials in Python/R for data exploration, feature analysis, and classification. Case study: train a simple model to recognise emotions using multimodal features (see the sketch below).
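
As a preview of the case study, the sketch below trains a scikit-learn classifier with cross-validation; the features and labels are synthetic stand-ins for the fused dataset built in Session 4.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic stand-in for the fused dataset: 200 segments x 30 multimodal
    # features, 4 emotion classes.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 30))
    y = rng.integers(0, 4, size=200)

    # Standardise features (acoustic and visual features live on very
    # different scales), then classify with a random forest.
    clf = make_pipeline(
        StandardScaler(),
        RandomForestClassifier(n_estimators=200, random_state=0),
    )

    # 5-fold cross-validation gives a less optimistic estimate than a single split.
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")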

 

Instructor

Micol Spitale
Politecnico di Milano
