Multimodal Emotion Recognition
Signal Processing & Computer Vision
This project built a multimodal, multi-class classifier that recognizes human emotions by combining three signal types: video, audio, and PPG (photoplethysmography).
Personal Contribution
My primary focus was implementing remote-PPG (rPPG), a technique that extracts pulse signals directly from video, without skin contact, by detecting subtle changes in skin color.
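To make the rPPG idea concrete, here is a minimal sketch of one common baseline: average the green channel over a face region in each frame, then look for the dominant frequency within a plausible heart-rate band. This is not necessarily the method used in the project. The function names (`mean_green`, `estimate_bpm`), the 0.7 to 4.0 Hz band, and the plain-Python frame representation are all assumptions made for illustration; a real pipeline would typically use a face detector and an array library.

```python
import math

def mean_green(frame):
    """Average green value of a face region given as rows of (r, g, b) tuples.

    Hypothetical helper: assumes the face region has already been cropped.
    """
    total = 0.0
    count = 0
    for row in frame:
        for _, g, _ in row:
            total += g
            count += 1
    return total / count

def estimate_bpm(green_trace, fps, lo_hz=0.7, hi_hz=4.0, step_hz=0.01):
    """Estimate pulse rate (beats per minute) from a per-frame green trace.

    Scans an assumed heart-rate band and returns the frequency with the
    highest spectral power, converted to BPM.
    """
    n = len(green_trace)
    mean = sum(green_trace) / n
    centered = [v - mean for v in green_trace]  # remove the constant offset

    best_freq, best_power = lo_hz, -1.0
    steps = int(round((hi_hz - lo_hz) / step_hz))
    for k in range(steps + 1):
        freq = lo_hz + k * step_hz
        w = 2.0 * math.pi * freq / fps  # radians per frame
        re = sum(c * math.cos(w * i) for i, c in enumerate(centered))
        im = sum(c * math.sin(w * i) for i, c in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0

# Usage with a synthetic 10-second trace pulsing at 1.2 Hz (72 BPM).
fps = 30.0
trace = [120.0 + 0.5 * math.sin(2 * math.pi * 1.2 * i / fps) for i in range(300)]
print(round(estimate_bpm(trace, fps)))  # close to 72 for this synthetic input
```

Real footage adds motion and lighting noise, so a practical version would usually add detrending and band-pass filtering before the frequency search.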
These extracted physiological signals were then integrated with visual and auditory features to train the final emotion recognition architecture.
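The fusion strategy is not specified here, so the following is only a sketch of one simple option, feature-level (early) fusion: each modality's feature vector is scaled to unit length so that no modality dominates by magnitude, the vectors are concatenated, and a linear softmax layer produces class probabilities. The names `fuse_features` and `predict_proba`, the feature sizes, and the zero-valued weights are hypothetical placeholders, not the project's actual architecture.

```python
import math

def l2_normalize(vec):
    """Scale a feature vector to unit length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def fuse_features(video_feat, audio_feat, rppg_feat):
    """Early fusion: normalize each modality, then concatenate."""
    fused = []
    for feat in (video_feat, audio_feat, rppg_feat):
        fused.extend(l2_normalize(feat))
    return fused

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(fused, weights, biases):
    """Linear multi-class head: one weight row and one bias per emotion class."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)

# Usage with made-up feature values and an untrained (all-zero) classifier.
fused = fuse_features([3.0, 4.0], [0.0, 2.0], [72.0])
weights = [[0.0] * len(fused) for _ in range(3)]  # 3 hypothetical emotion classes
biases = [0.0, 0.0, 0.0]
print(predict_proba(fused, weights, biases))  # uniform until weights are trained
```

Concatenation keeps the example short; alternatives such as fusing per-modality predictions (late fusion) are equally plausible readings of the description above.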
Input Modalities
- 01 Video: Spatial and temporal facial feature extraction.
- 02 Audio: Spectral analysis of vocal patterns.
- 03 Remote-PPG: Contactless pulse extraction for physiological monitoring.