Automated emotion recognition via video-based semantic embeddings

Introduction
Automated emotion recognition systems often rely on acted datasets and categorical models that miss the nuance of spontaneous affect.

Methods
This work assembled a large corpus of authentic facial emotion expressions from naturalistic outpatient psychotherapy sessions, annotated with free-text descriptions by human labelers. These descriptions were embedded in a 768-dimensional semantic space using a fine-tuned German Sentence-BERT model. Transformer, BiLSTM, and deep neural network architectures were then trained to map facial landmark features to these continuous emotion embeddings.

Results
Leave-one-out cross-validation showed that model predictions closely matched human annotations, with a mean z-score of 1.97. External evaluation against acted datasets (RAVDESS) confirmed strong recognition of joy, sadness, and fear.

Discussion
To enhance interpretability, a back-translation mechanism based on cosine similarity was implemented and its output visualized with radar charts. All components were integrated into AFFECT, an open-source pipeline for analyzing emotional expressions in everyday video recordings.
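The abstract does not detail how back-translation works. One plausible reading: embed each candidate emotion term in the same space as the model's predictions, then rank the terms by cosine similarity to a predicted embedding. The sketch below is a minimal illustration of that reading, not AFFECT's implementation. The function names `cosine_similarity` and `back_translate`, the labels, and the three-dimensional toy vectors are all hypothetical stand-ins for the 768-dimensional Sentence-BERT embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def back_translate(
    prediction: Sequence[float],
    label_embeddings: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score every emotion label against a predicted
    embedding and return (label, similarity) pairs, most similar first."""
    scores = {
        label: cosine_similarity(prediction, emb)
        for label, emb in label_embeddings.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy 3-D "embeddings" standing in for real 768-D sentence embeddings (assumption).
labels = {
    "joy": [1.0, 0.0, 0.0],
    "sadness": [0.0, 1.0, 0.0],
    "fear": [0.0, 0.0, 1.0],
}
prediction = [0.9, 0.1, 0.2]  # a made-up model output for one video frame

ranked = back_translate(prediction, labels)
for label, score in ranked:
    print(f"{label}: {score:.3f}")
```

Because the function returns one similarity per label, its output maps directly onto the axes of a radar chart, one spoke per emotion term. Real sentence embeddings are rarely orthogonal the way these toy vectors are, so in practice the scores would overlap more than they do here.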


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK; registration number 16808844.