HELM: A Human-Centered Evaluation Framework for LLM-Powered Recommender Systems

arXiv:2601.19197v1 Announce Type: cross
Abstract: The integration of Large Language Models (LLMs) into recommendation systems has introduced unprecedented capabilities for natural language understanding, explanation generation, and conversational interaction. However, existing evaluation methodologies focus predominantly on traditional accuracy metrics and fail to capture the multifaceted human-centered qualities that determine real-world user experience. We introduce HELM (Human-centered Evaluation for LLM-powered recoMmenders), a comprehensive evaluation framework that systematically assesses LLM-powered recommender systems across five human-centered dimensions: Intent Alignment, Explanation Quality, Interaction Naturalness, Trust & Transparency, and Fairness & Diversity. Through extensive experiments involving three state-of-the-art LLM-based recommenders (GPT-4, LLaMA-3.1, and P5) across three domains (movies, books, and restaurants), with rigorous evaluation by 12 domain experts over 847 recommendation scenarios, we demonstrate that HELM reveals critical quality dimensions invisible to traditional metrics. Our results show that while GPT-4 achieves superior explanation quality (4.21/5.0) and interaction naturalness (4.35/5.0), it exhibits significant popularity bias (Gini coefficient 0.73) compared to traditional collaborative filtering (0.58). We release HELM as an open-source toolkit to advance human-centered evaluation practices in the recommender systems community.
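
The popularity-bias result above is quantified with the Gini coefficient over how often each item appears in recommendation lists (0 = perfectly even exposure, 1 = all exposure on one item). The sketch below is an illustrative implementation of that standard metric, not code from the released HELM toolkit; the example exposure distributions are hypothetical and chosen only to mirror the 0.73 vs. 0.58 contrast reported in the abstract.

```python
import numpy as np

def gini_coefficient(exposure_counts):
    """Gini coefficient over per-item recommendation counts.

    0.0 means every item is recommended equally often;
    values near 1.0 mean exposure is concentrated on a few popular items.
    """
    counts = np.sort(np.asarray(exposure_counts, dtype=float))
    n = counts.size
    total = counts.sum()
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form for sorted (ascending) values:
    # G = (2 * sum(i * x_i)) / (n * sum(x_i)) - (n + 1) / n
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * counts) / (n * total) - (n + 1) / n)

# Hypothetical exposure counts across a 10-item catalog:
long_tailed = [500, 120, 40, 10, 5, 2, 1, 1, 1, 1]   # popularity-biased
flatter     = [80, 75, 70, 68, 65, 60, 58, 55, 52, 50]  # more even exposure

print(gini_coefficient(long_tailed))  # high, approaching 1
print(gini_coefficient(flatter))      # low, closer to 0
```

Under this reading, the gap between 0.73 (GPT-4) and 0.58 (collaborative filtering) indicates that the LLM-based recommender concentrates exposure on popular items noticeably more than the traditional baseline.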
