Treating enterprise AI as an operating layer

Treating enterprise AI as an operating layer

There’s a fault line running through enterprise AI, and it’s not the one getting the most attention. The public conversation still tracks foundation models and

Making AI operational in constrained public sector environments

Making AI operational in constrained public sector environments

The AI boom has hit across industries, and public sector organizations are facing pressure to accelerate adoption. At the same time, government institutions face distinct

Why having “humans in the loop” in an AI war is an illusion

The availability of artificial intelligence for use in warfare is at the center of a legal battle between Anthropic and the Pentagon. This debate has

CLIP Architecture for Abdominal CT Image-Text Alignment and Zero-Shot Learning: Investigating Batch Composition and Data Scaling

arXiv:2604.13561v1 Announce Type: cross Abstract: Vision-language models trained with contrastive learning on paired medical images and reports show strong zero-shot diagnostic capabilities, yet the effect

SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment

arXiv:2604.13630v1 Announce Type: cross Abstract: The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool

A Study of Failure Modes in Two-Stage Human-Object Interaction Detection

April 16, 2026

arXiv:2604.13448v1 Announce Type: cross
Abstract: Human-object interaction (HOI) detection aims to detect interactions between humans and objects in images. While recent advances have improved performance on existing benchmarks, their evaluations mainly focus on overall prediction accuracy and provide limited insight into the underlying causes of model failures. In particular, modern models often struggle in complex scenes involving multiple people and rare interaction combinations. In this work, we present a study to better understand the failure modes of two-stage HOI models, which form the basis of many current HOI detection approaches. Rather than constructing a large-scale benchmark, we instead decompose HOI detection into multiple interpretable perspectives and analyze model behavior across these dimensions to study different types of failure patterns. We curate a subset of images from an existing HOI dataset organized by human-object-interaction configurations (e.g., multi-person interactions and object sharing), and analyze model behavior under these configurations to examine different failure modes. This design allows us to analyze how these HOI models behave under different scene compositions and why their predictions fail. Importantly, high overall benchmark performance does not necessarily reflect robust visual reasoning about human-object relationships. We hope that this study can provide useful insights into the limitations of HOI models and offer observations for future research in this area.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844