Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration

December 9, 2025

arXiv:2412.15701v5 Announce Type: replace
Abstract: While the advancement of large language models has spurred the development of AI agents to automate tasks, numerous use cases inherently require agents to collaborate with humans due to humans’ latent preferences, domain expertise, or the need for control. To facilitate the study of human-agent collaboration, we introduce Collaborative Gym (Co-Gym), an open framework for developing and evaluating collaborative agents that engage in bidirectional communication with humans while interacting with task environments. We describe how the framework enables the implementation of new task environments and coordination between humans and agents through a flexible, non-turn-taking interaction paradigm, along with an evaluation suite that assesses both collaboration outcomes and processes. Our framework provides both a simulated condition with a reliable user simulator and a real-world condition with an interactive web application. Initial benchmark experiments across three representative tasks — creating travel plans, writing related work sections, and analyzing tabular data — demonstrate the benefits of human-agent collaboration: The best-performing collaborative agents consistently outperform their fully autonomous counterparts in task performance, achieving win rates of 86% in Travel Planning, 74% in Tabular Analysis, and 66% in Related Work when evaluated by real users. Despite these improvements, our evaluation reveals persistent limitations in current language models and agents, with communication and situational awareness failures observed in 65% and 40% of cases in the real condition, respectively. Released under the permissive MIT license, Co-Gym supports the addition of new task environments and can be used to develop collaborative agent applications, while its evaluation suite enables assessment and improvement of collaborative agents.

