arXiv:2511.17053v1 Announce Type: cross Abstract: LVLMs have been shown to perform excellently in image-level tasks such as VQA and captioning. However, in many instance-level tasks, such as visual grounding and object detection, LVLMs still show performance gaps compared to previous expert models. Meanwhile, although pedestrian tracking is a classical task, there have been a number […]
UI-CUBE: Enterprise-Grade Computer Use Agent Benchmarking Beyond Task Accuracy to Operational Reliability
arXiv:2511.17131v1 Announce Type: cross Abstract: While current Computer Use Agent (CUA) benchmarks measure task completion effectively, they provide limited assessment of enterprise deployment readiness, emphasizing functional correctness over the operational reliability required for production systems. We present UI-CUBE (UiPath Computer Use BEnchmark), a systematic benchmark comprising 226 tasks across two difficulty tiers designed to expose […]
Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models
arXiv:2511.17170v1 Announce Type: cross Abstract: Large Language Models (LLMs) often produce fluent but factually incorrect responses, a phenomenon known as hallucination. Abstention, where the model chooses not to answer and instead outputs phrases such as “I don’t know”, is a common safeguard. However, existing abstention methods typically rely on post-generation signals, such as generation variations […]
Algorithmic design and implementation considerations of deep MPC
arXiv:2511.17233v1 Announce Type: cross Abstract: Deep Model Predictive Control (Deep MPC) is an evolving field that integrates model predictive control and deep learning. This manuscript is focused on a particular approach, which employs a deep neural network in the loop with MPC. This class of approaches distributes control authority between a neural network and an MPC […]
Concept-Based Interpretability for Toxicity Detection
arXiv:2511.16689v1 Announce Type: cross Abstract: The rise of social networks has not only facilitated communication but also allowed the spread of harmful content. Although significant advances have been made in detecting toxic language in textual data, the exploration of concept-based explanations in toxicity detection remains limited. In this study, we leverage various subtype attributes present […]
MoleProLink-RL: geometric transport for domain-policy reinforcement learning in drug-target interaction prediction
npj Digital Medicine, Published online: 24 November 2025; doi:10.1038/s41746-025-02158-0
On the public dissemination and open sourcing of ultrasound resources, datasets and deep learning models
npj Digital Medicine, Published online: 24 November 2025; doi:10.1038/s41746-025-02162-4
A randomized controlled trial of a trauma-informed smartphone application in reducing firefighters’ mental health symptoms
npj Digital Medicine, Published online: 24 November 2025; doi:10.1038/s41746-025-02092-1
Multimodal analysis of whole slide images in colorectal cancer
npj Digital Medicine, Published online: 24 November 2025; doi:10.1038/s41746-025-02095-y