Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding

arXiv:2511.03549v1 Announce Type: cross Abstract: Understanding the purpose of source code is a critical task in software maintenance, onboarding, and modernization. While large language models

Visualization Biases MLLM’s Decision Making in Network Data Tasks

arXiv:2511.03617v1 Announce Type: cross Abstract: We evaluate how visualizations can influence the judgment of MLLMs about the presence or absence of bridges in a network.

Benchmarking the Thinking Mode of Multimodal Large Language Models in Clinical Tasks

arXiv:2511.03328v1 Announce Type: cross Abstract: A recent advancement in Multimodal Large Language Models (MLLMs) research is the emergence of “reasoning MLLMs” that offer explicit control

Light over Heavy: Automated Performance Requirements Quantification with Linguistic Inducement

arXiv:2511.03421v1 Announce Type: cross Abstract: Elicited performance requirements need to be quantified for compliance in different engineering tasks, e.g., configuration tuning and performance testing. Much

RefAgent: A Multi-agent LLM-based Framework for Automatic Software Refactoring

arXiv:2511.03153v1 Announce Type: cross Abstract: Large Language Models (LLMs) have substantially influenced various software engineering tasks. Indeed, in the case of software refactoring, traditional LLMs

Large language models require a new form of oversight: capability-based monitoring

November 6, 2025

arXiv:2511.03106v1 Announce Type: new
Abstract: The rapid adoption of large language models (LLMs) in healthcare has been accompanied by scrutiny of their oversight. Existing monitoring approaches, inherited from traditional machine learning (ML), are task-based and founded on assumed performance degradation arising from dataset drift. In contrast, with LLMs, inevitable model degradation due to changes in populations compared to the training dataset cannot be assumed, because LLMs were not trained for any specific task in any given population. We therefore propose a new organizing principle guiding generalist LLM monitoring that is scalable and grounded in how these models are developed and used in practice: capability-based monitoring. Capability-based monitoring is motivated by the fact that LLMs are generalist systems whose overlapping internal capabilities are reused across numerous downstream tasks. Instead of evaluating each downstream task independently, this approach organizes monitoring around shared model capabilities, such as summarization, reasoning, translation, or safety guardrails, in order to enable cross-task detection of systemic weaknesses, long-tail errors, and emergent behaviors that task-based monitoring may miss. We describe considerations for developers, organizational leaders, and professional societies for implementing a capability-based monitoring approach. Ultimately, capability-based monitoring will provide a scalable foundation for safe, adaptive, and collaborative monitoring of LLMs and future generalist artificial intelligence models in healthcare.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registeration number 16808844