arXiv:2510.06687v3 Announce Type: replace-cross Abstract: Semantic segmentation serves as a cornerstone of scene understanding in autonomous driving but continues to face significant challenges under complex conditions such as occlusion. Light field and LiDAR modalities provide complementary visual and spatial cues that are beneficial for robust perception; however, their effective integration is hindered by limited viewpoint […]
Resilient Write: A Six-Layer Durable Write Surface for LLM Coding Agents
arXiv:2604.10842v2 Announce Type: cross Abstract: LLM-powered coding agents increasingly rely on tool-use protocols such as the Model Context Protocol (MCP) to read and write files on a developer’s workstation. When a write fails – due to content filters, truncation, or an interrupted session – the agent typically receives no structured signal, loses the draft, and […]
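The failure pattern this abstract describes (a write fails silently, the agent gets no structured signal, and the draft is lost) can be illustrated with a minimal sketch. This is not the paper's six-layer design; it is a hypothetical `durable_write` helper showing the two ideas the abstract hints at: journal the draft before touching the target, and return a structured result instead of failing silently.

```python
import os
import tempfile


def durable_write(path: str, content: str, journal_dir: str = ".write_journal") -> dict:
    """Write `content` to `path` atomically, journaling the draft first
    so it survives truncation or an interrupted session."""
    os.makedirs(journal_dir, exist_ok=True)

    # 1. Journal the draft before touching the target file, so the text
    #    is recoverable even if the session dies mid-write.
    draft_path = os.path.join(journal_dir, os.path.basename(path) + ".draft")
    with open(draft_path, "w", encoding="utf-8") as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())

    # 2. Write to a temp file in the target directory, then rename.
    #    os.replace is atomic, so readers never see a half-written file.
    target_dir = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except OSError as e:
        # 3. Surface a structured signal instead of losing the draft.
        return {"ok": False, "error": str(e), "draft": draft_path}
    return {"ok": True, "bytes": len(content.encode("utf-8")), "draft": draft_path}
```

An agent calling this sketch always learns whether the write landed, and where to find the surviving draft if it did not.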
From Topology to Trajectory: LLM-Driven World Models For Supply Chain Resilience
arXiv:2604.11041v1 Announce Type: new Abstract: Semiconductor supply chains face unprecedented resilience challenges amidst global geopolitical turbulence. Conventional Large Language Model (LLM) planners, when confronting such non-stationary “Policy Black Swan” events, frequently suffer from Decision Paralysis or a severe Grounding Gap due to the absence of physical environmental modeling. This paper introduces ReflectiChain, a cognitive agentic […]
Persona Non Grata: Single-Method Safety Evaluation Is Incomplete for Persona-Imbued LLMs
arXiv:2604.11120v2 Announce Type: new Abstract: Personality imbuing customizes LLM behavior, but safety evaluations almost always study prompt-based personas alone. We show this is incomplete: prompting and activation steering expose *different*, architecture-dependent vulnerability profiles, and testing with only one method can miss a model’s dominant failure mode. Across 5,568 judged conditions on four standard models from […]
The Missing Knowledge Layer in Cognitive Architectures for AI Agents
arXiv:2604.11364v1 Announce Type: new Abstract: The two most influential cognitive architecture frameworks for AI agents, CoALA [21] and JEPA [12], both lack an explicit Knowledge layer with its own persistence semantics. This gap produces a category error: systems apply cognitive decay to factual claims, or treat facts and experiences with identical update mechanics. We survey […]
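The category error the abstract names (applying cognitive decay to factual claims) can be made concrete with a small sketch. The class names and decay function below are illustrative assumptions, not the survey's proposal: the point is only that facts and experiences get different persistence semantics, with facts updated by explicit revision and episodes fading over time.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Fact:
    """A factual claim: it does not decay; it changes only via revision."""
    claim: str
    confidence: float = 1.0

    def revise(self, new_claim: str, confidence: float) -> None:
        # Belief change is an explicit update, not a function of age.
        self.claim, self.confidence = new_claim, confidence


@dataclass
class Episode:
    """An experience: its salience decays over time, like cognitive memory."""
    description: str
    salience: float = 1.0
    created: float = field(default_factory=time.time)

    def current_salience(self, half_life: float = 3600.0) -> float:
        # Exponential decay with a configurable half-life in seconds.
        age = time.time() - self.created
        return self.salience * 0.5 ** (age / half_life)
```

A system without this distinction would route both types through `current_salience`, which is exactly the failure mode the survey targets.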
A collaborative agent with two lightweight synergistic models for autonomous crystal materials research
arXiv:2604.11540v1 Announce Type: new Abstract: Current large language models require hundreds of billions of parameters yet struggle with domain-specific reasoning and tool coordination in materials science. Here, we present MatBrain, a lightweight collaborative agent system with two synergistic models specialized for crystal materials research. MatBrain employs a dual-model architecture: Mat-R1 (30B parameters) as the analytical […]
Delving Aleatoric Uncertainty in Medical Image Segmentation via Vision Foundation Models
arXiv:2604.10963v1 Announce Type: new Abstract: Medical image segmentation supports clinical workflows by precisely delineating anatomical structures and lesions. However, medical image datasets suffer from acquisition noise and annotation ambiguity, causing pervasive data uncertainty that substantially undermines model robustness. Existing research focuses primarily on model architectural improvements and predictive reliability estimation, while systematic […]
Diffusion-CAM: Faithful Visual Explanations for dMLLMs
arXiv:2604.11005v1 Announce Type: new Abstract: While diffusion Multimodal Large Language Models (dMLLMs) have recently achieved remarkable strides in multimodal generation, the development of interpretability mechanisms has lagged behind their architectural evolution. Unlike traditional autoregressive models that produce sequential activations, diffusion-based architectures generate tokens via parallel denoising, resulting in smooth, distributed activation patterns across the entire […]
Hodoscope: Unsupervised Monitoring for AI Misbehaviors
arXiv:2604.11072v1 Announce Type: new Abstract: Existing approaches to monitoring AI agents rely on supervised evaluation: human-written rules or LLM-based judges that check for known failure modes. However, novel misbehaviors may fall outside predefined categories entirely and LLM-based judges can be unreliable. To address this, we formulate unsupervised monitoring, drawing an analogy to unsupervised learning. Rather […]
Environmental Footprint of GenAI Research: Insights from the Moshi Foundation Model
arXiv:2604.11154v1 Announce Type: new Abstract: New multi-modal large language models (MLLMs) are continuously being trained and deployed, following rapid development cycles. This generative AI frenzy is driving steady increases in energy consumption, greenhouse gas emissions, and a plethora of other environmental impacts linked to datacenter construction and hardware manufacturing. Mitigating the environmental consequences of GenAI […]
PaperScope: A Multi-Modal Multi-Document Benchmark for Agentic Deep Research Across Massive Scientific Papers
arXiv:2604.11307v1 Announce Type: new Abstract: Leveraging Multi-modal Large Language Models (MLLMs) to accelerate frontier scientific research is promising, yet how to rigorously evaluate such systems remains unclear. Existing benchmarks mainly focus on single-document understanding, whereas real scientific workflows require integrating evidence from multiple papers, including their text, tables, and figures. As a result, multi-modal, multi-document […]
Neutralization titers reveal the structure of polyclonal antibody responses
arXiv:2604.11451v1 Announce Type: new Abstract: The composition of a polyclonal antibody response is hard to measure experimentally but contains vital information about the robustness of immunity. Here, we argue that the statistics of neutralization titers alone can be used to make quantitative predictions about the composition of the response, circumventing challenges arising through sequencing and […]