arXiv:2605.21299v1 Announce Type: cross Abstract: Humans effortlessly go beyond literal meanings: "If you mow the lawn, I will give you fifty dollars" is typically understood as implying that the speaker will pay only if the lawn is mowed, whereas "If you are hungry, there is pizza in the oven" implies that pizza is available regardless […]
Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning
arXiv:2605.20201v1 Announce Type: cross Abstract: Recent large language models support inputs of up to 10 million tokens, yet they perform poorly on long-context tasks that require complex reasoning. Such tasks can be solved using only a subset of the input — a proxy context — rather than the full sequence. Despite sharing the same underlying […]
Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models
arXiv:2605.20187v1 Announce Type: cross Abstract: Understanding dependencies between variables is critical for interpretability and efficient generation in masked diffusion models (MDMs), yet these models primarily expose marginal conditional distributions and do not explicitly represent inter-variable dependence. We propose a neural framework for estimating pairwise conditional mutual information (MI) directly from the hidden states of a […]
Pseudo-Siamese Network for Planning in Target-Oriented Proactive Dialogues
arXiv:2605.20195v1 Announce Type: cross Abstract: A target-oriented proactive dialogue system is designed to steer conversations toward predefined targets while actively providing suggestions. The core paradigm of such a system is to plan a reasonable dialogue path and subsequently guide language models (e.g., pre-trained or large language models) to generate responses, where dialogue path planning serves […]
Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs
arXiv:2605.20270v1 Announce Type: cross Abstract: A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with per-deployment error budget $\alpha$. The operator needs a safety certificate for this deployment’s stream at every round: no pooling across deployments, no waiting for a long-run average. Existing […]
JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA
arXiv:2605.20284v1 Announce Type: cross Abstract: Industrial anomaly detection has been significantly advanced by Large Multimodal Models (LMMs), enabling diverse human instructions beyond detection, particularly through visually grounded reasoning for better image understanding. However, LMMs lack domain-specific knowledge, which limits their ability to generate accurate responses in complex industrial scenarios. In this work, we present JUDO, […]
Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry
arXiv:2605.20241v1 Announce Type: cross Abstract: Prompt-level safety probes for large language models use hidden-state representations to separate safe from unsafe prompts, but strong average detection performance does not explain the geometry of this separation. In particular, it remains unclear how safety evidence is formed across layers, which aspects of that layer-wise geometry support low-false-positive decisions, […]
Multi-Agent Reinforcement Learning for Safe Autonomous Driving Under Pedestrian Behavioral Uncertainty
arXiv:2605.20255v1 Announce Type: cross Abstract: Simulation-based testing of self-driving cars (SDCs) typically relies on scripted or simplified pedestrian models that do not capture the heterogeneity and uncertainty of real human crossing behavior. This limits the realism of safety assessments, especially in scenarios involving jaywalking, which is governed by latent personality traits that the vehicle cannot […]
ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders
arXiv:2605.19503v2 Announce Type: replace-cross Abstract: Reinforcement learning for legged locomotion has matured into a stack of multi-component reward functions and physics-engine benchmarks whose morphologies are uniformly derived from real commercial hardware. Game NPCs, however, are bound by stylistic constraints absent from sim-to-real robotics and routinely take the form of creatures with no real-robot counterpart. We […]
Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models
arXiv:2605.21154v1 Announce Type: cross Abstract: Mental health has become a global priority, leading to a massive administrative burden in the coding of clinical diagnoses. This study proposes the automation of psychiatric diagnostic analysis by mapping free-text descriptions to the International Classification of Diseases (ICD) using Natural Language Processing (NLP) and Machine Learning (ML) techniques. Utilizing […]
LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models
arXiv:2605.19729v2 Announce Type: replace-cross Abstract: We demonstrate that in knowledge distillation for diffusion models, the teacher network’s highly complex denoising process – stemming from its substantially larger capacity – poses a significant challenge for the student model to faithfully mimic. To address this problem, we propose a coarse-to-fine distillation framework with LInear FiTting-based distillation (LIFT) […]
Comparative Analysis of Military Detection Using Drone Imagery Across Multiple Visual Spectrums
arXiv:2605.21157v1 Announce Type: cross Abstract: In modern warfare, drones are becoming an essential part of intelligence gathering and carrying out precise attacks in different kinds of hostile environments. Their ability to operate in real-time and hostile environments from a safe distance makes them invaluable for surveillance and military operations. The KIIT-MiTA dataset is comprised of […]