arXiv:2604.10898v1 Announce Type: new Abstract: Large language models (LLMs) have shown great performance on complex reasoning tasks but often require generating long intermediate thoughts before reaching a final answer. During generation, LLMs rely on a key-value (KV) cache for autoregressive decoding. However, the memory footprint of the KV cache grows with output length. Prior work […]
Deep-Reporter: Deep Research for Grounded Multimodal Long-Form Generation
arXiv:2604.10741v1 Announce Type: cross Abstract: Recent agentic search frameworks enable deep research via iterative planning and retrieval, reducing hallucinations and enhancing factual grounding. However, they remain text-centric, overlooking the multimodal evidence that characterizes real-world expert reports. We introduce a pressing task: multimodal long-form generation. Accordingly, we propose Deep-Reporter, a unified agentic framework for grounded multimodal […]
CFMS: A Coarse-to-Fine Multimodal Synthesis Framework for Enhanced Tabular Reasoning
arXiv:2604.10973v1 Announce Type: new Abstract: Reasoning over tabular data is a crucial capability for tasks like question answering and fact verification, as it requires models to comprehend both free-form questions and semi-structured tables. However, while methods like Chain-of-Thought (CoT) introduce reasoning chains, purely symbolic methods are inherently limited by their blindness to holistic visual patterns. […]
Min-$k$ Sampling: Decoupling Truncation from Temperature Scaling via Relative Logit Dynamics
arXiv:2604.11012v1 Announce Type: new Abstract: The quality of text generated by large language models depends critically on the decoding sampling strategy. While mainstream methods such as Top-$k$, Top-$p$, and Min-$p$ achieve a balance between diversity and accuracy through probability-space truncation, they share an inherent limitation: extreme sensitivity to the temperature parameter. Recent logit-space approaches like […]
Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models
arXiv:2604.10733v1 Announce Type: cross Abstract: Large language models increasingly serve as conversational agents that adopt personas and role-play characters at user request. This capability, while valuable, raises concerns about sycophancy: the tendency to provide responses that validate users rather than prioritize factual accuracy. While prior work has established that sycophancy poses risks to AI safety […]
StableTTA: Training-Free Test-Time Adaptation that Improves Model Accuracy on ImageNet1K to 96%
arXiv:2604.04552v2 Announce Type: replace-cross Abstract: Ensemble methods are widely used to improve predictive performance, but their effectiveness often comes at the cost of increased memory usage and computational complexity. In this paper, we identify a conflict in aggregation strategies that negatively impacts prediction stability. We propose test-time adaptation (StableTTA), a training-free method that employs novel image […]
Probabilistic Prediction of Neural Dynamics via Autoregressive Flow Matching
arXiv:2604.11178v1 Announce Type: new Abstract: Forecasting neural activity in response to naturalistic stimuli remains a key challenge for understanding brain dynamics and enabling downstream neurotechnological applications. Here, we introduce a generative forecasting framework for modeling neural dynamics based on autoregressive flow matching (AFM). Building on recent advances in transport-based generative modeling, our approach probabilistically predicts […]
Perceived Importance of Cognitive Skills Among Computing Students in the Era of AI
arXiv:2604.10730v1 Announce Type: cross Abstract: The availability and increasing integration of generative AI tools have transformed computing education. While AI in education presents opportunities, it also raises new concerns about how these powerful know-it-all AI tools, which are becoming widespread, impact cognitive skill development among students. Cognitive skills are essential for academic success and professional […]
Select Smarter, Not More: Prompt-Aware Evaluation Scheduling with Submodular Guarantees
arXiv:2604.11328v1 Announce Type: new Abstract: Automatic prompt optimization (APO) hinges on the quality of its evaluation signal, yet scoring every prompt candidate on the full training set is prohibitively expensive. Existing methods either fix a single evaluation subset before optimization begins (principled but prompt-agnostic) or adapt it heuristically during optimization (flexible but unstable and lacking […]
Inside the Scaffold: A Source-Code Taxonomy of Coding Agent Architectures
arXiv:2604.03515v2 Announce Type: replace-cross Abstract: LLM-based coding agents can localize bugs, generate patches, and run tests with diminishing human oversight, yet the scaffolding code that surrounds the language model (the control loop, tool definitions, state management, and context strategy) remains poorly understood. Existing surveys classify agents by abstract capabilities (tool use, planning, reflection) that cannot […]
Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement Learning
arXiv:2604.11462v1 Announce Type: new Abstract: Large Language Models (LLMs) struggle with long-horizon tasks due to the “context bottleneck” and the “lost-in-the-middle” phenomenon, where accumulated noise from verbose environments degrades reasoning over multi-turn interactions. To address this issue, we introduce a symbiotic framework that decouples context management from task execution. Our architecture pairs a lightweight, specialized […]
Tail-Aware Information-Theoretic Generalization for RLHF and SGLD
arXiv:2604.10727v1 Announce Type: cross Abstract: Classical information-theoretic generalization bounds typically control the generalization gap through KL-based mutual information and therefore rely on boundedness or sub-Gaussian tails via the moment generating function (MGF). In many modern pipelines, such as robust learning, RLHF, and stochastic optimization, losses and rewards can be heavy-tailed, and MGFs may not exist, […]