arXiv:2604.15109v1 Announce Type: cross Abstract: Despite the rapid advancement of Large Language Models (LLMs), uncertainty quantification in LLM generation is a persistent challenge. Although recent approaches have achieved strong performance by restricting LLMs to produce short or constrained answer sets, many real-world applications require long-form and free-form text generation. A key difficulty in this setting […]
Efficient Search of Implantable Adaptive Cells for Medical Image Segmentation
arXiv:2604.14849v1 Announce Type: cross Abstract: Purpose: Adaptive skip modules can improve medical image segmentation, but searching for them is computationally costly. Implantable Adaptive Cells (IACs) are compact NAS modules inserted into U-Net skip connections, reducing the search space compared with full-network NAS. However, the original IAC framework still requires a 200-epoch differentiable search for each […]
Demonstration of Pneuma-Seeker: Agentic System for Reifying and Fulfilling Information Needs on Tabular Data
arXiv:2604.14422v1 Announce Type: new Abstract: Data analysts working with relational data often start with vague or underspecified questions and refine them iteratively as they explore the data. To support this iterative process, we demonstrate Pneuma-Seeker, a system that reifies a user’s information need as explicit, inspectable relational specifications, enabling iterative refinement of the information need, […]
Listen, Correct, and Feed Back: Spoken Pedagogical Feedback Generation
arXiv:2604.14177v1 Announce Type: cross Abstract: Grammatical error correction (GEC) and explanation (GEE) have made rapid progress, but real teaching scenarios also require learner-friendly pedagogical feedback that is actionable, level-appropriate, and encouraging. We introduce SPFG (Spoken Pedagogical Feedback Generation), a dataset built based on the Speak & Improve Challenge 2025 corpus, pairing fluency-oriented transcriptions with GEC […]
Why Do Vision Language Models Struggle To Recognize Human Emotions?
arXiv:2604.15280v1 Announce Type: cross Abstract: Understanding emotions is a fundamental ability for intelligent systems to be able to interact with humans. Vision-language models (VLMs) have made tremendous progress in the last few years for many visual tasks, potentially offering a promising solution for understanding emotions. However, it is surprising that even the most sophisticated contemporary […]
HARNESS: Lightweight Distilled Arabic Speech Foundation Models
arXiv:2604.14186v1 Announce Type: cross Abstract: Large self-supervised speech (SSL) models achieve strong downstream performance, but their size limits deployment in resource-constrained settings. We present HArnESS, an Arabic-centric self-supervised speech model family trained from scratch with iterative self-distillation, together with lightweight student variants that offer strong accuracy-efficiency trade-offs on Automatic Speech Recognition (ASR), Dialect Identification (DID), […]
Geometric Routing Enables Causal Expert Control in Mixture of Experts
arXiv:2604.14434v1 Announce Type: new Abstract: Sparse Mixture-of-Experts (MoE) models scale parameters while fixing active computation per token, but the specialization of individual experts remains opaque. In a companion paper we showed that routing topology is quality-neutral: five structurally different configurations converge to statistically equivalent language modeling quality. Here we show that expert identity is nonetheless […]
MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining
arXiv:2604.14198v1 Announce Type: cross Abstract: Domain reweighting can improve sample efficiency and downstream generalization, but data-mixture optimization for multimodal midtraining remains largely unexplored. Current multimodal training recipes tune mixtures along a single dimension, typically data format or task type. We introduce MixAtlas, a method that produces benchmark-targeted data recipes that can be inspected, adapted, and […]
MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools
arXiv:2510.24284v3 Announce Type: replace Abstract: Large Language Models (LLMs) increasingly rely on external tools to perform complex, realistic tasks, yet their ability to utilize the rapidly expanding Model Context Protocol (MCP) ecosystem remains limited. Existing MCP research covers few servers, depends on costly manual curation, and lacks training support, hindering progress toward real-world deployment. To […]
Ollivier-Ricci Curvature of Riemannian Manifolds and Directed Graphs with Applications to Graph Neural Networks
arXiv:2604.14211v1 Announce Type: cross Abstract: This thesis is an exposition of Ollivier-Ricci Curvature of metric spaces as introduced by Yann Ollivier, which is based upon the 1-Wasserstein Distance and optimal transport theory. We present some of the major results and proofs that connect Ollivier-Ricci curvature with classical Ricci curvature of Riemannian manifolds, including extensions of […]
On Tackling Complex Tasks with Reward Machines and Signal Temporal Logics
arXiv:2604.14440v1 Announce Type: new Abstract: We propose a Reinforcement Learning (RL) based control design framework for handling complex tasks. The approach extends the concept of Reward Machines (RM) with Signal Temporal Logic (STL) formulas that can be used for event generation. The use of STL allows not only a more efficient representation of rewards for […]
MEME-Fusion@CHiPSAL 2026: Multimodal Ablation Study of Hate Detection and Sentiment Analysis on Nepali Memes
arXiv:2604.14218v1 Announce Type: cross Abstract: Hate speech detection in Devanagari-scripted social media memes presents compounded challenges: multimodal content structure, script-specific linguistic complexity, and extreme data scarcity in low-resource settings. This paper presents our system for the CHiPSAL 2026 shared task, addressing both Subtask A (binary hate speech detection) and Subtask B (three-class sentiment classification: positive, […]