arXiv:2604.20615v2 Announce Type: replace Abstract: Modern optical microscopes are fully motorised; however, transforming them into truly smart systems requires real-time adjustment of acquisition settings in response to detected objects and dynamic biological events. At the core are classification algorithms that commonly depend on customised software and are generally designed for narrowly defined biological applications. In addition, […]
Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models
arXiv:2604.25591v1 Announce Type: cross Abstract: Recent audio-aware large language models (ALLMs) have demonstrated strong capabilities across diverse audio understanding and reasoning tasks, but they still frequently produce hallucinated or overly confident outputs. While uncertainty estimation has been extensively studied in text-only LLMs, it remains largely unexplored for ALLMs, where audio-conditioned generation introduces additional challenges such […]
Large Language Models Are Effective Human Annotation Assistants, But Not Good Independent Annotators
arXiv:2503.06778v3 Announce Type: replace-cross Abstract: Event annotation is important for identifying market changes, monitoring breaking news, and understanding sociological trends. Although expert annotators set the gold standards, human coding is expensive and inefficient. Unlike information extraction experiments that focus on single contexts, we evaluate a holistic workflow that removes irrelevant documents, merges documents about the […]
Behavioral Intelligence Platforms: From Event Streams to Autonomous Insight via Probabilistic Journey Graphs, Behavioral Knowledge Extraction, and Grounded Language Generation
arXiv:2604.22762v2 Announce Type: replace-cross Abstract: Contemporary product analytics systems require users to pose explicit queries, such as writing SQL, configuring dashboards, or constructing funnels, before insights can surface. This pull-based paradigm creates a bottleneck: it requires both domain knowledge and technical fluency, and assumes practitioners know in advance which questions to ask. We argue that […]
Beyond I’m Sorry, I Can’t: Dissecting Large Language Model Refusal
arXiv:2509.09708v3 Announce Type: replace-cross Abstract: Refusal on harmful prompts is a key safety behaviour in instruction-tuned large language models (LLMs), yet the internal causes of this behaviour remain poorly understood. We study two public instruction-tuned models, Gemma-2-2B-IT and LLaMA-3.1-8B-IT, using sparse autoencoders (SAEs) trained on residual-stream activations. Given a harmful prompt, we search the SAE […]
Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling
arXiv:2604.25578v1 Announce Type: cross Abstract: We present Marco-MoE, a suite of fully open multilingual sparse Mixture-of-Experts (MoE) models. Marco-MoE features a highly sparse design in which only around 5% of the total parameters are activated per input token. This extreme sparsity, combined with upcycling from dense models, enables efficient pre-training on 5T tokens. Our models […]
Voice, Bias, and Coreference: An Interpretability Study of Gender in Speech Translation
arXiv:2511.21517v2 Announce Type: replace-cross Abstract: Unlike text, speech conveys information about the speaker, such as gender, through acoustic cues like pitch. This gives rise to modality-specific bias concerns. For example, in speech translation (ST), when translating from languages with notional gender, such as English, into languages where gender-ambiguous terms referring to the speaker are assigned […]
Optimal Question Selection from a Large Question Bank for Clinical Field Recovery in Conversational Psychiatric Intake
arXiv:2604.22067v2 Announce Type: replace-cross Abstract: Psychiatric intake is a sequential, high-stakes information-gathering process in which clinicians must decide what to ask, in what order, and how to interpret incomplete or ambiguous responses under limited time. Despite growing interest in conversational AI for healthcare, infrastructure for conversational AI in this setting remains limited. Accordingly, […]
Relational In-Context Learning via Synthetic Pre-training with Structural Prior
arXiv:2603.03805v2 Announce Type: replace-cross Abstract: Relational Databases (RDBs) are the backbone of modern business, yet they lack foundation models comparable to those in text or vision. A key obstacle is that high-quality RDBs are private, scarce and structurally heterogeneous, making internet-scale pre-training infeasible. To overcome this data scarcity, we introduce RDB-PFN, the first relational foundation […]
Benchmarking bandgap prediction in semiconductors under experimental and realistic evaluation settings
arXiv:2604.25568v1 Announce Type: cross Abstract: Accurate bandgap prediction is crucial for semiconductor applications, yet machine learning models trained on computational data often struggle to generalize to experimental bandgap measurements. Challenges related to data fidelity, domain generalization, and model interpretability remain insufficiently addressed in existing evaluation frameworks. To bridge this gap, we introduce RealMat-BaG, a benchmark […]
Assistants, Not Architects: The Role of LLMs in Networked Systems Design
arXiv:2604.25506v1 Announce Type: cross Abstract: Designing the architecture of modern networked systems requires navigating a large, combinatorial space of hardware, systems, and configuration choices with complex cross-layer interactions. Architects must balance competing objectives such as performance, cost, and deployability while satisfying compatibility and resource constraints, often relying on scattered rules-of-thumb drawn from benchmarks, papers, documentation, […]
RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation
arXiv:2603.09723v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are increasingly used across the scientific workflow, including to draft peer-review reports. However, many AI-generated reviews are superficial and insufficiently actionable, leaving authors without concrete, implementable guidance; this is the gap this work addresses. We propose RbtAct, which targets actionable review feedback generation and places existing […]