arXiv:2601.22024v1 Announce Type: cross Abstract: The operation of future 6th-generation (6G) mobile networks will increasingly rely on the ability of deep reinforcement learning (DRL) to optimize network decisions in real-time. DRL has demonstrated efficacy in various resource allocation problems, such as joint decisions on user scheduling and antenna allocation or simultaneous control of computing resources […]
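The joint scheduling-and-allocation decision described here amounts to learning over a product action space. Below is a minimal sketch, assuming a toy tabular setting with an invented reward and dynamics (the abstract specifies none of this); a real 6G controller would replace the table with a deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

N_USERS, N_ANTENNAS, N_STATES = 4, 3, 8          # toy sizes (assumed)
N_ACTIONS = N_USERS * N_ANTENNAS                 # joint (user, antenna) action
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(state, action):
    """Hypothetical environment: reward favors matching antenna to state parity."""
    user, antenna = divmod(action, N_ANTENNAS)
    reward = 1.0 if antenna == state % N_ANTENNAS else 0.0
    next_state = rng.integers(N_STATES)          # stand-in channel dynamics
    return reward, next_state

state = 0
for _ in range(5000):
    action = rng.integers(N_ACTIONS) if rng.random() < eps else int(Q[state].argmax())
    reward, nxt = step(state, action)
    # Standard Q-learning temporal-difference update
    Q[state, action] += alpha * (reward + gamma * Q[nxt].max() - Q[state, action])
    state = nxt

print("greedy antenna per state:", [int(Q[s].argmax() % N_ANTENNAS) for s in range(N_STATES)])
```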
BrainStack: Neuro-MoE with Functionally Guided Expert Routing for EEG-Based Language Decoding
arXiv:2601.21148v1 Announce Type: new Abstract: Decoding linguistic information from electroencephalography (EEG) remains challenging due to the brain’s distributed and nonlinear organization. We present BrainStack, a functionally guided neuro-mixture-of-experts (Neuro-MoE) framework that models the brain’s modular functional architecture through anatomically partitioned expert networks. Each functional region is represented by a specialized expert that learns localized neural […]
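The described architecture, one expert per functional region plus a router, can be sketched directly. The region groupings, layer sizes, and gating rule below are illustrative assumptions, not BrainStack's actual design.

```python
import torch
import torch.nn as nn

class NeuroMoE(nn.Module):
    """Sketch: one expert per functional region, soft routing over expert outputs.
    Channel groups and dimensions are illustrative, not from the paper."""
    def __init__(self, regions, d_hidden=64, n_classes=4):
        super().__init__()
        self.regions = regions                       # list of channel-index lists
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(len(r), d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_hidden))
            for r in regions
        )
        n_ch = sum(len(r) for r in regions)
        self.router = nn.Linear(n_ch, len(regions))  # functional guidance would bias this
        self.head = nn.Linear(d_hidden, n_classes)

    def forward(self, x):                            # x: (batch, channels)
        gates = torch.softmax(self.router(x), dim=-1)                 # (B, E)
        outs = torch.stack([e(x[:, r]) for e, r in zip(self.experts, self.regions)], dim=1)
        mixed = (gates.unsqueeze(-1) * outs).sum(dim=1)               # gate-weighted mix
        return self.head(mixed)

# Toy usage: 8 EEG channels split into three "regions"
model = NeuroMoE(regions=[[0, 1, 2], [3, 4], [5, 6, 7]])
print(model(torch.randn(2, 8)).shape)  # torch.Size([2, 4])
```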
StepShield: When, Not Whether to Intervene on Rogue Agents
arXiv:2601.22136v1 Announce Type: cross Abstract: Existing agent safety benchmarks report binary accuracy, conflating early intervention with post-mortem analysis. A detector that flags a violation at step 8 enables intervention; one that reports it at step 48 provides only forensic value. This distinction is critical, yet current benchmarks cannot measure it. We introduce StepShield, the first […]
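The step-8-versus-step-48 distinction suggests scoring detectors by lead time rather than binary accuracy. The scoring function below is an illustrative stand-in, not StepShield's published metric.

```python
def intervention_value(violation_step, detection_step, horizon):
    """Illustrative lead-time score (not StepShield's actual metric):
    1.0 for detection at or before the violation, decaying linearly to 0
    when the detector only reports at the end of the trajectory."""
    if detection_step is None:            # missed entirely
        return 0.0
    if detection_step <= violation_step:  # early enough to intervene
        return 1.0
    return max(0.0, 1.0 - (detection_step - violation_step) / (horizon - violation_step))

# A violation at step 8, horizon 50: flagging at step 8 vs. step 48.
print(intervention_value(8, 8, 50))   # 1.0   -> actionable
print(intervention_value(8, 48, 50))  # ~0.05 -> forensic value only
```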
Bridging the Arithmetic Gap: The Cognitive Complexity Benchmark and Financial-PoT for Robust Financial Reasoning
arXiv:2601.21157v1 Announce Type: new Abstract: While Large Language Models excel at semantic tasks, they face a critical bottleneck in financial quantitative reasoning, frequently suffering from “Arithmetic Hallucinations” and a systemic failure mode we term “Cognitive Collapse”. To strictly quantify this phenomenon, we introduce the Cognitive Complexity Benchmark (CCB), a robust evaluation framework grounded in a […]
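Financial-PoT presumably follows the general program-of-thought pattern: the model emits code and a host interpreter performs the arithmetic exactly, sidestepping arithmetic hallucinations. A generic sketch of that pattern, with an invented stand-in for the model's output:

```python
# Generic program-of-thought pattern: the LLM writes code, the host runs it,
# so the arithmetic is exact instead of hallucinated. The generated snippet
# below is a stand-in for real model output.
generated = """
principal = 10_000
rate = 0.05
years = 3
answer = principal * (1 + rate) ** years   # compound interest, exact
"""

namespace: dict = {}
exec(generated, namespace)   # NOTE: sandbox untrusted model code in practice
print(round(namespace["answer"], 2))  # 11576.25
```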
Prompts to Proxies: Emulating Human Preferences via a Compact LLM Ensemble
arXiv:2509.11311v2 Announce Type: replace Abstract: Large language models are increasingly used as proxies for human subjects in social science research, yet external validity requires that synthetic agents faithfully reflect the preferences of target human populations. We introduce *preference reconstruction theory*, a framework that formalizes preference alignment as a representation learning problem: constructing a functional basis […]
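One simple reading of "constructing a functional basis" for preferences is a least-squares projection of observed scores onto basis functions. The sketch below illustrates that reading only; the paper's actual formulation is not reproduced here, and all names are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: items have raw feature vectors, and a population's
# preference scores are reconstructed as a combination of basis functions.
items = rng.normal(size=(200, 5))                     # 200 items, 5 raw features
basis = np.hstack([items, np.tanh(items)])            # simple 10-dim functional basis
true_w = rng.normal(size=basis.shape[1])
prefs = basis @ true_w + 0.1 * rng.normal(size=200)   # noisy observed preferences

w_hat, *_ = np.linalg.lstsq(basis, prefs, rcond=None) # reconstruct the representation
print("recovery error:", float(np.linalg.norm(w_hat - true_w)))
```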
Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving
arXiv:2601.21164v1 Announce Type: new Abstract: Plane Geometry Problem Solving (PGPS) is a multimodal reasoning task that aims to solve a plane geometry problem based on a geometric diagram and a textual problem description. Although Large Language Models (LLMs) possess strong reasoning skills, their direct application to PGPS is hindered by their inability to process visual diagrams. […]
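The "concise geometric description" bridge amounts to serializing diagram primitives into textual clauses an LLM can consume. The schema below is invented for illustration; the paper's actual description language may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Segment:
    a: str
    b: str
    length: Optional[float] = None

def describe(points_on_circle: List[str],
             segments: List[Segment],
             perpendicular_pairs: List[Tuple[str, str]]) -> str:
    """Render detected diagram primitives as concise textual clauses.
    The clause templates here are illustrative, not the paper's."""
    clauses = []
    if points_on_circle:
        clauses.append(f"Points {', '.join(points_on_circle)} lie on circle O.")
    for s in segments:
        if s.length is not None:
            clauses.append(f"Segment {s.a}{s.b} has length {s.length}.")
    for p, q in perpendicular_pairs:
        clauses.append(f"{p} is perpendicular to {q}.")
    return " ".join(clauses)

print(describe(["A", "B", "C"], [Segment("A", "B", 6.0)], [("AB", "BC")]))
# Points A, B, C lie on circle O. Segment AB has length 6.0. AB is perpendicular to BC.
```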
TimeSeries2Report prompting enables adaptive large language model management of lithium-ion batteries
arXiv:2512.16453v2 Announce Type: replace Abstract: Large language models (LLMs) offer promising capabilities for interpreting multivariate time-series data, yet their application to real-world battery energy storage system (BESS) operation and maintenance remains largely unexplored. Here, we present TimeSeries2Report (TS2R), a semantic translation framework that converts raw lithium-ion battery operational time-series into structured, semantically enriched reports, enabling […]
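The core TS2R move, raw series in, structured report out, can be illustrated with a toy summarizer. Field names, thresholds, and the report template below are assumptions, not TS2R's actual schema.

```python
import statistics

def timeseries_to_report(cell_id, voltage_v, temperature_c):
    """Render raw per-cycle measurements as a compact textual report an LLM
    can consume. Thresholds and wording are illustrative assumptions."""
    v_min, v_max = min(voltage_v), max(voltage_v)
    t_mean = statistics.fmean(temperature_c)
    flags = []
    if v_min < 2.8:
        flags.append("undervoltage event")
    if max(temperature_c) > 45.0:
        flags.append("thermal excursion")
    return (
        f"Cell {cell_id}: voltage {v_min:.2f}-{v_max:.2f} V, "
        f"mean temperature {t_mean:.1f} degC, "
        f"anomalies: {', '.join(flags) if flags else 'none'}."
    )

print(timeseries_to_report("A7", [3.31, 3.02, 2.75, 3.10], [31.0, 33.5, 47.2, 35.1]))
```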
FrontierScience: Evaluating AI’s Ability to Perform Expert-Level Scientific Tasks
arXiv:2601.21165v1 Announce Type: new Abstract: We introduce FrontierScience, a benchmark evaluating expert-level scientific reasoning in frontier language models. Recent model progress has nearly saturated existing science benchmarks, which often rely on multiple-choice knowledge questions or already published information. FrontierScience addresses this gap through two complementary tracks: (1) Olympiad, consisting of international olympiad problems at the […]
MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models
arXiv:2601.21181v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) suffer from cross-modal hallucinations, where one modality inappropriately influences generation about another, leading to fabricated outputs. This exposes a more fundamental deficiency in modality-interaction control. To address this, we propose Modality-Adaptive Decoding (MAD), a training-free method that adaptively weights modality-specific decoding branches based on task […]
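One plausible shape for "adaptively weighting modality-specific decoding branches" is mixing per-branch next-token distributions with weights set at each step. The KL-based weighting rule below is an illustration of that idea, not MAD's actual rule.

```python
import torch

def mad_step(logits_full, logits_text_only, logits_image_only):
    """Illustrative adaptive mixing of modality-specific decoding branches
    (the published MAD weighting rule is not reproduced here). Branches whose
    distributions diverge more from the full-input branch are down-weighted,
    a rough proxy for 'this modality is dominating inappropriately'."""
    p_full = torch.log_softmax(logits_full, dim=-1)
    weights = []
    for branch in (logits_text_only, logits_image_only):
        p_b = torch.log_softmax(branch, dim=-1)
        kl = torch.sum(p_full.exp() * (p_full - p_b), dim=-1)  # KL(full || branch)
        weights.append(torch.exp(-kl))                          # closer => more weight
    w = torch.stack(weights)
    w = w / w.sum()
    mixed = w[0] * logits_text_only + w[1] * logits_image_only
    return torch.argmax(mixed, dim=-1)

vocab = 10
print(int(mad_step(torch.randn(vocab), torch.randn(vocab), torch.randn(vocab))))
```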
Can Large Language Models Capture Video Game Engagement?
arXiv:2502.04379v2 Announce Type: replace-cross Abstract: Can out-of-the-box pretrained Large Language Models (LLMs) detect human affect successfully when observing a video? To address this question, for the first time, we comprehensively evaluate the capacity of popular LLMs to predict continuous affect annotations of videos when prompted with a sequence of text and video frames in […]
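Prompting an LLM with interleaved text and frames typically uses the common content-list chat convention. A sketch of assembling such a payload; the instruction wording and the 0-1 engagement scale are assumptions, not the paper's exact protocol.

```python
import base64

def build_affect_prompt(frame_jpegs, context_text):
    """Assemble an interleaved text+frames chat message in the widely used
    OpenAI-style content-list convention (assumed here for illustration)."""
    content = [{"type": "text", "text": context_text}]
    for jpg in frame_jpegs:
        b64 = base64.b64encode(jpg).decode()
        content.append({"type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"}})
    content.append({"type": "text",
                    "text": "Rate the player's engagement in [0, 1]. Reply with the number only."})
    return [{"role": "user", "content": content}]

messages = build_affect_prompt([b"\xff\xd8fake-jpeg-bytes"], "Gameplay clip, frames in order:")
print(messages[0]["content"][0]["text"])
```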
Sycophantic Anchors: Localizing and Quantifying User Agreement in Reasoning Models
arXiv:2601.21183v1 Announce Type: new Abstract: Reasoning models frequently agree with incorrect user suggestions — a behavior known as sycophancy. However, it is unclear where in the reasoning trace this agreement originates and how strong the commitment is. To localize and quantify this behavior, we introduce *sycophantic anchors* — sentences that causally lock models into user […]
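Locating a causal anchor suggests an ablation probe: delete a candidate sentence, regenerate, and see whether the model still commits to the user's wrong answer. The interface below is hypothetical, a sketch of that probe rather than the paper's method.

```python
def anchor_strength(sentences, regenerate, wrong_answer, n_samples=8):
    """Illustrative causal probe: ablate each sentence of a reasoning trace and
    measure how often regeneration still ends in the user's wrong answer.
    A large drop marks that sentence as an anchor. `regenerate` is a stub
    for re-sampling the model from the edited trace."""
    base = sum(regenerate(sentences) == wrong_answer for _ in range(n_samples)) / n_samples
    scores = {}
    for i in range(len(sentences)):
        ablated = sentences[:i] + sentences[i + 1:]
        rate = sum(regenerate(ablated) == wrong_answer for _ in range(n_samples)) / n_samples
        scores[i] = base - rate        # positive => removing it reduces sycophancy
    return scores

# Stub model: commits to the wrong answer only if the agreement sentence survives.
trace = ["Let me check.", "The user is right, so x = 5.", "Therefore x = 5."]
stub = lambda s: "5" if any("user is right" in t for t in s) else "7"
print(anchor_strength(trace, stub, wrong_answer="5"))  # {0: 0.0, 1: 1.0, 2: 0.0}
```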
Near-Optimal Online Deployment and Routing for Streaming LLMs
arXiv:2506.17254v2 Announce Type: replace-cross Abstract: The rapid pace at which new large language models (LLMs) appear, and older ones become obsolete, forces providers to manage a streaming inventory under a strict concurrency cap and per-query cost budgets. We cast this as an online decision problem that couples stage-wise deployment (at fixed maintenance windows) with per-query […]
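Per-query routing under a cost budget reads like a constrained bandit. Below is a generic budget-aware UCB1 sketch (not the paper's algorithm); the pool, costs, and reward model are invented for illustration.

```python
import math
import random

random.seed(0)

class BudgetedUCBRouter:
    """Illustrative UCB1 router over a deployed model pool, skipping arms
    whose per-query cost exceeds the query's budget."""
    def __init__(self, costs):
        self.costs = costs
        self.counts = [0] * len(costs)
        self.values = [0.0] * len(costs)
        self.t = 0

    def route(self, budget):
        self.t += 1
        feasible = [i for i, c in enumerate(self.costs) if c <= budget]
        for i in feasible:                      # play each feasible arm once first
            if self.counts[i] == 0:
                return i
        return max(feasible, key=lambda i: self.values[i]
                   + math.sqrt(2 * math.log(self.t) / self.counts[i]))

    def update(self, i, reward):
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]

# Toy pool: (cost, true answer quality); the cheap model is worse.
pool = [(0.2, 0.5), (1.0, 0.8)]
router = BudgetedUCBRouter([c for c, _ in pool])
for _ in range(200):
    arm = router.route(budget=1.0)
    router.update(arm, reward=float(random.random() < pool[arm][1]))
print("pulls per model:", router.counts)
```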