arXiv:2509.23040v2 Announce Type: replace-cross Abstract: Large language models face challenges in long-context question answering, where the key evidence for a query may be dispersed across millions of tokens. Existing works equip large language models with a memory buffer that is dynamically updated via a linear scan of the document; these are known as “memorize while reading” methods. While […]
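As a rough illustration of the memorize-while-reading pattern, here is a minimal Python sketch: the fixed-size chunking, the word-overlap relevance score, and the eviction rule are placeholder assumptions, not the mechanism proposed in the paper.

    # Minimal sketch of a "memorize while reading" loop: scan a long document
    # in fixed-size chunks and keep a bounded memory buffer of query-relevant
    # snippets. The relevance score (word overlap) and the eviction rule are
    # illustrative placeholders only.
    def relevance(query: str, chunk: str) -> float:
        q, c = set(query.lower().split()), set(chunk.lower().split())
        return len(q & c) / (len(q) or 1)

    def memorize_while_reading(document: str, query: str,
                               chunk_size: int = 200, buffer_size: int = 4):
        words = document.split()
        memory: list[tuple[float, str]] = []          # (score, chunk)
        for i in range(0, len(words), chunk_size):    # single linear scan
            chunk = " ".join(words[i:i + chunk_size])
            memory.append((relevance(query, chunk), chunk))
            memory.sort(key=lambda x: x[0], reverse=True)
            memory = memory[:buffer_size]             # evict least relevant
        return [chunk for _, chunk in memory]         # evidence kept for answering

The retained chunks would then be passed to the model together with the question.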
Compositional Reasoning with Transformers, RNNs, and Chain of Thought
arXiv:2503.01544v2 Announce Type: replace-cross Abstract: It is well understood that different neural network architectures are suited to different tasks, but is there always a single best architecture for a given task? We compare the expressive power of transformers, RNNs, and transformers with chain of thought tokens on a simple and natural class of tasks we […]
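The concrete task family is truncated above; as one purely hypothetical instance of a compositional task in this general spirit, consider a k-hop lookup, where answering requires composing k single-step lookups.

    # Hypothetical k-hop composition task, used here only to illustrate what a
    # "compositional" problem looks like; it is not necessarily the task class
    # studied in the paper.
    table = {"ann": "bob", "bob": "cat", "cat": "dan", "dan": "ann"}

    def k_hop(start: str, k: int) -> str:
        node = start
        for _ in range(k):      # each hop is one composition step
            node = table[node]
        return node

    assert k_hop("ann", 3) == "dan"

Chain-of-thought tokens let a model externalize the intermediate hops, which is the kind of difference in expressive power such comparisons probe.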
DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs
arXiv:2506.11558v4 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have recently been extended to the video domain, enabling sophisticated video-language understanding. However, existing Video LLMs often exhibit limitations in fine-grained temporal reasoning, restricting their ability to precisely attribute responses to specific video moments, especially under constrained supervision. We introduce DaMO, a data-efficient Video LLM explicitly […]
A Message Passing Realization of Expected Free Energy Minimization
arXiv:2508.02197v2 Announce Type: replace Abstract: We present a message passing approach to Expected Free Energy (EFE) minimization on factor graphs, based on the theory introduced in arXiv:2504.14898. By reformulating EFE minimization as Variational Free Energy minimization with epistemic priors, we transform a combinatorial search problem into a tractable inference problem solvable through standard variational techniques. […]
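For reference, the expected free energy of a policy pi at a future time tau is commonly written in the active inference literature as a risk term plus an ambiguity term; this is the standard decomposition, not necessarily the exact epistemic-prior form used in the referenced reformulation.

    % Standard expected-free-energy decomposition (risk + ambiguity) from the
    % active inference literature; the paper's reformulation may differ.
    G(\pi,\tau)
      = \underbrace{D_{\mathrm{KL}}\!\left[\,q(o_\tau \mid \pi)\;\|\;p(o_\tau)\,\right]}_{\text{risk}}
      + \underbrace{\mathbb{E}_{q(s_\tau \mid \pi)}\!\left[\,\mathrm{H}\!\left[p(o_\tau \mid s_\tau)\right]\,\right]}_{\text{ambiguity}}

Recasting the minimization of G as variational free energy minimization with epistemic priors is what lets standard message-passing inference on factor graphs apply.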
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
arXiv:2601.17367v2 Announce Type: replace-cross Abstract: The quadratic complexity of standard attention mechanisms poses a significant scalability bottleneck for large language models (LLMs) in long-context scenarios. While hybrid attention strategies that combine sparse and full attention within a single model offer a viable solution, they typically employ static computation ratios (i.e., fixed proportions of sparse versus […]
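As a loose sketch of mixing sparse and full attention with a per-input, test-time decision: the windowed sparse pattern and the entropy-based switching rule below are assumptions for illustration, not the mechanism proposed in the paper.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def full_attention(q, k, v):
        # Dense attention: every query token attends to every key (quadratic cost).
        scores = q @ k.T / np.sqrt(q.shape[-1])
        return softmax(scores) @ v

    def windowed_attention(q, k, v, window=32):
        # Sparse variant: each query token attends only to a local window of keys.
        out = np.empty_like(q)
        for i in range(len(q)):
            lo, hi = max(0, i - window), i + 1
            w = softmax(q[i] @ k[lo:hi].T / np.sqrt(q.shape[-1]))
            out[i] = w @ v[lo:hi]
        return out

    def elastic_attention(q, k, v, budget=0.5):
        # Test-time choice between sparse and full attention; the normalized-entropy
        # proxy used to decide is an invented placeholder.
        probe = softmax(q[-1] @ k.T / np.sqrt(q.shape[-1]))
        entropy = -(probe * np.log(probe + 1e-9)).sum() / np.log(len(k))
        return full_attention(q, k, v) if entropy > budget else windowed_attention(q, k, v)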
Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning
arXiv:2601.20829v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has substantially improved the reasoning abilities of large language models (LLMs), yet training often stalls as problems become saturated. We identify the core challenge as the poor accessibility of informative failures: learning signals exist but are rarely encountered during standard rollouts. To address this, […]
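A schematic of the general idea of reusing failed attempts as conditioning prefixes: sample_rollout and verify are hypothetical stand-ins for the policy model and the verifiable reward, and the fixed-fraction prefix rule is an assumption, not the paper's procedure.

    def failure_prefix_rollouts(problem: str, sample_rollout, verify,
                                n_rollouts: int = 8, prefix_frac: float = 0.5):
        """Condition new rollouts on prefixes of earlier failures.

        sample_rollout(prompt) -> str and verify(problem, answer) -> bool are
        hypothetical stand-ins; the prefix-selection rule is illustrative only.
        """
        failures = []
        for _ in range(n_rollouts):
            answer = sample_rollout(problem)
            if not verify(problem, answer):
                failures.append(answer)

        conditioned = []
        for fail in failures:
            cut = int(len(fail) * prefix_frac)          # keep the failure prefix
            prompt = problem + "\n" + fail[:cut]        # restart generation from it
            retry = sample_rollout(prompt)
            conditioned.append((prompt, retry, verify(problem, retry)))
        return conditioned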
The Bayesian Geometry of Transformer Attention
arXiv:2512.22471v3 Announce Type: replace-cross Abstract: Transformers often appear to perform Bayesian reasoning in context, but verifying this rigorously has been impossible: natural data lack analytic posteriors, and large models conflate reasoning with memorization. We address this by constructing “Bayesian wind tunnels”: controlled environments where the true posterior is known in closed form and memorization […]
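As a toy instance of an environment with a closed-form posterior, a conjugate Beta-Bernoulli setup works; it is chosen purely for illustration, and the paper's wind tunnels may be constructed quite differently.

    from math import isclose

    def beta_bernoulli_posterior(observations, a0=1.0, b0=1.0):
        # Conjugate update: the posterior over the coin bias is
        # Beta(a0 + heads, b0 + tails), so the exact posterior-predictive
        # probability of the next head is available in closed form.
        heads = sum(observations)
        tails = len(observations) - heads
        a, b = a0 + heads, b0 + tails
        return a / (a + b)            # P(next = 1 | data)

    exact = beta_bernoulli_posterior([1, 1, 0, 1])   # Beta(4, 2) -> 4/6
    assert isclose(exact, 4 / 6)

In such a setting, a model's in-context prediction for the next observation can be checked against the exact posterior predictive, separating genuine Bayesian updating from memorization.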
Li-ViP3D++: Query-Gated Deformable Camera-LiDAR Fusion for End-to-End Perception and Trajectory Prediction
arXiv:2601.20720v1 Announce Type: cross Abstract: End-to-end perception and trajectory prediction from raw sensor data is one of the key capabilities for autonomous driving. Modular pipelines restrict information flow and can amplify upstream errors. Recent query-based, fully differentiable perception-and-prediction (PnP) models mitigate these issues, yet the complementarity of cameras and LiDAR in the query-space has not […]
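As a generic illustration of gating per-query features from two sensors, the sketch below uses a scalar sigmoid gate; this gate form is an assumption, not necessarily the fusion used in Li-ViP3D++.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def query_gated_fusion(cam_feat, lidar_feat, w_gate, b_gate):
        """Fuse per-query camera and LiDAR features with a learned scalar gate.

        cam_feat, lidar_feat: (num_queries, dim) features sampled per object query.
        w_gate: (2 * dim,) gate weights; b_gate: scalar bias.
        The sigmoid-gate form is an illustrative assumption.
        """
        joint = np.concatenate([cam_feat, lidar_feat], axis=-1)   # (Q, 2*dim)
        g = sigmoid(joint @ w_gate + b_gate)[:, None]             # (Q, 1) in [0, 1]
        return g * cam_feat + (1.0 - g) * lidar_feat              # (Q, dim)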
SimBench: A Framework for Evaluating and Diagnosing LLM-Based Digital-Twin Generation for Multi-Physics Simulation
arXiv:2408.11987v2 Announce Type: replace Abstract: We introduce SimBench, a benchmark designed to evaluate the proficiency of simulator-oriented LLMs (S-LLMs) in generating digital twins (DTs) that can be used in simulators for virtual testing. Given a collection of S-LLMs, this benchmark ranks them according to their ability to produce high-quality DTs. We demonstrate this by comparing […]
AI Annotation Orchestration: Evaluating LLM Verifiers to Improve the Quality of LLM Annotations in Learning Analytics
arXiv:2511.09785v2 Announce Type: replace Abstract: Large Language Models (LLMs) are increasingly used to annotate learning interactions, yet concerns about reliability limit their utility. We test whether verification-oriented orchestration, in which models check their own labels (self-verification) or audit one another (cross-verification), improves qualitative coding of tutoring discourse. Using transcripts from 30 one-to-one math sessions, we compare three […]
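A schematic of the two orchestration patterns named in the abstract: call_llm, call_annotator, and call_auditor are hypothetical stand-ins for model calls, and the prompt wording is illustrative, not that used in the study.

    def self_verify(call_llm, transcript: str, label: str) -> str:
        # Same model re-checks its own label (self-verification).
        # call_llm(prompt) -> str is a hypothetical stand-in.
        prompt = (f"Transcript:\n{transcript}\n\n"
                  f"You previously labeled this as '{label}'. "
                  f"Re-check the label and answer KEEP or REVISE:<new label>.")
        return call_llm(prompt)

    def cross_verify(call_annotator, call_auditor, transcript: str):
        # A second model audits the first model's label (cross-verification).
        label = call_annotator(f"Label this tutoring turn:\n{transcript}")
        verdict = call_auditor(f"Transcript:\n{transcript}\n"
                               f"Proposed label: {label}\nAnswer AGREE or DISAGREE.")
        return label, verdict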
An Actionable Framework for Assessing Bias and Fairness in Large Language Model Use Cases
arXiv:2407.10853v4 Announce Type: replace-cross Abstract: Bias and fairness risks in Large Language Models (LLMs) vary substantially across deployment contexts, yet existing approaches lack systematic guidance for selecting appropriate evaluation metrics. We present a decision framework that maps LLM use cases, characterized by a model and a population of prompts, to relevant bias and fairness metrics based […]
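A toy sketch of the kind of use-case-to-metric mapping such a decision framework produces; the use-case attributes and metric names below are generic examples, not the paper's actual taxonomy.

    def recommend_metrics(task_type: str, uses_protected_attributes: bool) -> list[str]:
        # Toy decision rule mapping a use case to candidate bias/fairness metrics.
        # Attribute names and metric lists are generic examples only.
        if task_type == "classification":
            metrics = ["demographic parity difference", "equalized odds difference"]
        elif task_type == "generation":
            metrics = ["toxicity gap across groups", "sentiment/regard disparity"]
        else:
            metrics = ["group-wise performance comparison"]
        if not uses_protected_attributes:
            metrics.append("counterfactual invariance under group substitutions")
        return metrics

    print(recommend_metrics("generation", uses_protected_attributes=False))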
Beyond Face Swapping: A Diffusion-Based Digital Human Benchmark for Multimodal Deepfake Detection
arXiv:2505.16512v5 Announce Type: replace-cross Abstract: In recent years, the explosive advancement of deepfake technology has posed a critical and escalating threat to public security, namely diffusion-based digital human generation. Unlike traditional face manipulation methods, such models can generate highly realistic videos with consistency via multimodal control signals. Their flexibility and covertness pose severe challenges to existing […]