arXiv:2602.21720v2 Announce Type: replace-cross Abstract: Human recursive numeral systems (i.e., counting systems such as English base-10 numerals), like many other grammatical systems, are highly regular. Following prior work that relates cross-linguistic tendencies to biases in learning, we ask whether regular systems are common because regularity facilitates learning. Adopting methods from the Reinforcement Learning literature, we […]
Time Blindness: Why Video-Language Models Can’t See What Humans Can?
arXiv:2505.24867v2 Announce Type: replace-cross Abstract: Recent advances in vision-language models (VLMs) have made impressive strides in understanding spatio-temporal relationships in videos. However, when spatial information is obscured, these models struggle to capture purely temporal patterns. We introduce $\textbf{SpookyBench}$, a benchmark where information is encoded solely in temporal sequences of noise-like frames, mirroring natural phenomena from […]
Auto-Relational Reasoning
arXiv:2604.26507v1 Announce Type: new Abstract: Background & Objectives: In the last decade, machine learning research has grown rapidly, but large models are reaching their soft limits, demonstrating diminishing returns, and still lack solid reasoning abilities. These limits could be surpassed through a synergistic combination of machine learning scalability and rigid reasoning. Methods: In this work, we […]
A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models
arXiv:2510.08049v3 Announce Type: replace-cross Abstract: Although Large Language Models (LLMs) exhibit advanced reasoning ability, conventional alignment remains largely dominated by outcome reward models (ORMs) that judge only final answers. Process Reward Models (PRMs) address this gap by evaluating and guiding reasoning at the step or trajectory level. This survey provides a systematic overview of PRMs through […]
A Comparative Analysis on the Performance of Upper Confidence Bound Algorithms in Adaptive Deep Neural Networks
arXiv:2604.24810v2 Announce Type: replace-cross Abstract: Edge computing environments impose strict constraints on energy consumption and latency, making the deployment of deep neural networks a significant challenge. Therefore, smart and adaptive inference strategies that dynamically balance computational cost or latency with predictive accuracy are critical in edge computing scenarios. In this work, we build on Adaptive […]
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
arXiv:2602.23163v3 Announce Type: replace Abstract: Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms. Yet principled methods to detect and quantify such behaviours are lacking. Classical definitions of steganography, and detection methods based on them, require a known reference distribution of non-steganographic signals. For the […]
Text-Utilization for Encoder-dominated Speech Recognition Models
arXiv:2604.26514v1 Announce Type: cross Abstract: This paper investigates efficient methods for utilizing text-only data to improve speech recognition, focusing on encoder-dominated models that facilitate faster recognition. We provide a comprehensive comparison of techniques to integrate text-only data, including modality matching and dynamic downsampling to reach text-level representations within the encoder. Our experiments on the LibriSpeech […]
Star-Fusion: A Multi-modal Transformer Architecture for Discrete Celestial Orientation via Spherical Topology
arXiv:2604.26582v1 Announce Type: cross Abstract: Reliable celestial attitude determination is a critical requirement for autonomous spacecraft navigation, yet traditional “Lost-in-Space” (LIS) algorithms often suffer from high computational overhead and sensitivity to sensor-induced noise. While deep learning has emerged as a promising alternative, standard regression models are often confounded by the non-Euclidean topology of the celestial […]
STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices
arXiv:2604.26422v1 Announce Type: cross Abstract: Accurate end-to-end tail-latency forecasting is critical for proactive SLO management in microservice systems. However, modeling long-range dependency propagation and non-stationary, bursty workloads while maintaining inference efficiency at scale remains challenging. We present STLGT (Scalable Trace-based Linear Graph Transformer), a per-API predictor that encodes traces as span graphs for multi-step p95 […]
Benchmarking Complex Multimodal Document Processing Pipelines: A Unified Evaluation Framework for Enterprise AI
arXiv:2604.26382v1 Announce Type: cross Abstract: Most enterprise document AI today is a pipeline. Parse, index, retrieve, generate. Each of those stages has been studied to death on its own — what’s still hard is evaluating the system as a whole. We built EnterpriseDocBench to take a swing at it: parsing fidelity, indexing efficiency, retrieval relevance, […]
Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction
arXiv:2604.26498v1 Announce Type: cross Abstract: The rapid growth of molecular foundation models and general-purpose large language models has encouraged a scale-centric view of artificial intelligence in drug discovery, in which larger pretrained models are expected to supersede compact cheminformatics models and task-specific graph neural networks (GNNs). We test this assumption on 22 molecular property and […]
TLPO: Token-Level Policy Optimization for Mitigating Language Confusion in Large Language Models
arXiv:2604.26553v1 Announce Type: cross Abstract: Large language models (LLMs) demonstrate strong multilingual capabilities, yet often fail to consistently generate responses in the intended language, exhibiting a phenomenon known as language confusion. Prior mitigation approaches based on sequence-level fine-tuning, such as DPO, ORPO, and GRPO, operate at the level of entire responses and can lead to […]