arXiv:2603.17717v2 Announce Type: replace-cross Abstract: Supervised detection of network attacks has always been a critical part of network intrusion detection systems (NIDS). Nowadays, in a pivotal time for artificial intelligence (AI), with even more sophisticated attacks that utilize advanced techniques, such as generative artificial intelligence (GenAI) and reinforcement learning, it has become a vital component […]
Learning to Play Blackjack: A Curriculum Learning Perspective
arXiv:2604.00076v2 Announce Type: replace-cross Abstract: Reinforcement Learning (RL) agents often struggle with efficiency and performance in complex environments. We propose a novel framework that uses a Large Language Model (LLM) to dynamically generate a curriculum over available actions, enabling the agent to incorporate each action individually. We apply this framework to the game of Blackjack, […]
Captioning Daily Activity Images in Early Childhood Education: Benchmark and Algorithm
arXiv:2604.01941v1 Announce Type: cross Abstract: Image captioning for Early Childhood Education (ECE) is essential for automated activity understanding and educational assessment. However, existing methods face two key challenges. First, the lack of large-scale, domain-specific datasets limits the model’s ability to capture fine-grained semantic concepts unique to ECE scenarios, resulting in generic and imprecise descriptions. Second, […]
SAFE: Stepwise Atomic Feedback for Error correction in Multi-hop Reasoning
arXiv:2604.01993v1 Announce Type: cross Abstract: Multi-hop QA benchmarks frequently reward Large Language Models (LLMs) for spurious correctness, masking ungrounded or flawed reasoning steps. To shift toward rigorous reasoning, we propose SAFE, a dynamic benchmarking framework that replaces the ungrounded Chain-of-Thought (CoT) with a strictly verifiable sequence of grounded entities. Our framework operates across two phases: […]
Optimizing RAG Rerankers with LLM Feedback via Reinforcement Learning
arXiv:2604.02091v1 Announce Type: cross Abstract: Rerankers play a pivotal role in refining retrieval results for Retrieval-Augmented Generation. However, current reranking models are typically optimized on static human-annotated relevance labels in isolation, decoupled from the downstream generation process. This isolation leads to a fundamental misalignment: documents identified as topically relevant by information retrieval metrics often […]
Universal Hypernetworks for Arbitrary Models
arXiv:2604.02215v1 Announce Type: cross Abstract: Conventional hypernetworks are typically engineered around a specific base-model parameterization, so changing the target architecture often entails redesigning the hypernetwork and retraining it from scratch. We introduce the Universal Hypernetwork (UHN), a fixed-architecture generator that predicts weights from deterministic parameter, architecture, and task descriptors. This descriptor-based formulation decouples the generator […]
Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning
arXiv:2604.02322v1 Announce Type: cross Abstract: Large Language Models employing Chain-of-Thought reasoning achieve strong performance but suffer from excessive token consumption that inflates inference costs. Existing efficiency methods such as explicit length penalties, difficulty estimators, or multi-stage curricula either degrade reasoning quality or require complex training pipelines. We introduce Batched Contextual Reinforcement, a minimalist, single-stage training […]
Cardiac-Phase-Dependent Spin Coherence as a Probe of Boundary Covariance Geometry in Neural Tissue
arXiv:2505.22680v2 Announce Type: replace Abstract: A recently proposed geometric framework predicts that the transition from distributed belief to committed action involves a metric regime change, culminating in a boundary regime where cross-mode structure becomes algebraically necessary for continued state-space compression. This paper examines whether reported magnetic resonance measurements of proton spins in neural tissue provide […]
Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding
arXiv:2604.00528v2 Announce Type: replace-cross Abstract: 3D Visual Grounding (3D-VG) aims to localize objects in 3D scenes via natural language descriptions. While recent advancements leveraging Vision-Language Models (VLMs) have explored zero-shot possibilities, they typically suffer from a static workflow relying on preprocessed 3D point clouds, essentially degrading grounding into proposal matching. To bypass this reliance, our […]
CogBias: Measuring and Mitigating Cognitive Bias in Large Language Models
arXiv:2604.01366v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed in high-stakes decision-making contexts. While prior work has shown that LLMs exhibit cognitive biases behaviorally, whether these biases correspond to identifiable internal representations and can be mitigated through targeted intervention remains an open question. We define LLM cognitive bias as systematic, reproducible deviations […]
Predicting LLM Output Length via Entropy-Guided Representations
arXiv:2602.11812v2 Announce Type: replace Abstract: The long-tailed distribution of sequence lengths in LLM serving and reinforcement learning (RL) sampling causes significant computational waste due to excessive padding in batched inference. Existing methods rely on auxiliary models for static length prediction, but they incur high overhead, generalize poorly, and fail in stochastic “one-to-many” sampling scenarios. We […]
RIFT: A RubrIc Failure Mode Taxonomy and Automated Diagnostics
arXiv:2604.01375v1 Announce Type: new Abstract: Rubric-based evaluation is widely used in LLM benchmarks and training pipelines for open-ended, less verifiable tasks. While prior work has demonstrated the effectiveness of rubrics using downstream signals such as reinforcement learning outcomes, there remains no principled way to diagnose rubric quality issues from such aggregated or downstream signals alone. […]