arXiv:2606.04426v1 Announce Type: new Abstract: Cortical circuits operate in a regime of intrinsic chaos, where even tiny changes in input can lead to divergent neural responses. Yet, remarkably, population codes in the brain vary smoothly with sensory stimuli, forming coherent representational manifolds. How can chaotic networks sustain such stable coding? Here, we develop a theoretical […]
Trace-Mediated Peak Bias: Bridging Temporal Credit Assignment and Cognitive Heuristics in Deep Reinforcement Learning
arXiv:2606.04735v1 Announce Type: cross Abstract: Temporal credit assignment is central to both biological and artificial intelligence, yet its interaction with non-linear function approximation is poorly understood. We identify a systematic failure mode in deep reinforcement learning (RL) termed Trace-Mediated Peak Bias (TMPB). At intermediate eligibility trace depths, agents irrationally prefer trajectories with high-magnitude reward “peaks” […]
Markov Chain Decoders Overcome the Heavy-Tail Limitations of Lipschitz Generative Models
arXiv:2605.18931v2 Announce Type: replace-cross Abstract: Heavy-tailed distributions are prevalent in performance evaluation, network traffic, and risk modeling. This behavior poses a fundamental challenge for modern deep generative models. Standard Variational Autoencoders (VAEs) employ Gaussian decoder likelihoods and Lipschitz-constrained neural networks, a combination that is structurally incapable of producing heavy-tailed outputs: the Gaussian tail decays exponentially, […]
An Empirical Audit of Input Encoders for Multi-Channel Signal Transformers
arXiv:2606.04752v1 Announce Type: cross Abstract: Transformers consuming multi-channel scalar signals must embed $C$ simultaneous values into one $d_{\text{model}}$-dimensional vector per time step. We empirically audit eight input encoders (spanning a shared-scalar baseline, per-channel linear projections, an orthogonality regulariser, a nonlinear MLP stem, block-partitioned concatenation, channel-independent and channel-as-token architectures, and a projected positional encoding) […]
Cascading Hallucination in Agentic RAG: The CHARM Framework for Detection and Mitigation
arXiv:2606.04435v1 Announce Type: new Abstract: Multi-step agentic retrieval-augmented generation (RAG) pipelines have demonstrated significant capability for complex reasoning tasks, yet remain vulnerable to a class of failure that existing hallucination detection mechanisms systematically miss: cascading hallucination, where errors introduced at early pipeline stages propagate and amplify across successive reasoning steps, producing confident but factually incorrect […]
Activation Steering of Video Generation Models via Reduced-Order Linear Optimal Control
arXiv:2606.04775v1 Announce Type: cross Abstract: Text-to-video (T2V) models trained on large-scale web data can generate undesired content, motivating interventions that reduce harmful outputs without sacrificing visual quality. Activation steering offers an attractive mechanistic alternative to finetuning and prompt filtering, but existing T2V steering methods remain limited, typically applying coarse, non-anticipative interventions that can lead to […]
BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Large Language Models on Bengali
arXiv:2605.31483v2 Announce Type: replace-cross Abstract: Despite Bengali being the sixth most spoken language in the world, no prior work has systematically evaluated hallucination in large language models (LLMs) for Bengali. We introduce BenHalluEval, a fine-grained hallucination evaluation framework for Bengali covering four tasks: Generative Question Answering (GQA), Bangla-English Code-Mixed QA, Summarization, and Reasoning. We construct […]
The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?
arXiv:2606.04455v1 Announce Type: new Abstract: Current AI benchmarks evaluate agents on task execution within human-designed workflows. These evaluations fundamentally fail to measure a critical next-level capability: whether models can autonomously develop agent systems. We introduce the Meta-Agent Challenge (MAC), an evaluation framework designed to test the capacity of frontier models for autonomous agent development. Specifically, […]
Abduction Prover in Isabelle/HOL
arXiv:2606.04877v1 Announce Type: cross Abstract: Proof assistants based on expressive logics suffer limited automation for proof search, raising the cost of formal verification based on proof assistants. We address this problem by introducing the Abduction Prover for Isabelle/HOL. Given a challenging proof goal, the Abduction Prover constructs a proof script for the goal by identifying […]
Geometry-Aware Distillation for Prompt Tuning Biomedical Vision-Language Models
arXiv:2606.04922v1 Announce Type: cross Abstract: Current prompt-based and adapter-based tuning of vision-language models (VLMs) is attractive for medical imaging, where clinical data sensitivity favors frozen backbones and annotations are limited. However, these methods typically optimize only the ground-truth class, treating all other classes as equally incorrect, ignoring clinically meaningful class relations and yielding unstable decision […]
AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning
arXiv:2606.04484v1 Announce Type: new Abstract: We present AgentJet, a distributed swarm training framework for large language model (LLM) agent reinforcement learning. Unlike centralized frameworks that tightly couple agent rollouts with model optimization, AgentJet adopts a decoupled multi-node architecture in which swarm server nodes host trainable models and run optimization on GPU clusters, whereas swarm client […]
Plan, Watch, Recover: A Benchmark and Architectures for Proactive Procedural Assistance
arXiv:2606.04970v1 Announce Type: cross Abstract: We envision a proactive multi-modal assistant system which gives users real-time step-by-step guidance on a procedural task, autonomously deciding \textit{when} to interrupt, and \textit{how} to coach. However, progress is limited by the absence of large-scale, cross-domain benchmarks that reflect realistic conditions, particularly the common case in which users deviate from […]