arXiv:2605.06510v1 Announce Type: cross Abstract: Transformer-based tabular foundation models (TFMs) dominate small to medium tabular predictive benchmark tasks, yet their inference mechanisms remain largely unexplored. We present the first large-scale mechanistic study of layerwise dynamics in six state-of-the-art tabular in-context learning models. We explore how predictions emerge across depth, identify distinct stages of inference and […]
Towards Dependable Retrieval-Augmented Generation Using Factual Confidence Prediction
arXiv:2605.05244v1 Announce Type: cross Abstract: Incorporating specific knowledge into large language models via retrieval-augmented generation (RAG) is a widespread technique that fuels many of today’s industry AI applications. A fundamental problem is to assess whether the context retrieved by a similarity search indeed provides supporting facts or instead misguides the generator with irrelevant information. It […]
SPARK: Self-Play with Asymmetric Reward from Knowledge Graphs
arXiv:2605.05546v1 Announce Type: new Abstract: Self-play reinforcement learning has shown strong performance in domains with formally verifiable structure, such as mathematics and coding, where both problem generation and reward computation can be grounded in explicit rules. Extending this paradigm to scientific literature is more challenging: the relationships among multi-modal elements within and across documents are […]
Automated Population-Level Audit Assurance via AI-Based Document Intelligence
arXiv:2605.05252v1 Announce Type: cross Abstract: Audit transaction testing validates accuracy and completeness of customer-facing statements against internal systems of record. Traditional manual, sample-based review of unstructured PDF statements is labor-intensive and does not scale to millions of transactions. This paper presents an automated framework for large-scale audit transaction testing using AI-based document intelligence. The solution […]
Recursive Agent Optimization
arXiv:2605.06639v1 Announce Type: cross Abstract: We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents to scale to longer contexts and generalize to more difficult problems via […]
Shattering the Echo Chamber: Hidden Safeguards in Manuscripts Against the AI Takeover of Peer Review
arXiv:2605.05271v1 Announce Type: cross Abstract: As LLMs become increasingly capable, editorial boards and program committees are growing concerned about reviewers who fully outsource peer review to commercial chatbots. This concern stems from prior findings that current chatbots lack the independent critical thinking and depth of reasoning required to assess scientific novelty. One promising direction for […]
Who Prices Cognitive Labor in the Age of Agents? A Position on Compute-Anchored Wages
arXiv:2605.05558v1 Announce Type: new Abstract: A natural intuition about the economics of AI agents is that, because agents can be replicated at near-zero marginal cost, they constitute a labor input in infinitely elastic supply, and therefore drive cognitive-labor wages to zero. We argue this framing is wrong in mechanism but partially correct in conclusion, and […]
ViTok-v2: Scaling Native Resolution Auto-Encoders to 5 Billion Parameters
arXiv:2605.05331v1 Announce Type: cross Abstract: Vision Transformer (ViT) autoencoders have emerged as compelling tokenizers for images, offering improved reconstruction over convolutional tokenizers. However, existing ViT tokenizers cannot explore this landscape as performance degrades outside training resolutions, and reliance on adversarial losses prevents stable scaling. ViTok (Hansen-Estruch et al., 2025) found that the compression ratio r […]
Flexible Agent Alignment with Goal Inference from Open-Ended Dialog
arXiv:2508.15119v2 Announce Type: replace Abstract: We introduce Open-Universe Assistance Games (OU-AGs), a formal framework extending assistance games to LLM-based agents. Effective assistance requires reasoning over human preferences that are unbounded, underspecified, and evolving. Current LLM agents struggle in multi-turn interactions and with maintaining accurate models of user intent in collaborative settings. Existing assistance game formulations […]
Making AI Drafts Count: A Quality Threshold in Audio Description Workflows
arXiv:2605.05348v1 Announce Type: cross Abstract: Audio description (AD) narrates visual elements in video for blind and low-vision audiences. Recent work has shown that giving novice describers an AI-generated draft to start from helps produce higher-quality AD and lowers the barrier to entry. What remains an open question is how draft quality shapes the editing process. […]
BitCal-TTS: Bit-Calibrated Test-Time Scaling for Quantized Reasoning Models
arXiv:2605.05561v1 Announce Type: new Abstract: Post-training quantization makes large reasoning models practical under tight memory and latency budgets, but it can distort the online signals that drive adaptive test-time compute allocation. Under a fixed cap on the number of newly generated tokens, miscalibrated confidence can lead to harmful early halting: the model may surface a […]
Owen-Shapley Policy Optimization: A Principled RL Algorithm for Generative Search LLMs
arXiv:2601.08403v2 Announce Type: replace Abstract: Large language models are increasingly trained via reinforcement learning for personalized recommendation tasks, but standard methods like GRPO rely on sparse, sequence-level rewards. These obscure which tokens actually contribute to high-quality outputs, creating a credit assignment gap. This gap is especially problematic when models must infer latent user intent from […]