arXiv:2605.11154v1 Announce Type: cross Abstract: Modern astrophysical studies rely heavily on complex data analysis pipelines; however, published descriptions often lack the detail required for computational reproducibility. In this work, we present an information-theoretic framework to quantify how effectively a method can be reconstructed from its written description. By treating algorithmic reconstruction as a probability distribution […]
VASR: Variance-Aware Systematic Resampling for Reward-Guided Diffusion
arXiv:2604.06779v2 Announce Type: replace Abstract: Sequential Monte Carlo (SMC) samplers for reward-guided diffusion models often suffer from rapid lineage collapse: a few high-reward particles dominate the population within a handful of resampling steps, destroying diversity and degrading sample quality. We propose a variance-decomposition framework for reward-guided diffusion SMC that separates continuation variance $V_t^{\mathrm{cont}}$ from residual […]
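The systematic resampling step that such SMC samplers rely on (and that, per the abstract, can still let high-weight lineages take over) can be sketched in a few lines. A minimal NumPy illustration; the function name and toy weights are ours, not the paper's:

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling: one uniform draw stratifies N probe points.

    Returns indices of the particles selected for the next generation.
    It has lower resampling variance than multinomial resampling, yet a
    single dominant weight can still flood the population with copies
    of one lineage -- the collapse the abstract describes.
    """
    n = len(weights)
    positions = (rng.uniform() + np.arange(n)) / n   # evenly spaced probes
    cdf = np.cumsum(weights / np.sum(weights))       # normalized weight CDF
    return np.searchsorted(cdf, positions)

rng = np.random.default_rng(0)
w = np.array([0.7, 0.1, 0.1, 0.1])   # one particle holds 70% of the mass
idx = systematic_resample(w, rng)     # particle 0 is duplicated heavily
```

With these weights, at least two of the four surviving indices are guaranteed to be copies of particle 0, which is exactly the diversity loss a variance-aware scheme would try to control.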
The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models
arXiv:2605.11167v1 Announce Type: cross Abstract: Existing multi-model and tool-augmented systems communicate by generating text, serializing every exchange through the output vocabulary. Can two pretrained language models instead coordinate through a continuous, concurrent channel? The Bicameral Model couples two frozen language models through a trainable neural interface on their intermediate hidden states. At every generation step, […]
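The core idea of a trainable interface coupling two frozen models' intermediate hidden states can be shown with a toy sketch. This is a NumPy stand-in with assumed names, shapes, and a simple residual-update rule; the paper's actual interface architecture is not specified in this excerpt:

```python
import numpy as np

rng = np.random.default_rng(0)

class CouplingInterface:
    """Illustrative bidirectional hidden-state interface between two
    frozen language models A and B (all names here are assumptions)."""

    def __init__(self, dim_a, dim_b):
        # Only these projections would be trained; both LMs stay frozen.
        self.w_ab = rng.normal(scale=0.02, size=(dim_a, dim_b))
        self.w_ba = rng.normal(scale=0.02, size=(dim_b, dim_a))

    def exchange(self, h_a, h_b):
        # Each model receives a continuous message derived from the
        # other's hidden state, applied as a residual update -- no text
        # serialization through the output vocabulary.
        return h_a + h_b @ self.w_ba, h_b + h_a @ self.w_ab

iface = CouplingInterface(dim_a=8, dim_b=16)
h_a, h_b = rng.normal(size=8), rng.normal(size=16)
new_a, new_b = iface.exchange(h_a, h_b)   # one coupled generation step
```

The residual form keeps each model's own representation dominant while the learned projections carry the cross-model channel.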
LLM-X: A Scalable Negotiation-Oriented Exchange for Communication Among Personal LLM Agents
arXiv:2605.11376v1 Announce Type: new Abstract: We propose a personal-LLM exchange (LLM-X), a scalable negotiation-oriented environment that enables direct, structured communication across populations of personal agents (LLMs), each representing an individual user. Unlike existing tool-centric protocols that focus on agent-API interaction, LLM-X introduces a message bus and routing substrate for LLM-to-LLM coordination with guarantees around schema […]
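A message bus with schema checks, the kind of routing substrate the abstract describes, might look like the following toy sketch. All field names and the `MessageBus` API are illustrative assumptions, not LLM-X's actual protocol:

```python
REQUIRED = {"sender", "recipient", "type", "payload"}   # assumed schema

class MessageBus:
    """Minimal LLM-to-LLM routing sketch: agents register handlers and
    exchange schema-checked messages."""

    def __init__(self):
        self.handlers = {}

    def register(self, agent_id, handler):
        self.handlers[agent_id] = handler

    def send(self, msg):
        # Reject messages that violate the agreed schema before routing.
        missing = REQUIRED - msg.keys()
        if missing:
            raise ValueError(f"schema violation: missing {sorted(missing)}")
        return self.handlers[msg["recipient"]](msg)

bus = MessageBus()
bus.register("agent_b", lambda m: {"sender": "agent_b",
                                   "recipient": m["sender"],
                                   "type": "counteroffer",
                                   "payload": {"price": 90}})
reply = bus.send({"sender": "agent_a", "recipient": "agent_b",
                  "type": "offer", "payload": {"price": 80}})
```

The schema gate is the point of contrast with free-form text exchange: structure is enforced at the bus, not left to each agent's prompt.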
Adversarial SQL Injection Generation with LLM-Based Architectures
arXiv:2605.11188v1 Announce Type: cross Abstract: SQL injection (SQLi) remains among the most serious attack classes ranked in the Open Worldwide Application Security Project (OWASP) Top 10. Today, advances in Artificial Intelligence (AI), especially in Large Language Models (LLMs), have created an opportunity to automate adversarial attack tests that measure the defense […]
Done, But Not Sure: Disentangling World Completion from Self-Termination in Embodied Agents
arXiv:2605.08747v2 Announce Type: replace Abstract: Standard embodied evaluations do not independently score whether an agent correctly commits to task completion at episode closure, a capacity we call terminal commitment. Behaviorally distinct failures (never completing the task, completing it but failing to stop, and reporting success without sufficient evidence) collapse into the same benchmark failure. We introduce VIGIL, […]
Is Monotonic Sampling Necessary in Diffusion Models?
arXiv:2605.11773v1 Announce Type: cross Abstract: Diffusion models generate samples by iteratively denoising a Gaussian prior, traversing a sequence of noise levels that, in every published sampler, decreases monotonically. Six years of intensive work has refined nearly every aspect of this recipe, including the corruption operator, the training objective, the schedule shape, the architecture, and the […]
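The conventional recipe the abstract questions, a monotonically decreasing noise schedule driving an iterative denoising loop, can be sketched as follows. The toy score function and Euler step rule are our illustration (a variance-exploding setup with a standard-normal target), not the paper's sampler:

```python
import numpy as np

def monotone_schedule(sigma_max, sigma_min, steps):
    """The conventional choice: noise levels decrease monotonically."""
    return np.geomspace(sigma_max, sigma_min, steps)

def euler_sample(score, x, sigmas):
    """Euler integration of the probability-flow ODE dx/dsigma = -sigma * score.

    Note the loop itself never checks that `sigmas` is monotone; that is
    purely a property of the schedule passed in -- the convention whose
    necessity the abstract questions.
    """
    for hi, lo in zip(sigmas[:-1], sigmas[1:]):
        drift = -hi * score(x, hi)      # dx/dsigma at the current level
        x = x + (lo - hi) * drift       # Euler step to the next level
    return x

# Toy score for a N(0, 1) target: marginal at sigma is N(0, 1 + sigma^2).
score = lambda x, sigma: -x / (1.0 + sigma ** 2)
x0 = np.random.default_rng(0).normal(size=4) * 10.0   # "Gaussian prior" draw
out = euler_sample(score, x0, monotone_schedule(10.0, 0.01, 50))
```

For this toy target each step contracts the sample toward the data distribution, so the output norm is strictly smaller than the input norm.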
SkillGen: Verified Inference-Time Agent Skill Synthesis
arXiv:2605.10999v1 Announce Type: cross Abstract: Skills are a promising way to improve LLM agent capabilities without retraining, while keeping the added procedure reusable and controllable. However, high-quality skills are still largely written by hand. We introduce SkillGen, a multi-agent framework that synthesizes a single auditable skill from trajectories generated by a base agent. The output […]
Engineering Robustness into Personal Agents with the AI Workflow Store
arXiv:2605.10907v2 Announce Type: replace-cross Abstract: The dominant paradigm for AI agents is an “on-the-fly” loop in which agents synthesize plans and execute actions within seconds or minutes in response to user prompts. We argue that this paradigm short-circuits disciplined software engineering (SE) processes — iterative design, rigorous testing, adversarial evaluation, staged deployment, and more — […]
Test-Time Personalization: A Diagnostic Framework and Probabilistic Fix for Scaling Failures
arXiv:2605.10991v1 Announce Type: cross Abstract: Existing approaches to LLM personalization focus on constructing better personalized models or inputs, while treating inference as a single-shot process. In this work, we study Test-Time Personalization (TTP) along an unexplored axis: scaling inference-time computation by sampling N candidates from a personalized policy model and selecting the best with a […]
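The sample-N-then-select inference pattern the abstract studies can be sketched generically. The sampler and reward below are toy stand-ins, not the paper's personalized policy or its selection criterion:

```python
import random

def best_of_n(policy_sample, reward, n, seed=0):
    """Best-of-N selection: draw N candidates from a (personalized)
    policy and keep the highest-reward one. `policy_sample` and
    `reward` are illustrative stand-ins for the paper's components.
    """
    rng = random.Random(seed)
    candidates = [policy_sample(rng) for _ in range(n)]
    return max(candidates, key=reward)

# Toy demo: the "policy" emits integers in [0, 10]; the "reward"
# prefers candidates close to a target value of 7.
pick = best_of_n(policy_sample=lambda rng: rng.randint(0, 10),
                 reward=lambda x: -abs(x - 7),
                 n=16)
```

The failure mode such scaling can hit (and that a probabilistic fix would address) is that the selector, not the policy, becomes the bottleneck as N grows.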
PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks
arXiv:2605.10977v1 Announce Type: cross Abstract: Watermarking for large language models (LLMs) is a promising approach for detecting LLM-generated text and enabling responsible deployment. However, existing watermarking methods are often vulnerable to semantic-invariant attacks, such as paraphrasing. We propose PASA, a principled, robust, and distortion-free watermarking algorithm that embeds and detects a watermark at the semantic […]
LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection
arXiv:2605.10980v1 Announce Type: cross Abstract: Diffusion Language Models (dLLMs) have garnered significant attention for their potential in highly parallel processing. The parallel capabilities of existing dLLMs stem from the assumption of conditional independence at high confidence levels, which ensures negligible discrepancy between the marginal and joint distributions. However, the stringent confidence thresholds required to preserve […]
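The confidence-thresholded parallel unmasking that existing dLLMs use, the baseline this abstract builds on, can be sketched as follows. The function, threshold, and toy distributions are illustrative, not the paper's method:

```python
import numpy as np

def parallel_commit(probs, threshold):
    """Confidence-gated parallel unmasking over masked positions.

    `probs` holds each masked position's marginal distribution over a
    toy vocabulary; positions whose top marginal probability clears
    `threshold` are committed in the same step, relying on the
    conditional-independence assumption the abstract describes.
    """
    confidence = probs.max(axis=-1)      # per-position top probability
    tokens = probs.argmax(axis=-1)       # greedy token per position
    commit = confidence >= threshold     # which positions unmask now
    return tokens, commit

probs = np.array([[0.95, 0.05, 0.00],    # confident  -> commit
                  [0.40, 0.35, 0.25],    # uncertain  -> stays masked
                  [0.10, 0.88, 0.02]])   # confident  -> commit
tokens, commit = parallel_commit(probs, threshold=0.8)
# tokens -> [0, 0, 1]; commit -> [True, False, True]
```

The stringent threshold is what limits parallelism here: lowering it commits more positions per step but weakens the marginal-versus-joint approximation.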