arXiv:2603.17570v1 Announce Type: cross Abstract: Tabular foundation models, specifically Prior-Data Fitted Networks (PFNs), have revolutionized outlier detection (OD) by enabling unsupervised zero-shot adaptation to new datasets without training. However, despite their predictive power, these models typically function as opaque black boxes, outputting scalar outlier scores that lack the operational context required for safety-critical decision-making. Existing […]
Can Blindfolded LLMs Still Trade? An Anonymization-First Framework for Portfolio Optimization
arXiv:2603.17692v1 Announce Type: cross Abstract: For LLM trading agents to be genuinely trustworthy, they must demonstrate understanding of market dynamics rather than exploitation of memorized ticker associations. Building responsible multi-agent systems demands rigorous signal validation: proving that predictions reflect legitimate patterns, not pre-trained recall. We address two sources of spurious performance: memorization bias from ticker-specific […]
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
arXiv:2603.03823v3 Announce Type: replace-cross Abstract: Large language model (LLM)-powered agents have demonstrated strong capabilities in automating software engineering tasks such as static bug fixing, as evidenced by benchmarks like SWE-bench. However, in the real world, the development of mature software is typically predicated on complex requirement changes and long-term feature iterations — a process that […]
RangeAD: Fast On-Model Anomaly Detection
arXiv:2603.17795v1 Announce Type: cross Abstract: In practice, machine learning methods commonly require anomaly detection (AD) to filter inputs or detect distributional shifts. Typically, this is implemented by running a separate AD model alongside the primary model. However, this separation ignores the fact that the primary model already encodes substantial information about the target distribution. In […]
FrescoDiffusion: 4K Image-to-Video with Prior-Regularized Tiled Diffusion
arXiv:2603.17555v1 Announce Type: cross Abstract: Diffusion-based image-to-video (I2V) models are increasingly effective, yet they struggle to scale to ultra-high-resolution inputs (e.g., 4K). Generating videos at the model’s native resolution often loses fine-grained structure, whereas high-resolution tiled denoising preserves local detail but breaks global layout consistency. This failure mode is particularly severe in the fresco animation […]
Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control
arXiv:2603.17834v1 Announce Type: cross Abstract: Diffusion models and flow matching have become a cornerstone of robotic imitation learning, yet they suffer from a structural inefficiency where inference is often bound to a fixed integration schedule that is agnostic to state complexity. This paradigm forces the policy to expend the same computational budget on trivial motions […]
Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport
arXiv:2603.01771v3 Announce Type: replace-cross Abstract: Neural networks (NNs) often have critical behavioural trade-offs that are set at design time with hyperparameters, such as reward weights in reinforcement learning or quantile targets in regression. Post-deployment, however, user preferences can evolve, making initial settings undesirable and necessitating potentially expensive retraining. To circumvent this, we introduce the task of Hyperparameter […]
Differential Privacy in Generative AI Agents: Analysis and Optimal Tradeoffs
arXiv:2603.17902v1 Announce Type: cross Abstract: Large language models (LLMs) and AI agents are increasingly integrated into enterprise systems to access internal databases and generate context-aware responses. While such integration improves productivity and decision support, the model outputs may inadvertently reveal sensitive information. Although many prior efforts focus on protecting the privacy of user prompts, relatively […]
CLeAN: Continual Learning Adaptive Normalization in Dynamic Environments
arXiv:2603.17548v1 Announce Type: cross Abstract: Artificial intelligence systems predominantly rely on static data distributions, making them ineffective in dynamic real-world environments, such as cybersecurity, autonomous transportation, or finance, where data shifts frequently. Continual learning offers a potential solution by enabling models to learn from sequential data while retaining prior knowledge. However, a critical and underexplored […]
Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models
arXiv:2603.18002v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) have made impressive progress in connecting vision and language, but they still struggle with spatial understanding and viewpoint-aware reasoning. Recent efforts aim to augment the input representations with geometric cues rather than explicitly teaching models to reason in 3D space. We introduce Loc3R-VLM, a framework […]
interwhen: A Generalizable Framework for Verifiable Reasoning with Test-time Monitors
arXiv:2602.11202v2 Announce Type: replace-cross Abstract: Reasoning models produce long traces of intermediate decisions and tool calls, making test-time verification increasingly important for ensuring correctness. Existing approaches either verify only the final answer, which misses early errors, or rely on branch-and-verify strategies that explore multiple trajectories at substantially higher compute cost. We introduce interwhen, a single-trajectory […]
ScheduleMe: Multi-Agent Calendar Assistant
arXiv:2509.25693v3 Announce Type: replace Abstract: Recent advancements in LLMs have contributed to the rise of advanced conversational assistants that can assist with user needs through natural language conversation. This paper presents ScheduleMe, a multi-agent calendar assistant for users to manage Google Calendar events in natural language. The system uses a graph-structured coordination mechanism where […]