arXiv:2601.18949v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly integrated into software development workflows, yet they often introduce subtle logic or data-misuse errors that differ from human bugs. To study how these two error types interact, we construct Tricky$^2$, a hybrid dataset that augments the existing TrickyBugs corpus of human-written defects with errors […]
AECBench: A Hierarchical Benchmark for Knowledge Evaluation of Large Language Models in the AEC Field
arXiv:2509.18776v2 Announce Type: replace-cross Abstract: Large language models (LLMs), as a novel information technology, are seeing increasing adoption in the Architecture, Engineering, and Construction (AEC) field. They have shown their potential to streamline processes throughout the building lifecycle. However, the robustness and reliability of LLMs in such a specialized and safety-critical domain remain to be […]
AI Cap-and-Trade: Efficiency Incentives for Accessibility and Sustainability
arXiv:2601.19886v1 Announce Type: cross Abstract: The race for artificial intelligence (AI) dominance often prioritizes scale over efficiency. Hyper-scaling is the common industry approach: larger models, more data, and as many computational resources as possible. Using more resources is a simpler path to improved AI performance. Thus, efficiency has been de-emphasized. Consequently, the need for costly […]
Astra: General Interactive World Model with Autoregressive Denoising
arXiv:2512.08931v3 Announce Type: replace-cross Abstract: Recent advances in diffusion transformers have empowered video generation models to generate high-quality video clips from texts or images. However, world models with the ability to predict long-horizon futures from past observations and actions remain underexplored, especially for general-purpose scenarios and various forms of actions. To bridge this gap, we […]
Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning
arXiv:2601.19624v1 Announce Type: cross Abstract: Real-world reinforcement learning often faces environment drift, but most existing methods rely on static entropy coefficients/target entropy, causing over-exploration during stable periods and under-exploration after drift (thus slow recovery), and leaving unanswered the principled question of how exploration intensity should scale with drift magnitude. We prove that entropy scheduling under […]
Dynamic Correction of Erroneous State Estimates via Diffusion Bayesian Exploration
arXiv:2512.03102v3 Announce Type: replace-cross Abstract: In emergency response and other high-stakes societal applications, early-stage state estimates critically shape downstream outcomes. Yet, these initial state estimates, often based on limited or biased information, can be severely misaligned with reality, constraining subsequent actions and potentially causing catastrophic delays, resource misallocation, and human harm. Under the stationary bootstrap baseline (zero […]
Improving Value-based Process Verifier via Low-Cost Variance Reduction
arXiv:2508.10539v2 Announce Type: replace Abstract: Large language models (LLMs) have achieved remarkable success in a wide range of tasks. However, their reasoning capabilities, particularly in complex domains like mathematics, remain a significant challenge. Value-based process verifiers, which estimate the probability of a partial reasoning chain leading to a correct solution, are a promising approach for […]
Learning Neural Operators from Partial Observations via Latent Autoregressive Modeling
arXiv:2601.15547v2 Announce Type: replace-cross Abstract: Real-world scientific applications frequently encounter incomplete observational data due to sensor limitations, geographic constraints, or measurement costs. Although neural operators have significantly advanced PDE solving in terms of computational efficiency and accuracy, their underlying assumption of fully-observed spatial inputs severely restricts applicability in real-world applications. We introduce the first systematic framework […]
M-SGWR: Multiscale Similarity and Geographically Weighted Regression
arXiv:2601.19888v1 Announce Type: cross Abstract: The first law of geography is a cornerstone of spatial analysis, emphasizing that nearby and related locations tend to be more similar. However, defining what constitutes “near” and “related” remains challenging, as different phenomena exhibit distinct spatial patterns. Traditional local regression models, such as Geographically Weighted Regression (GWR) and Multiscale […]
Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning
arXiv:2506.13474v2 Announce Type: replace-cross Abstract: Clinical decision-making is a dynamic, interactive, and cyclic process where doctors have to repeatedly decide on which clinical action to perform and consider newly uncovered information for diagnosis and treatment. Large Language Models (LLMs) have the potential to support clinicians in this process. However, most applications of LLMs in clinical […]
MSCloudCAM: Multi-Scale Context Adaptation with Convolutional Cross-Attention for Multispectral Cloud Segmentation
arXiv:2510.10802v4 Announce Type: replace-cross Abstract: Clouds remain a major obstacle in optical satellite imaging, limiting accurate environmental and climate analysis. To address the strong spectral variability and the large scale differences among cloud types, we propose MSCloudCAM, a novel multi-scale context adapter network with convolution-based cross-attention tailored for multispectral and multi-sensor cloud segmentation. A […]
PYRREGULAR: A Unified Framework for Irregular Time Series, with Classification Benchmarks
arXiv:2505.06047v2 Announce Type: replace-cross Abstract: Irregular temporal data, characterized by varying recording frequencies, differing observation durations, and missing values, presents significant challenges across fields like mobility, healthcare, and environmental science. Existing research communities often overlook or address these challenges in isolation, leading to fragmented tools and methods. To bridge this gap, we introduce a unified […]