arXiv:2606.05130v1 Announce Type: cross Abstract: Individual-level mobility prediction is central to urban simulation, transportation planning, and policy analysis. Supervised sequence models achieve strong accuracy but require task-specific training and offer limited decision-level transparency. Recent LLM-based methods improve interpretability, yet mostly rely on static prompts and single-pass inference, limiting their ability to seek additional evidence when […]
From Prompt to Process: a Process Taxonomy and Comparative Assessment of Frameworks Supporting AI Software Development Agents
arXiv:2606.04967v1 Announce Type: cross Abstract: AI tools for programming are no longer just autocomplete or chat assistants: they organize themselves as development frameworks, with process, roles, artifacts and verification. Recent surveys map agents and LLMs for software engineering, but a study centered on the operational frameworks that turn these capabilities into process is missing. We […]
SciIntegrity-Bench: A Benchmark for Evaluating Academic Integrity in AI Scientist Systems
arXiv:2605.10246v2 Announce Type: replace Abstract: AI scientist systems are increasingly deployed for autonomous research, yet their academic integrity has never been systematically evaluated. We introduce SciIntegrity-Bench, the first benchmark designed around a dilemmatic evaluation paradigm: each of its 33 scenarios across 11 trap categories is constructed so that honest acknowledgment of failure is the only […]
KITE: Kernelized and Information Theoretic Exemplars for In-Context Learning
arXiv:2509.15676v2 Announce Type: replace-cross Abstract: In-context learning (ICL) has emerged as a powerful paradigm for adapting large language models (LLMs) to new and data-scarce tasks using only a few carefully selected task-specific examples presented in the prompt. However, given the limited context size of LLMs, a fundamental question arises: Which examples should be selected to […]
AdaKoop: Efficient Modeling of Nonlinear Dynamics from Nonstationary Data Streams with Koopman Operator Regression
arXiv:2606.04930v1 Announce Type: cross Abstract: Real-time data analysis requires the ability to accurately and adaptively address nonlinear dynamics in a nonstationary data stream while preserving computational efficiency. However, nonlinear dynamics are so complex that capturing dynamically changing nonlinear patterns and utilizing them for downstream tasks under strict time constraints is nontrivial. To bridge the gap […]
Efficient Adversarial Attacks on High-dimensional Offline Bandits
arXiv:2602.01658v2 Announce Type: replace-cross Abstract: Bandit algorithms have recently emerged as a powerful tool for evaluating machine learning models, including generative image models and large language models, by efficiently identifying top-performing candidates without exhaustive comparisons. These methods typically rely on a reward model, often distributed with public weights on platforms such as Hugging Face, to […]
Implement Kubernetes Pod-Level Remote Attestation for Confidential Workloads on dstack
arXiv:2606.03323v2 Announce Type: replace-cross Abstract: The rise of LLM-as-a-Service and other confidential cloud workloads demands cryptographic proof that user data is processed in a trusted, untampered environment. Existing solutions, notably Confidential Containers (CoCo), enforce a strict “one Pod per VM” model that attests only the Guest OS stack, leaving container-level identity unverified and incurring prohibitive […]
Vectorized Online POMDP Planning
arXiv:2510.27191v5 Announce Type: replace-cross Abstract: Planning under partial observability is an essential capability of autonomous robots. The Partially Observable Markov Decision Process (POMDP) provides a powerful framework for planning problems under partial observability, capturing the stochastic effects of actions and the limited information available through noisy observations. POMDP solving could benefit tremendously from massive parallelization […]
Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning
arXiv:2606.04923v1 Announce Type: cross Abstract: Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to reward hacking and ineffective or unsafe training outcomes. In real-world rubric-based RL, such hacking behaviors are often subtle and entangled with […]
Unifying Model-Free Efficiency and Model-Based Representations via Latent Dynamics
arXiv:2602.12643v2 Announce Type: replace-cross Abstract: We present Unified Latent Dynamics (ULD), a novel reinforcement learning algorithm that unifies the efficiency of model-free methods with the representational strengths of model-based approaches, without incurring planning overhead. By embedding state-action pairs into a latent space in which the true value function is approximately linear, our method supports a […]
Reinforcement Learning from Cross-domain Videos with Video Prediction Model
arXiv:2606.03201v2 Announce Type: replace-cross Abstract: Reinforcement learning from expert videos across visually distinct domains is challenging due to the absence of reward signals and the presence of domain gaps. We introduce XIPER (Cross-domain Video Prediction Reward), a reward model for learning from expert videos collected in a visually different domain, where the agent’s appearance differs […]
MAEPose: Self-Supervised Spatiotemporal Learning for Human Pose Estimation on mmWave Video
arXiv:2605.00242v2 Announce Type: replace-cross Abstract: Millimetre-wave (mmWave) radar offers a more privacy-preserving alternative to RGB-based human pose estimation. However, existing methods typically rely on pre-extracted intermediate representations such as sparse point clouds or spectrogram images, where the rich spatiotemporal information naturally present in radar video streams is discarded for model learning, while such signal processing […]