HTMLCure: Turning Browser Experience into State Guided Repair for Interactive HTML

arXiv:2605.26807v1 Announce Type: cross Abstract: LLMs can now produce full HTML pages, but many of those pages are only superficially correct: they render once, then fail under scroll, hover, click, resize, or gameplay. Evaluation from screenshots can miss these failures, and filtering discards many pages that are still repairable. We introduce HTMLCure, a browser experience […]

Alignment Makes Language Models Normative, Not Descriptive

arXiv:2603.17218v2 Announce Type: replace-cross Abstract: Post-training alignment optimizes language models to match human preference signals, but this objective is not equivalent to modeling observed human behavior. We compare 120 base-aligned model pairs on more than 10,000 real human decisions in multi-round strategic games – bargaining, persuasion, negotiation, and repeated matrix games. In these settings, base […]

Unified Panoramic Geometry Estimation via Multi-View Foundation Models

arXiv:2605.26368v1 Announce Type: cross Abstract: Geometry estimation from perspective images has greatly advanced, maturing to the point where off-the-shelf foundation models are able to reconstruct 3D scene structure not only from multi-view imagery, but even from a single view. A natural extension is 3D reconstruction from panoramas, with the exciting prospect of recovering a full […]

TAMP-OS: An Open-Source Workflow for Tactile 3D-Printable Lithographs

arXiv:2603.16801v2 Announce Type: replace-cross Abstract: Describe an animal without using the verb look. Can you effectively provide an alternative method for interpreting complex microscopy images while preserving the length scale? The world is filled with features too small for our eyes to see: the setae on a gecko’s feet, the cuticles covering a rat’s whisker, […]

VT-Bench: A Unified Benchmark for Visual-Tabular Multi-Modal Learning

arXiv:2605.08146v3 Announce Type: replace-cross Abstract: Multi-modal learning has attracted great attention in visual-text tasks. However, visual-tabular data, which plays a pivotal role in high-stakes domains like healthcare and industry, remains underexplored. In this paper, we introduce VT-Bench, the first unified benchmark for standardizing vision-tabular discriminative prediction and generative reasoning tasks. VT-Bench aggregates 14 datasets across […]

Workflow Closure Is Not Scientific Closure in Auto-Research Systems

arXiv:2605.26200v1 Announce Type: cross Abstract: This paper argues that workflow closure is not scientific closure in auto-research systems. Current systems can increasingly complete research-like loops internally, moving from idea generation to experiment execution, writing, and self-evaluation. That achievement is real, but it does not by itself give the resulting outputs scientific standing. We argue that […]

Less is More: Early Stopping Rollout for On-Policy Distillation

arXiv:2605.27028v1 Announce Type: cross Abstract: On-policy distillation has recently emerged as a promising alternative to standard sequence-level imitation, training a student by scoring its own rollouts with a teacher model. However, we observe an “Off-policy Teacher Decay” problem in this paradigm: for the later tokens, with the student’s earlier trajectory as context that is off-policy to the […]

Black-box Membership Inference Attacks on the Pre-training Data of Image-generation Models

arXiv:2605.27020v1 Announce Type: cross Abstract: The rapid advancement of diffusion-based image generation models has raised serious concerns regarding potential copyright and privacy infringements involving human-created data. Membership inference attacks (MIAs) have emerged as a promising tool for identifying unauthorized data usage during model training. Existing methods typically assess the ability of a model to denoise perturbed […]

Detached Skip-Links and $R$-Probe: Decoupling Feature Aggregation from Gradient Propagation for MLLM OCR

arXiv:2603.20020v2 Announce Type: replace-cross Abstract: Multimodal large language models (MLLMs) excel at high-level reasoning yet fail on OCR tasks where fine-grained visual details are compromised or misaligned. We identify an overlooked optimization issue in multi-layer feature fusion. Skip pathways introduce direct back-propagation paths from high-level semantic objectives to early visual layers. This mechanism overwrites low-level […]

Demystifying Video Reasoning

arXiv:2603.16870v2 Announce Type: replace-cross Abstract: Recent advances in video generation have revealed an unexpected phenomenon: diffusion-based video models exhibit non-trivial reasoning capabilities. Prior work attributes this to a Chain-of-Frames (CoF) mechanism, where reasoning is assumed to unfold sequentially across video frames. In this work, we challenge this assumption and uncover a fundamentally different mechanism. We […]

SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills

arXiv:2604.06550v2 Announce Type: replace-cross Abstract: OpenClaw’s ClawHub marketplace hosts tens of thousands of community-contributed agent skills (49,592 in our 2026-04-04 snapshot), and recent audits report that 13-26% contain security vulnerabilities. Regex scanners miss obfuscated payloads; formal static analyzers cannot read the natural-language SKILL.md instructions that hide prompt injection and social engineering. Neither approach covers both […]

HiSpec: Hierarchical Speculative Decoding for LLMs

arXiv:2510.01336v2 Announce Type: replace-cross Abstract: Speculative decoding accelerates LLM inference by using a smaller draft model to speculate tokens that a larger target model verifies. Verification is often the bottleneck (e.g. verification is $4\times$ slower than token generation when a 3B model speculates for a 70B target model), but most prior works focus only on […]

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844