arXiv:2606.07649v1 Announce Type: cross Abstract: Long-form video generation requires systematic narrative planning and visual consistency that current short-clip methods cannot provide. Existing methods generate isolated sequences without narrative structure and lack mechanisms for maintaining character and environmental consistency across scenes. We present ViMax, an agentic video generation framework that addresses video creation through coordinated multi-agent […]
When Video Misreads: Closed-Loop Distillation of Reading Heuristics for Exploratory Manipulation Trace QA
arXiv:2606.08542v1 Announce Type: cross Abstract: Exploratory manipulation often turns an apparent failed attempt into the key evidence for what to do next. For example, a robot pulls a locked cabinet drawer, fails, and only succeeds after opening the lock. The failed pull reveals a latent precondition (the drawer is locked) that determines the minimal-success action […]
DiffoR: A Unified Continuous Generative Framework for Universal Ordinal Regression
arXiv:2606.07599v1 Announce Type: cross Abstract: Ordinal Regression (OR) aims to predict target values with inherent order, underpinning critical applications across diverse domains, from recommender systems to computer vision. Though having evolved from naive regression to discretization-based classification and generation, existing paradigms remain fundamentally constrained by quantization artifacts and the lack of global ordinal topological perception. […]
EgoAERO: Learning Dexterous Manipulation from a Single Egocentric Video without Object Assets
arXiv:2606.08057v1 Announce Type: cross Abstract: Egocentric RGB-D videos offer a natural source of human dexterous manipulation demonstrations, but existing data is difficult to use for robot learning because object pose, geometry, and contact information are often missing or require pre-scanned object assets. We present EgoAERO, the first framework that learns dexterous manipulation from a single […]
Generative Frontier Planning for Adaptive Peer-Referral Recruitment under Covariate-Dependent Arrivals
arXiv:2606.08360v1 Announce Type: cross Abstract: Peer-referral recruitment systems such as respondent-driven sampling are critical for studying and intervening on hidden populations affected by infectious diseases. To accelerate recruitment, public health agencies must adaptively allocate limited referral resources across multiple rounds, where current decisions shape both the number and the covariates of future recruits. Prior work […]
Beyond English benchmarks: clinical LLM evaluation in Brazilian Portuguese
arXiv:2606.07853v1 Announce Type: cross Abstract: Large Language Models are transforming clinical decision support and its application in real scenarios. Yet most benchmarks are conducted in English, and cross-lingual evaluation is needed to address the language gaps in global access. We introduce ClinicalBr, the first bilingual benchmark for clinical decision built from real Brazilian […]
TRACER: Token ReAssignment for Concept ERasure in Generative Recommendation
arXiv:2606.07688v1 Announce Type: cross Abstract: Generative recommendation formulates next-item prediction as autoregressive generation over semantic ID (SID) sequences derived from users’ historical interactions, making modern recommender systems structurally similar to large language models (LLMs). As privacy and safety concerns grow, these systems increasingly require concept unlearning to remove sensitive or harmful concepts associated with items. […]
MatMind: A Structure-Activity Knowledge-Driven Generative Foundation Model for Materials Science
arXiv:2606.07712v1 Announce Type: cross Abstract: Progress in AI-driven crystal materials science has so far been carried by narrow architectures purpose-built for individual tasks — graph neural networks for property prediction, diffusion and flow-matching models for crystal generation — each excelling within its niche yet unable to act as a shared backbone across the full spectrum […]
PRISM: PRior-guided Imagination Sampling in world Models
arXiv:2606.07974v1 Announce Type: cross Abstract: A learned world model provides a powerful physical intuition for evaluating future states. But its effectiveness in continuous control also depends critically on how candidate actions are generated for model-based planning. Rather than solely asking how accurately a model can simulate the future, we ask: which candidate actions are worth […]
A case study of evaluating AI agents on a neuroscience data-to-discovery pipeline
arXiv:2606.07718v1 Announce Type: new Abstract: Agentic AI tools offer a promising path to automating software development bottlenecks in scientific research pipelines, particularly for stages that take domain experts days to months to build, where scientists care about correctness and robustness, not implementation details. We present an empirical study of general-purpose coding agents on a fly […]
CLASP: Language-Driven Robot Skill Selection and Composition using Task-Parameterized Learning
arXiv:2606.08169v1 Announce Type: cross Abstract: Enabling robots to understand and execute tasks from natural language commands while maintaining data efficiency remains challenging. Foundation models such as vision-language-action (VLA) and vision-language models (VLMs) provide intuitive interaction channels but require extensive data; task-parameterized imitation learning achieves data efficiency but lacks natural language grounding. This work bridges this […]
Syll: Open-Source Personal Automation with Cross-Surface Execution
arXiv:2606.07594v1 Announce Type: new Abstract: Personal AI agents must increasingly operate across APIs, shells, web surfaces, and desktop GUIs, yet many systems remain tuned to a single interface and offer limited support for user teaching and auditability. We present Syll, an open-source, self-hosted multimodal agent harness that unifies MCP/API tools, CLI execution, and visual GUI […]