arXiv:2605.13527v3 Announce Type: replace Abstract: Reusable skills have become a core substrate for improving agent capabilities, yet most existing skill packages encode reusable behavior primarily as textual prompts, executable code, or learned routines. For visual agents, however, procedural knowledge is inherently multimodal: reuse depends not only on what operation to perform, but also on recognizing […]
BatteryMFormer: Multi-level Learning for Battery Degradation Trajectory Forecasting
arXiv:2605.27044v2 Announce Type: replace Abstract: Early battery degradation trajectory forecasting (BDTF), which predicts the full-life state-of-health trajectory from early operational data, is critical for battery optimization, manufacturing, and deployment. Battery degradation data exhibit two key characteristics. First, degradation data present a multi-level structure, including regularities shared within aging conditions and trajectory patterns shared across batteries. […]
Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms
arXiv:2503.07154v3 Announce Type: replace-cross Abstract: Generative pre-training is often framed through a false dichotomy between autoregressive models for discrete signals and diffusion models for continuous signals. We argue that the dichotomy is false because it conflates model family, data representation, training objective, and inference procedure. Autoregression is an inference procedure that expands a sequence through […]
Truth, Trust, and Trouble: Medical AI on the Edge
arXiv:2507.02983v3 Announce Type: replace-cross Abstract: Large Language Models (LLMs) hold significant promise for transforming digital health by enabling automated medical question answering. However, ensuring these models meet critical industry standards for factual accuracy, usefulness, and safety remains a challenge, especially for open-source solutions. We present a rigorous benchmarking framework using a dataset of over 1,000 […]
v-HUB: A Benchmark for Video Humor Understanding from Vision and Sound
arXiv:2509.25773v3 Announce Type: replace-cross Abstract: AI models capable of comprehending humor hold real-world promise, for example by enhancing engagement in human-machine interactions. To gauge and diagnose the capacity of multimodal large language models (MLLMs) for humor understanding, we introduce v-HUB, a novel video humor understanding benchmark. v-HUB comprises a curated collection of non-verbal short videos, […]
CaptionFormer: Unified Segmentation, Tracking, and Captioning for Spatio-Temporal Objects
arXiv:2510.14904v4 Announce Type: replace-cross Abstract: Dense Video Object Captioning (DVOC) is the task of jointly detecting, tracking, and captioning object trajectories in a video, requiring the ability to understand spatio-temporal details and describe them in natural language. Due to the complexity of the task and the high cost associated with manual annotation, previous approaches resort […]
VocSim: A Training-free Benchmark for Zero-shot Content Identity in Single-source Audio
arXiv:2512.10120v2 Announce Type: replace-cross Abstract: General-purpose audio representations aim to map acoustically variable instances of the same event to nearby points, resolving content identity in a zero-shot setting. Unlike supervised classification benchmarks that measure adaptability via parameter updates, we introduce VocSim, a training-free benchmark probing the intrinsic geometric alignment of frozen embeddings, with no parameters […]
DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion
arXiv:2601.09239v5 Announce Type: replace-cross Abstract: Speech tokenizers are a key building block of fully discrete Speech LLMs. Existing tokenizers either prioritize semantic encoding, fuse semantic content with acoustic style inseparably, or achieve incomplete semantic-acoustic disentanglement. To achieve better disentanglement, we propose DSA-Tokenizer, which explicitly disentangles speech into discrete semantic and acoustic tokens via distinct optimization constraints. Specifically, semantic tokens are […]
Flow-Based Generative Modeling for Optimizing Sampling Policies in Compressed Sensing Applications
arXiv:2606.00078v1 Announce Type: cross Abstract: Numerous modern applications in signal processing and medical imaging necessitate acquiring high-dimensional signals under tight resource constraints. Traditional sampling theory suggests that accurate signal reconstruction requires a number of measurements proportional to the signal’s ambient dimension, a requirement often too expensive or impractical. Compressed sensing challenges this notion by demonstrating […]
RuleEdit: Failure-Guided Human-AI Model Editing with Prospective Impact Preview
arXiv:2606.00011v1 Announce Type: cross Abstract: Despite the promise of AI to assist complex decisions, practitioners still lack ways to detect likely failures and inspect the consequences of model edits before committing them. We present RuleEdit, an interactive, rule-guided human-AI model editing system that (i) surfaces likely failures through interpretable mismatch signals from rule tables and […]
Update Opacity: Epistemic Accessibility and Governance Under AI System Change
arXiv:2606.00037v1 Announce Type: cross Abstract: Machine learning models embedded in deployed AI systems are routinely updated to maintain correct functioning over time. Yet such updates can generate update opacity: users may not be able to understand why the same input now yields a different output. We argue that update opacity is best understood as a […]
CEON: Circular Economy Ontology Network
arXiv:2606.02253v1 Announce Type: new Abstract: Increasing the circularity of resource use in our society has been recognized as a path to sustainability, i.e., transitioning into a more circular economy. There are many different circular strategies to do so, such as reusing products and components, refurbishing and remanufacturing used products, or recycling left-over or used materials. […]