arXiv:2606.08169v1 Announce Type: cross Abstract: Enabling robots to understand and execute tasks from natural language commands while maintaining data efficiency remains challenging. Foundation models such as vision-language-action (VLA) and vision-language models (VLMs) provide intuitive interaction channels but require extensive data; task-parameterized imitation learning achieves data efficiency but lacks natural language grounding. This work bridges this […]
Syll: Open-Source Personal Automation with Cross-Surface Execution
arXiv:2606.07594v1 Announce Type: new Abstract: Personal AI agents must increasingly operate across APIs, shells, web surfaces, and desktop GUIs, yet many systems remain tuned to a single interface and offer limited support for user teaching and auditability. We present Syll, an open-source, self-hosted multimodal agent harness that unifies MCP/API tools, CLI execution, and visual GUI […]
Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics
arXiv:2606.08417v1 Announce Type: cross Abstract: Diffusion and continuous flow-based language models have emerged as the leading non-autoregressive alternatives to language modeling. Progress in both paradigms is overwhelmingly tracked by generative perplexity (gen-PPL): the per-token negative log-likelihood of samples under a frozen autoregressive (AR) scorer such as gpt2-large, typically paired with an empirical-entropy guardrail to rule […]
OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs
arXiv:2606.07577v1 Announce Type: new Abstract: Audio-visual large language models (LLMs) hold strong promise for long-form video understanding, yet their long-video inference is fundamentally limited by the linear growth of video tokens and key-value (KV) caches. We present OmniMem, a memory-efficient streaming framework designed specifically for audio-visual LLMs. Unlike existing compression methods that treat all tokens […]
SurfDesign: Effective Protein Design on Molecular Surfaces
arXiv:2606.07567v1 Announce Type: new Abstract: Protein function is largely determined by molecular surface geometry and physicochemical complementarity, yet most protein design methods condition only on backbone structure. We introduce SurfDesign, a surface-conditioned protein design framework that models molecular surfaces as continuous geometric manifolds and integrates them with pretrained protein language models. SurfDesign employs surface-based equivariant […]
Subtitle-Aligned Fine-Tuning of Whisper for Swiss German ASR: Benchmark Contamination, Convention Mismatch, and an Honest Baseline at 25.6% WER (13.8% cWER)
arXiv:2606.07608v1 Announce Type: cross Abstract: We present a systematic study of fine-tuning OpenAI’s Whisper large-v3 for Swiss German ASR, using 1,367 hours of broadcast speech paired with Standard German subtitles as weak supervision. Through 16 iterative training runs on an NVIDIA DGX Spark (Grace Blackwell, 128 GB unified memory, up to 1 PFLOP FP4), we […]
Visual Prompting Meets Feature Reconstruction-Based Anomaly Detection with Dual-Teacher Supervision
arXiv:2606.09670v1 Announce Type: cross Abstract: Recent Anomaly Detection methods achieve perfect detection and segmentation scores on well-established datasets, such as MVTec. However, many of these methods face challenges when foundational assumptions – such as consistent object scale, viewpoint, background, illumination, and centered placement – are violated. Those variations that occur render anomaly detection methods unusable […]
NeuroAlign: Hierarchical Multimodal Fusion of Dynamic and Structural Neuroimaging for MCI Analysis
arXiv:2606.07635v1 Announce Type: cross Abstract: Multimodal neuroimaging fusion of functional MRI (fMRI) and diffusion tensor imaging (DTI) provides complementary information for cognitive impairment analysis, but remains challenged by heterogeneous feature spaces and misaligned representations. We propose NeuroAlign, a hierarchical framework for structured multimodal fusion. It introduces (1) Dual-Modal Hierarchical Alignment (DMHA), which models multi-scale dynamic […]
Some hypotheses on how chatbots work in problem-solving-driven conversations. Large Language Models as confirmation of the Innovation Illusion
arXiv:2606.07722v1 Announce Type: new Abstract: This article offers a perspective on the nature of chatbots as genuine conversation partners when discussing problems in relation to their solutions. What can chatbots do and what can’t they do, and how can this be explained? Our argument draws on Aggregation Dynamics, Cognitive Linguistics, Neuropsychology and Psychology. Our argument […]
Simultaneous hyperkinetic movement disorders phenotyping: a cross-cohort pediatric transfer study using routine videos, markerless pose estimation and a tabular foundation model
arXiv:2606.07674v1 Announce Type: cross Abstract: Objective: To develop and externally test a video-based framework for simultaneous detection of hyperkinetic movement disorder (MD) phenomenologies (dystonia, tremor, myoclonus, chorea, athetosis, ballismus, stereotypies, and tics) using routine clinical recordings, with explicit testing of external, cross-cohort transfer from adult to pediatric populations. Methods: In this proof-of-concept study, the framework combines markerless […]
CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning
arXiv:2509.25004v2 Announce Type: replace Abstract: Online reinforcement learning with verifiable rewards (RLVR) has become an effective paradigm for improving the reasoning abilities of large language models, but most methods still optimize reasoning trajectories over the static problem set, wasting rollout budget on solved or overly difficult problems. We propose CLPO (Curriculum Learning meets Policy Optimization), […]
EvoCSFL: Surrogate-Assisted Evolutionary Client Selection for Efficient and Robust Federated Learning
arXiv:2606.07702v1 Announce Type: cross Abstract: The heterogeneity of client data and systems makes it difficult to achieve satisfactory convergence speed and robustness in federated learning with random client selection. To address this issue, this paper proposes a surrogate-assisted client evolutionary selection framework for federated learning. In this framework, some typical client selection strategies are first […]