arXiv:2603.06961v1 Announce Type: cross Abstract: Quadruped locomotion provides a natural setting for understanding when model-free learning can outperform model-based control design, by exploiting data patterns to bypass the difficulty of optimizing over discrete contacts and the combinatorial explosion of mode changes. We give a principled analysis of why imitation learning with quadrupeds can be inherently […]
SuperSkillsStack: Agency, Domain Knowledge, Imagination, and Taste in Human-AI Design Education
arXiv:2603.07016v1 Announce Type: cross Abstract: This study examines how students integrate generative artificial intelligence (AI) into design projects through the lens of the SuperSkillsStack framework, which identifies four key human competencies for effective human-AI collaboration: Agency, Domain Knowledge, Imagination, and Taste. As generative AI increasingly transforms creative practice, design education must consider how human capabilities […]
mAVE: A Watermark for Joint Audio-Visual Generation Models
arXiv:2603.07090v1 Announce Type: cross Abstract: As Joint Audio-Visual Generation Models see widespread commercial deployment, embedding watermarks has become essential for protecting vendor copyright and ensuring content provenance. However, existing techniques suffer from an architectural mismatch by treating modalities as decoupled entities, exposing a critical Binding Vulnerability. Adversaries exploit this via Swap Attacks by replacing authentic […]
Spectral Discovery of Continuous Symmetries via Generalized Fourier Transforms
arXiv:2603.07299v1 Announce Type: cross Abstract: Continuous symmetries are fundamental to many scientific and learning problems, yet they are often unknown a priori. Existing symmetry discovery approaches typically search directly in the space of transformation generators or rely on learned augmentation schemes. We propose a fundamentally different perspective based on spectral structure. We introduce a framework […]
Position: LLMs Must Use Functor-Based and RAG-Driven Bias Mitigation for Fairness
arXiv:2603.07368v1 Announce Type: cross Abstract: Biases in large language models (LLMs) often manifest as systematic distortions in associations between demographic attributes and professional or social roles, reinforcing harmful stereotypes across gender, ethnicity, and geography. This position paper advocates for addressing demographic and gender biases in LLMs through a dual-pronged methodology, integrating category-theoretic transformations and retrieval-augmented […]
Machine Learning for the Internet of Underwater Things: From Fundamentals to Implementation
arXiv:2603.07413v1 Announce Type: cross Abstract: The Internet of Underwater Things (IoUT) is becoming critical infrastructure for ocean observation, marine resource management, and climate science. Its development is hindered by severe acoustic attenuation, propagation delays far exceeding those of terrestrial wireless systems, strict energy constraints, and dynamic topologies shaped by ocean currents. Machine learning (ML) […]
The Dual-Stream Transformer: Channelized Architecture for Interpretable Language Modeling
arXiv:2603.07461v1 Announce Type: cross Abstract: Standard transformers entangle all computation in a single residual stream, obscuring which components perform which functions. We introduce the Dual-Stream Transformer, which decomposes the residual stream into two functionally distinct components: a token stream updated by attention and a context stream updated by feed-forward networks. Information flow between attention heads […]
Targeted Speaker Poisoning Framework in Zero-Shot Text-to-Speech
arXiv:2603.07551v1 Announce Type: cross Abstract: Zero-shot Text-to-Speech (TTS) voice cloning poses severe privacy risks, demanding the removal of specific speaker identities from trained TTS models. Conventional machine unlearning is insufficient in this context, as zero-shot TTS can dynamically reconstruct voices from just reference prompts. We formalize this task as Speech Generation Speaker Poisoning (SGSP), in […]
ACES: Accent Subspaces for Coupling, Explanations, and Stress-Testing in Automatic Speech Recognition
arXiv:2603.03359v2 Announce Type: replace-cross Abstract: ASR systems exhibit persistent performance disparities across accents, but whether these gaps reflect superficial biases or deep structural vulnerabilities remains unclear. We introduce ACES, a three-stage audit that extracts accent-discriminative subspaces from ASR representations, constrains adversarial attacks to them, and tests whether removing them improves fairness. On Wav2Vec2-base with seven […]
DECADE: A Temporally-Consistent Unsupervised Diffusion Model for Enhanced Rb-82 Dynamic Cardiac PET Image Denoising
arXiv:2603.07759v1 Announce Type: cross Abstract: Rb-82 dynamic cardiac PET imaging is widely used for the clinical diagnosis of coronary artery disease (CAD), but the short half-life of Rb-82 results in high noise levels that degrade dynamic frame quality and parametric imaging. The lack of paired clean-noisy training data, rapid tracer kinetics, and frame-dependent noise variations further limit […]
Human-AI Divergence in Ego-centric Action Recognition under Spatial and Spatiotemporal Manipulations
arXiv:2603.08317v1 Announce Type: cross Abstract: Humans consistently outperform state-of-the-art AI models in action recognition, particularly in challenging real-world conditions involving low resolution, occlusion, and visual clutter. Understanding the sources of this performance gap is essential for developing more robust and human-aligned models. In this paper, we present a large-scale human-AI comparative study of egocentric action […]
AutoFigure-Edit: Generating Editable Scientific Illustration
arXiv:2603.06674v1 Announce Type: cross Abstract: High-quality scientific illustrations are essential for communicating complex scientific and technical concepts, yet existing automated systems remain limited in editability, stylistic controllability, and efficiency. We present AutoFigure-Edit, an end-to-end system that generates fully editable scientific illustrations from long-form scientific text while enabling flexible style adaptation through user-provided reference images. By […]