arXiv:2507.10610v3 Announce Type: replace-cross Abstract: Graphical user interface (GUI) agents built on multimodal large language models (MLLMs) have recently demonstrated strong decision-making abilities in screen-based interaction tasks. However, they remain highly vulnerable to pop-up-based environmental injection attacks, where malicious visual elements divert model attention and lead to unsafe or incorrect actions. Existing defense methods either […]
StateX: Enhancing RNN Recall via Post-training State Expansion
arXiv:2509.22630v2 Announce Type: replace-cross Abstract: Recurrent neural networks (RNNs), such as linear attention and state-space models, have gained popularity due to their constant per-token complexity when processing long contexts. However, these recurrent models struggle with tasks that require accurate recall of contextual information from long contexts, because all contextual information is compressed into a fixed-size […]
One-shot Adaptation of Humanoid Whole-body Motion with Walking Priors
arXiv:2510.25241v2 Announce Type: replace-cross Abstract: Whole-body humanoid motion represents a fundamental challenge in robotics, requiring balance, coordination, and adaptability to enable human-like behaviors. However, existing methods typically require multiple training samples per motion, rendering the collection of high-quality human motion datasets both labor-intensive and costly. To address this, we propose a data-efficient adaptation approach that […]
Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders
arXiv:2601.02978v2 Announce Type: replace-cross Abstract: Recent work in Mechanistic Interpretability (MI) has enabled the identification and intervention of internal features in Large Language Models (LLMs). However, a persistent challenge lies in linking such internal features to the reliable control of complex, behavior-level semantic attributes in language generation. In this paper, we propose a Sparse Autoencoder-based […]
Weight space Detection of Backdoors in LoRA Adapters
arXiv:2602.15195v3 Announce Type: replace-cross Abstract: LoRA adapters let users fine-tune large language models (LLMs) efficiently. However, LoRA adapters are shared through open repositories like Hugging Face Hub, making them vulnerable to backdoor attacks. Current detection methods require running the model with test input data, making them impractical for screening thousands of adapters where […]
First-Mover Bias in Gradient Boosting Explanations: Mechanism, Detection, and Resolution
arXiv:2603.22346v2 Announce Type: replace-cross Abstract: We identify first-mover bias — path-dependent concentration of SHAP feature importance from sequential residual fitting in gradient boosting — as a mechanistic contributor to attribution instability under multicollinearity. Scaling up a single model amplifies this effect: a Large Single Model matching our method’s total tree count produces the poorest attribution […]
Safety, Security, and Cognitive Risks in World Models
arXiv:2604.01346v2 Announce Type: replace-cross Abstract: World models – learned internal simulators of environment dynamics – are rapidly becoming foundational to autonomous decision-making in robotics, autonomous vehicles, and agentic AI. By predicting future states in compressed latent spaces, they enable sample-efficient planning and long-horizon imagination without direct environment interaction. Yet this predictive power introduces a distinctive […]
Foundations for Agentic AI Investigations from the Forensic Analysis of OpenClaw
arXiv:2604.05589v1 Announce Type: cross Abstract: Agentic AI systems are increasingly deployed as personal assistants and are likely to become a common object of digital investigations. However, little is known about how their internal state and actions can be reconstructed during forensic analysis. Despite growing popularity, systematic forensic approaches for such systems remain largely unexplored. This […]
Semantic-Topological Graph Reasoning for Language-Guided Pulmonary Screening
arXiv:2604.05620v1 Announce Type: cross Abstract: Medical image segmentation driven by free-text clinical instructions is a critical frontier in computer-aided diagnosis. However, existing multimodal and foundation models struggle with the semantic ambiguity of clinical reports and fail to disambiguate complex anatomical overlaps in low-contrast scans. Furthermore, fully fine-tuning these massive architectures on limited medical datasets invariably […]
SnapFlow: One-Step Action Generation for Flow-Matching VLAs via Progressive Self-Distillation
arXiv:2604.05656v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models based on flow matching — such as pi0, pi0.5, and SmolVLA — achieve state-of-the-art generalist robotic manipulation, yet their iterative denoising, typically 10 ODE steps, introduces substantial latency: on a modern GPU, denoising alone accounts for 80% of end-to-end inference time. Naively reducing the step count is […]
CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models
arXiv:2604.05755v1 Announce Type: cross Abstract: In today’s software architecture, large language models (LLMs) serve as software architecture co-pilots. However, no benchmark currently exists to evaluate large language models’ actual understanding of cloud-native software architecture. For this reason we present a benchmark called CAKE, which consists of 188 expert-validated questions covering four cognitive levels of Bloom’s […]
EEG-MFTNet: An Enhanced EEGNet Architecture with Multi-Scale Temporal Convolutions and Transformer Fusion for Cross-Session Motor Imagery Decoding
arXiv:2604.05843v1 Announce Type: cross Abstract: Brain-computer interfaces (BCIs) enable direct communication between the brain and external devices, providing critical support for individuals with motor impairments. However, accurate motor imagery (MI) decoding from electroencephalography (EEG) remains challenging due to noise and cross-session variability. This study introduces EEG-MFTNet, a novel deep learning model based on the EEGNet […]