arXiv:2507.16214v3 Announce Type: replace-cross Abstract: Accurate and robust relative pose estimation is crucial for enabling challenging Active Debris Removal (ADR) missions targeting tumbling derelict satellites such as ESA’s ENVISAT. This work presents a complete pipeline integrating advanced computer vision techniques with adaptive nonlinear filtering to address this challenge. A Convolutional Neural Network (CNN), enhanced with […]
Hyperagents
arXiv:2603.19461v1 Announce Type: new Abstract: Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and problem-solving processes. Existing approaches to self-improvement rely on fixed, handcrafted meta-level mechanisms, fundamentally limiting how fast such systems can improve. The Darwin Gödel Machine (DGM) demonstrates open-ended self-improvement in coding by repeatedly […]
CARES: Context-Aware Resolution Selector for VLMs
arXiv:2510.19496v2 Announce Type: replace-cross Abstract: Large vision-language models (VLMs) commonly process images at native or high resolution to remain effective across tasks. This inflates visual tokens, often to 97-99% of total tokens, resulting in high compute and latency, even when low-resolution images would suffice. We introduce CARES (Context-Aware Resolution Selector), a lightweight preprocessing module that, […]
Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids
arXiv:2603.19473v1 Announce Type: new Abstract: Adeno-associated viral (AAV) vectors are widely used delivery platforms in gene therapy, and the design of improved capsids is key to expanding their therapeutic potential. A central challenge in AAV bioengineering, as in protein design more broadly, is the vast sequence design space relative to the scale of feasible experimental […]
Understanding and Optimizing Multi-Stage AI Inference Pipelines
arXiv:2504.09775v5 Announce Type: replace-cross Abstract: The rapid evolution of Large Language Models (LLMs) has driven the need for increasingly sophisticated inference pipelines and hardware platforms. Modern LLM serving extends beyond traditional prefill-decode workflows, incorporating multi-stage processes such as Retrieval Augmented Generation (RAG), key-value (KV) cache retrieval, dynamic model routing, and multi-step reasoning. These stages […]
When both Grounding and not Grounding are Bad — A Partially Grounded Encoding of Planning into SAT (Extended Version)
arXiv:2603.19429v1 Announce Type: new Abstract: Classical planning problems are typically defined using lifted first-order representations, which offer compactness and generality. While most planners ground these representations to simplify reasoning, this can cause an exponential blowup in size. Recent approaches instead operate directly on the lifted level to avoid full grounding. We explore a middle ground […]
World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation
arXiv:2509.19080v2 Announce Type: replace-cross Abstract: Robotic manipulation policies are commonly initialized through imitation learning, but their performance is limited by the scarcity and narrow coverage of expert data. Reinforcement learning can refine policies to alleviate this limitation, yet real-robot training is costly and unsafe, while training in simulators suffers from the sim-to-real gap. Recent advances […]
The Residual Stream Is All You Need: On the Redundancy of the KV Cache in Transformer Inference
arXiv:2603.19664v1 Announce Type: cross Abstract: The key-value (KV) cache is widely treated as essential state in transformer inference, and a large body of work engineers policies to compress, evict, or approximate its entries. We prove that this state is entirely redundant: keys and values at every layer are deterministic projections of the residual stream, and […]
PFM-VEPAR: Prompting Foundation Models for RGB-Event Camera based Pedestrian Attribute Recognition
arXiv:2603.19565v1 Announce Type: cross Abstract: Event-based pedestrian attribute recognition (PAR) leverages motion cues to enhance RGB cameras in low-light and motion-blur scenarios, enabling more accurate inference of attributes like age and emotion. However, existing two-stream multimodal fusion methods introduce significant computational overhead and neglect the valuable guidance from contextual samples. To address these limitations, this […]
FB-CLIP: Fine-Grained Zero-Shot Anomaly Detection with Foreground-Background Disentanglement
arXiv:2603.19608v1 Announce Type: cross Abstract: Fine-grained anomaly detection is crucial in industrial and medical applications, but labeled anomalies are often scarce, making zero-shot detection challenging. While vision-language models like CLIP offer promising solutions, they struggle with foreground-background feature entanglement and coarse textual semantics. We propose FB-CLIP, a framework that enhances anomaly localization via multi-strategy textual […]
Global Convergence of Multiplicative Updates for the Matrix Mechanism: A Collaborative Proof with Gemini 3
arXiv:2603.19465v1 Announce Type: cross Abstract: We analyze a fixed-point iteration $v \leftarrow \phi(v)$ arising in the optimization of a regularized nuclear norm objective involving the Hadamard product structure, posed in~\cite{denisov} in the context of an optimization problem over the space of algorithms in private machine learning. We prove that the iteration $v^{(k+1)} = \text{diag}((D_{v^{(k)}}^{1/2} M […]
Inducing Sustained Creativity and Diversity in Large Language Models
arXiv:2603.19519v1 Announce Type: cross Abstract: We address a not-widely-recognized subset of exploratory search, where a user sets out on a typically long “search quest” for the perfect wedding dress, overlooked research topic, killer company idea, etc. The first few outputs of current large language models (LLMs) may be helpful but only as a start, since […]