arXiv:2502.14883v3 Announce Type: replace-cross Abstract: For individuals with blindness or low vision (BLV), navigating complex environments can pose serious risks. Large Vision-Language Models (LVLMs) show promise for generating scene descriptions, but their effectiveness for BLV users remains underexplored. To address this gap, we conducted a user study with eight BLV participants to systematically evaluate preferences […]
RoboClaw: An Agentic Framework for Scalable Long-Horizon Robotic Tasks
arXiv:2603.11558v3 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) systems have shown strong potential for language-driven robotic manipulation. However, scaling them to long-horizon tasks remains challenging. Existing pipelines typically separate data collection, policy learning, and deployment, resulting in heavy reliance on manual environment resets and brittle multi-policy execution. We present RoboClaw, an agentic robotics framework that unifies […]
Are Large Vision-Language Models Ready to Guide Blind and Low-Vision Individuals?
arXiv:2510.00766v2 Announce Type: replace-cross Abstract: Large Vision-Language Models (LVLMs) demonstrate a promising direction for assisting individuals with blindness or low vision (BLV). Yet, measuring their true utility in real-world scenarios is challenging because evaluating whether their descriptions are BLV-informative requires a fundamentally different approach from assessing standard scene descriptions. While the “VLM-as-a-metric” or “LVLM-as-a-judge” paradigm has […]
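The "LVLM-as-a-judge" paradigm mentioned above can be wired up as a small scoring loop: each candidate scene description is sent to a judge model with task-specific criteria, and an integer score is parsed from the reply. The sketch below is a hypothetical illustration, not the paper's protocol; the `judge` callable, the 1-5 scale, and the prompt wording are all assumptions, and any real deployment would plug in an actual LVLM client.

```python
from typing import Callable

def judge_descriptions(judge: Callable[[str], str],
                       candidates: list[str],
                       criteria: str) -> list[tuple[str, int]]:
    """Score candidate scene descriptions with a judge model.

    `judge` is any callable mapping a prompt string to the model's raw
    reply; we parse an integer score 1-5 from the first token of it."""
    scored = []
    for desc in candidates:
        prompt = (f"Rate this scene description for a blind or low-vision "
                  f"user on a 1-5 scale. Criteria: {criteria}\n\n"
                  f"Description: {desc}\n\nReply with the score only.")
        score = int(judge(prompt).strip().split()[0])
        scored.append((desc, score))
    # Best-scored descriptions first.
    return sorted(scored, key=lambda t: -t[1])
```

A mock judge (e.g. a keyword heuristic) makes the loop testable offline before swapping in a real model endpoint.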
Internal APIs Are All You Need: Shadow APIs, Shared Discovery, and the Case Against Browser-First Agent Architectures
arXiv:2604.00694v1 Announce Type: cross Abstract: Autonomous agents increasingly interact with the web, yet most websites remain designed for human browsers — a fundamental mismatch that the emerging “Agentic Web” must resolve. Agents must repeatedly browse pages, inspect DOMs, and reverse-engineer callable routes — a process that is slow, brittle, and redundantly repeated across agents. We […]
How Well Do Large-Scale Chemical Language Models Transfer to Downstream Tasks?
arXiv:2602.11618v3 Announce Type: replace-cross Abstract: Chemical Language Models (CLMs) pre-trained on large-scale molecular data are widely used for molecular property prediction. However, the common belief that increasing training resources such as model size, dataset size, and training compute improves both pretraining loss and downstream task performance has not been systematically validated in the chemical […]
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
arXiv:2603.03823v4 Announce Type: replace-cross Abstract: Large language model (LLM)-powered agents have demonstrated strong capabilities in automating software engineering tasks such as static bug fixing. However, in the real world, the development of mature software is typically predicated on complex requirement changes and long-term feature iterations — a process that static, one-shot repair paradigms fail to […]
Experiential Reflective Learning for Self-Improving LLM Agents
arXiv:2603.24639v2 Announce Type: replace-cross Abstract: Recent advances in large language models (LLMs) have enabled the development of autonomous agents capable of complex reasoning and multi-step problem solving. However, these agents struggle to adapt to specialized environments and do not leverage past interactions, approaching each new task from scratch regardless of their accumulated experience. We introduce […]
Procela: Epistemic Governance in Mechanistic Simulations Under Structural Uncertainty
arXiv:2604.00675v1 Announce Type: cross Abstract: Mechanistic simulations typically assume fixed ontologies: variables, causal relationships, and resolution policies are static. This assumption fails when the true causal structure is contested or unidentifiable, as in antimicrobial resistance (AMR) spread, where contact, environmental, and selection ontologies compete. We introduce Procela, a Python framework where variables act as epistemic authorities […]
Scalable Pretraining of Large Mixture of Experts Language Models on the Aurora Supercomputer
arXiv:2604.00785v1 Announce Type: cross Abstract: Pretraining Large Language Models (LLMs) from scratch requires a massive amount of compute. The Aurora supercomputer is an exascale machine with 127,488 Intel PVC (Ponte Vecchio) GPU tiles. In this work, we showcase LLM pretraining on Aurora at the scale of thousands of GPU tiles. Towards this effort, we developed Optimus, […]
Chat-Based Support Alone May Not Be Enough: Comparing Conversational and Embedded LLM Feedback for Mathematical Proof Learning
arXiv:2602.18807v2 Announce Type: replace-cross Abstract: We evaluate GPTutor, an LLM-powered tutoring system for an undergraduate discrete mathematics course. It integrates two LLM-supported tools: a structured proof-review tool that provides embedded feedback on students’ written proof attempts, and a chatbot for math questions. In a staggered-access study with 148 students, earlier access was associated with higher […]
Representation Selection via Cross-Model Agreement using Canonical Correlation Analysis
arXiv:2604.00921v1 Announce Type: cross Abstract: Modern vision pipelines increasingly rely on pretrained image encoders whose representations are reused across tasks and models, yet these representations are often overcomplete and model-specific. We propose a simple, training-free method to improve the efficiency of image representations via a post-hoc canonical correlation analysis (CCA) operator. By leveraging the shared […]
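The post-hoc CCA operator described above can be sketched in a few lines: given two encoders' features on the same images, fit a linear CCA and keep only the top-k canonical directions, i.e. the subspace on which the two models agree. This is a minimal numpy sketch under the usual textbook CCA formulation (whitened cross-covariance SVD), not the paper's exact operator; the function name, the ridge `eps`, and the choice to return only view X's projection are assumptions.

```python
import numpy as np

def cca_compress(X, Y, k, eps=1e-6):
    """Compress encoder A's features X using agreement with encoder B's
    features Y on the same inputs: keep the top-k canonical directions.

    Returns (projected X, canonical correlations of the kept directions)."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = len(X)
    # Regularized within-view and cross-view covariances.
    Cxx = Xc.T @ Xc / n + eps * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + eps * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    # Whitened cross-covariance K = Lx^{-1} Cxy Ly^{-T};
    # its singular values are the canonical correlations.
    K = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, corrs, _ = np.linalg.svd(K)
    A = np.linalg.solve(Lx.T, U[:, :k])  # projection matrix for view X
    return Xc @ A, corrs[:k]
```

Because the operator is linear and fit once, it can be applied training-free to any downstream use of the stored features, which matches the abstract's framing of a simple post-hoc efficiency step.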
Streaming Model Cascades for Semantic SQL
arXiv:2604.00660v1 Announce Type: cross Abstract: Modern data warehouses extend SQL with semantic operators that invoke large language models on each qualifying row, but the per-row inference cost is prohibitive at scale. Model cascades reduce this cost by routing most rows through a fast proxy model and delegating uncertain cases to an expensive oracle. Existing frameworks, […]
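The cascade routing described above — a fast proxy answers most rows, and only low-confidence rows are escalated to the expensive oracle — can be sketched in a few lines. This is a generic illustration of the cascade pattern, not the paper's system; the `Cascade` class, the fixed confidence threshold, and the per-row deferral flag are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple

@dataclass
class Cascade:
    """Route rows through a cheap proxy; defer low-confidence rows to an oracle."""
    proxy: Callable[[str], Tuple[str, float]]  # returns (label, confidence)
    oracle: Callable[[str], str]               # expensive but trusted
    threshold: float = 0.9

    def run(self, rows: Iterable[str]) -> Iterator[Tuple[str, str, bool]]:
        for row in rows:
            label, conf = self.proxy(row)
            if conf >= self.threshold:
                yield row, label, False            # answered by the proxy
            else:
                yield row, self.oracle(row), True  # deferred to the oracle
```

In a streaming SQL setting the interesting questions are exactly the ones the abstract gestures at: how to pick `threshold` online so a target quality is met, and how to batch deferred rows so oracle calls amortize; the sketch fixes both by hand.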