arXiv:2603.16987v1 Announce Type: cross Abstract: Deploying vision-language models (VLMs) in resource-constrained settings demands low latency and high throughput, yet existing compact VLMs often fall short of the inference speedups their smaller parameter counts suggest. To explain this discrepancy, we conduct an empirical end-to-end efficiency analysis and systematically profile inference to identify the dominant bottlenecks. Based […]
Large Reasoning Models Struggle to Transfer Parametric Knowledge Across Scripts
arXiv:2603.17070v1 Announce Type: cross Abstract: In this work, we analyze shortcomings in cross-lingual knowledge transfer in large, modern reasoning LLMs. We demonstrate that the perceived gap in knowledge transfer is primarily a script barrier. First, we conduct an observational data analysis on the performance of thinking models on two datasets with local knowledge from around […]
Intent Formalization: A Grand Challenge for Reliable Coding in the Age of AI Agents
arXiv:2603.17150v1 Announce Type: cross Abstract: Agentic AI systems can now generate code with remarkable fluency, but a fundamental question remains: does the generated code actually do what the user intended? The gap between informal natural language requirements and precise program behavior, the intent gap, has always plagued software engineering, but AI-generated code amplifies […]
A scalable neural bundle map for multiphysics prediction in lithium-ion battery across varying configurations
arXiv:2603.17209v1 Announce Type: cross Abstract: Efficient and accurate prediction of multiphysics evolution across diverse cell geometries is fundamental to the design, management and safety of lithium-ion batteries. However, existing computational frameworks struggle to capture the coupled electrochemical, thermal, and mechanical dynamics across diverse cell geometries and varying operating conditions. Here, we present a Neural Bundle […]
Deployment and Evaluation of an EHR-integrated, Large Language Model-Powered Tool to Triage Surgical Patients
arXiv:2603.17234v1 Announce Type: cross Abstract: Surgical co-management (SCM) is an evidence-based model in which hospitalists jointly manage medically complex perioperative patients alongside surgical teams. Despite its clinical and financial value, SCM is limited by the need to manually identify eligible patients. To determine whether SCM triage can be automated, we conducted a prospective, unblinded study […]
Symphony: A Cognitively-Inspired Multi-Agent System for Long-Video Understanding
arXiv:2603.17307v1 Announce Type: cross Abstract: Despite rapid developments and widespread applications of MLLM agents, they still struggle with long-form video understanding (LVU) tasks, which are characterized by high information density and extended temporal spans. Recent research on LVU agents demonstrates that simple task decomposition and collaboration mechanisms are insufficient for long-chain reasoning tasks. Moreover, directly […]
AdaZoom-GUI: Adaptive Zoom-based GUI Grounding with Instruction Refinement
arXiv:2603.17441v1 Announce Type: cross Abstract: GUI grounding is a critical capability for vision-language models (VLMs) that enables automated interaction with graphical user interfaces by locating target elements from natural language instructions. However, grounding on GUI screenshots remains challenging due to high-resolution images, small UI elements, and ambiguous user instructions. In this work, we propose AdaZoom-GUI, […]
KineVLA: Towards Kinematics-Aware Vision-Language-Action Models with Bi-Level Action Decomposition
arXiv:2603.17524v1 Announce Type: cross Abstract: In this paper, we introduce a novel kinematics-rich vision-language-action (VLA) task, in which language commands densely encode diverse kinematic attributes (such as direction, trajectory, orientation, and relative displacement) from initiation through completion, at key moments, unlike existing action instructions that capture kinematics only coarsely or partially, thereby supporting fine-grained and […]
Joint Optimization of Storage and Loading for High-Performance 3D Point Cloud Data Processing
arXiv:2603.16945v1 Announce Type: cross Abstract: With the rapid development of computer vision and deep learning, significant advancements have been made in 3D vision, particularly in autonomous driving, robotic perception, and augmented reality. 3D point cloud data, as a crucial representation of 3D information, has gained widespread attention. However, the vast scale and complexity of […]
PhysQuantAgent: An Inference Pipeline of Mass Estimation for Vision-Language Models
arXiv:2603.16958v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) are increasingly applied to robotic perception and manipulation, yet their ability to infer physical properties required for manipulation remains limited. In particular, estimating the mass of real-world objects is essential for determining appropriate grasp force and ensuring safe interaction. However, current VLMs lack reliable mass reasoning capabilities, […]
MSRAMIE: Multimodal Structured Reasoning Agent for Multi-instruction Image Editing
arXiv:2603.16967v1 Announce Type: cross Abstract: Existing instruction-based image editing models perform well with simple, single-step instructions but degrade in realistic scenarios that involve multiple, lengthy, and interdependent directives. A main cause is the scarcity of training data with complex multi-instruction annotations. However, it is costly to collect such data and retrain these models. To address […]
The State of Generative AI in Software Development: Insights from Literature and a Developer Survey
arXiv:2603.16975v1 Announce Type: cross Abstract: Generative Artificial Intelligence (GenAI) rapidly transforms software engineering, yet existing research remains fragmented across individual tasks in the Software Development Lifecycle. This study integrates a systematic literature review with a survey of 65 software developers. The results show that GenAI exerts its highest impact in design, implementation, testing, and documentation, […]