arXiv:2601.16984v1 Announce Type: cross Abstract: The 3rd Generation Partnership Project (3GPP) produces complex technical specifications essential to global telecommunications, yet their hierarchical structure, dense formatting, and multi-modal content make them difficult to process. While Large Language Models (LLMs) show promise, existing approaches fall short in handling complex queries, visual information, and document interdependencies. We present […]
Think-Augmented Function Calling: Improving LLM Parameter Accuracy Through Embedded Reasoning
arXiv:2601.18282v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in function calling for autonomous agents, yet current mechanisms lack explicit reasoning transparency during parameter generation, particularly for complex functions with interdependent parameters. While existing approaches like chain-of-thought prompting operate at the agent level, they fail to provide fine-grained reasoning guidance for […]
DEEPMED: Building a Medical DeepResearch Agent via Multi-hop Med-Search Data and Turn-Controlled Agentic Training & Inference
arXiv:2601.18496v1 Announce Type: new Abstract: Medical reasoning models remain constrained by parametric knowledge and are thus susceptible to forgetting and hallucinations. DeepResearch (DR) models ground outputs in verifiable evidence from tools and perform strongly in general domains, but their direct transfer to the medical field yields relatively limited gains. We attribute this to two gaps: task […]
A Mechanistic View on Video Generation as World Models: State and Dynamics
arXiv:2601.17067v1 Announce Type: cross Abstract: Large-scale video generation models have demonstrated emergent physical coherence, positioning them as potential world models. However, a gap remains between contemporary “stateless” video architectures and classic state-centric world model theories. This work bridges this gap by proposing a novel taxonomy centered on two pillars: State Construction and Dynamics Modeling. We […]
A model for a population of trees structured by phenological traits
arXiv:2601.18214v1 Announce Type: new Abstract: In the context of global warming, tree populations rely on two primary mechanisms of adaptation: phenotypic plasticity, which enables individuals to adjust their behavior in response to environmental stress, and genetic evolution, driven by natural selection and genetic diversity within the population. Understanding the interplay between these mechanisms is crucial […]
AI Agent for Reverse-Engineering Legacy Finite-Difference Code and Translating to Devito
arXiv:2601.18381v1 Announce Type: new Abstract: To facilitate the transformation of legacy finite difference implementations into the Devito environment, this study develops an integrated AI agent framework. Retrieval-Augmented Generation (RAG) and open-source Large Language Models are combined through multi-stage iterative workflows in the system’s hybrid LangGraph architecture. The agent constructs an extensive Devito knowledge graph through […]
PolySHAP: Extending KernelSHAP with Interaction-Informed Polynomial Regression
arXiv:2601.18608v1 Announce Type: new Abstract: Shapley values have emerged as a central game-theoretic tool in explainable AI (XAI). However, computing Shapley values exactly requires $2^d$ game evaluations for a model with $d$ features. Lundberg and Lee’s KernelSHAP algorithm has emerged as a leading method for avoiding this exponential cost. KernelSHAP approximates Shapley values by approximating […]
Point transformer for protein structural heterogeneity analysis using CryoEM
arXiv:2601.18713v1 Announce Type: new Abstract: The structural dynamics of macromolecules are critical to their structure-function relationship. Cryogenic electron microscopy (CryoEM) provides snapshots of vitrified proteins in different compositional and conformational states, and the structural heterogeneity of proteins can be characterized through computational analysis of the images. For protein systems with multiple degrees of freedom, it is […]
AI-based System for Transforming text and sound to Educational Videos
arXiv:2601.17022v1 Announce Type: cross Abstract: Technological developments have produced methods that can generate educational videos from input text or sound. Recently, the use of deep learning techniques for image and video generation has been widely explored, particularly in education. However, generating video content from conditional inputs such as text or speech remains a challenging area. […]
Arabic Sign Language Recognition using Multimodal Approach
arXiv:2601.17041v1 Announce Type: cross Abstract: Arabic Sign Language (ArSL) is an essential communication method for individuals in the Deaf and Hard-of-Hearing community. However, existing recognition systems face significant challenges due to their reliance on single-sensor approaches like Leap Motion or RGB cameras. These systems struggle with limitations such as inadequate tracking of complex hand […]
A Computer Vision Pipeline for Iterative Bullet Hole Tracking in Rifle Zeroing
arXiv:2601.17062v1 Announce Type: cross Abstract: Adjusting rifle sights, a process commonly called “zeroing,” requires shooters to identify and differentiate bullet holes from multiple firing iterations. Traditionally, this process demands physical inspection, introducing delays due to range safety protocols and increasing the risk of human error. We present an end-to-end computer vision system for automated bullet […]
ShopSimulator: Evaluating and Exploring RL-Driven LLM Agent for Shopping Assistants
arXiv:2601.18225v1 Announce Type: new Abstract: Large language model (LLM)-based agents are increasingly deployed in e-commerce shopping. To perform thorough, user-tailored product searches, agents should interpret personal preferences, engage in multi-turn dialogues, and ultimately retrieve and discriminate among highly similar products. However, existing research has yet to provide a unified simulation environment that consistently captures all […]