arXiv:2604.10985v1 Announce Type: new Abstract: Vision-Language Models (VLMs) have rapidly advanced by leveraging powerful pre-trained Large Language Models (LLMs) as core reasoning backbones. As new and more capable LLMs emerge with improved reasoning, instruction-following, and generalization, there is a pressing need to efficiently update existing VLMs to incorporate these advancements. However, the integration of new […]
Sanity Checks for Agentic Data Science
arXiv:2604.11003v1 Announce Type: new Abstract: Agentic data science (ADS) pipelines have grown rapidly in both capability and adoption, with systems such as OpenAI Codex now able to directly analyze datasets and produce answers to statistical questions. However, these systems can reach falsely optimistic conclusions that are difficult for users to detect. To address this, we […]
OOWM: Structuring Embodied Reasoning and Planning via Object-Oriented Programmatic World Modeling
arXiv:2604.09580v1 Announce Type: new Abstract: Standard Chain-of-Thought (CoT) prompting empowers Large Language Models (LLMs) with reasoning capabilities, yet its reliance on linear natural language is inherently insufficient for effective world modeling in embodied tasks. While text offers flexibility, it fails to explicitly represent the state-space, object hierarchies, and causal dependencies required for robust robotic planning. […]
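The abstract contrasts linear natural-language reasoning with explicit state-space and object-hierarchy representations. A minimal illustration of what an object-oriented world state with programmatic transitions might look like (the class and function names here are hypothetical, invented for illustration, and are not the paper's actual implementation):

```python
from dataclasses import dataclass, field

@dataclass
class WorldObject:
    """Explicit object state, in contrast to free-form chain-of-thought text."""
    name: str
    location: str
    contains: list = field(default_factory=list)

def move(world, obj_name, dest):
    """A programmatic state transition: the effect on the world is explicit
    and checkable, rather than implied by a natural-language sentence."""
    world[obj_name].location = dest
    return world

# A tiny world: two objects with explicit locations.
world = {
    "apple": WorldObject("apple", "table"),
    "bowl": WorldObject("bowl", "counter"),
}
move(world, "apple", "counter")
```

The point of such a representation is that causal dependencies (e.g. "the apple is now wherever we moved it") are enforced by code rather than left to the consistency of generated text.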
Intelligent Approval of Access Control Flow in Office Automation Systems via Relational Modeling
arXiv:2604.11040v1 Announce Type: new Abstract: Office automation (OA) systems play a crucial role in enterprise operations and management, with access control flow approval (ACFA) being a key component that manages the accessibility of various resources. However, traditional ACFA requires approval from the person in charge at each step, which consumes a significant amount of manpower […]
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
arXiv:2604.11626v1 Announce Type: new Abstract: Most reward models for visual generation reduce rich human judgments to a single unexplained score, discarding the reasoning that underlies preference. We show that teaching reward models to produce explicit, multi-dimensional critiques before scoring transforms them from passive evaluators into active optimization tools, improving generators in two complementary ways: at […]
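The abstract describes reward models that emit explicit, multi-dimensional critiques before producing a score. One hypothetical way to structure such an output (this schema is an assumption for illustration, not the paper's format):

```python
from dataclasses import dataclass

@dataclass
class DimensionCritique:
    """One dimension of judgment with its free-text reasoning attached."""
    dimension: str   # e.g. "composition", "prompt fidelity"
    critique: str    # the explicit reasoning behind the score
    score: float     # normalized to [0, 1]

def aggregate(critiques):
    """Collapse per-dimension scores into a single scalar reward,
    while the critiques themselves remain inspectable."""
    return sum(c.score for c in critiques) / len(critiques)
```

Unlike a single unexplained score, each dimension's rationale survives aggregation and can be used as an optimization signal in its own right.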
Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing
arXiv:2604.10708v1 Announce Type: cross Abstract: Recent progress in multimodal models has spurred rapid advances in audio understanding, generation, and editing. However, these capabilities are typically addressed by specialized models, leaving the development of a truly unified framework that can seamlessly integrate all three tasks underexplored. While some pioneering works have explored unifying audio understanding and […]
Beyond Message Passing: A Semantic View of Agent Communication Protocols
arXiv:2604.02369v3 Announce Type: replace-cross Abstract: Agent communication protocols are becoming critical infrastructure for large language model (LLM) systems that must use tools, coordinate with other agents, and operate across heterogeneous environments. This work presents a human-inspired perspective on this emerging landscape by organizing agent communication into three layers: communication, syntactic, and semantic. Under this framework, […]
SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting
arXiv:2604.10688v1 Announce Type: cross Abstract: On-policy reinforcement learning has become the dominant paradigm for reasoning alignment in large language models, yet its sparse, outcome-level rewards make token-level credit assignment notoriously difficult. On-Policy Distillation (OPD) alleviates this by introducing dense, token-level KL supervision from a teacher model, but typically applies this supervision uniformly across all rollouts, […]
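The abstract's contrast is between sparse outcome-level rewards and the dense, token-level KL supervision used by On-Policy Distillation. A minimal sketch of a per-token forward KL objective against a teacher (function names are hypothetical; real implementations operate on logit tensors in an autodiff framework):

```python
import math

def softmax(logits):
    """Numerically stable softmax over one token's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def token_kl(teacher_logits, student_logits):
    """Forward KL(teacher || student) at a single token position."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def sequence_kl_loss(teacher_seq, student_seq):
    """Dense supervision: average per-token KL over a rollout, so every
    token position receives a training signal, not just the final outcome."""
    kls = [token_kl(t, s) for t, s in zip(teacher_seq, student_seq)]
    return sum(kls) / len(kls)
```

The abstract's critique is that this supervision is typically applied uniformly across rollouts; SCOPE's dual-path adaptive weighting presumably reweights it, but the weighting scheme is not shown in the truncated text.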
Learning to Focus and Precise Cropping: A Reinforcement Learning Framework with Information Gaps and Grounding Loss for MLLMs
arXiv:2603.27494v2 Announce Type: replace-cross Abstract: To enhance the perception and reasoning capabilities of multimodal large language models in complex visual scenes, recent research has introduced agent-based workflows. In these works, MLLMs autonomously utilize an image cropping tool to analyze regions of interest for question answering. While existing training strategies, such as those employing supervised fine-tuning and […]
One Scale at a Time: Scale-Autoregressive Modeling for Fluid Flow Distributions
arXiv:2604.11403v1 Announce Type: cross Abstract: Analyzing unsteady fluid flows often requires access to the full distribution of possible temporal states, yet conventional PDE solvers are computationally prohibitive and learned time-stepping surrogates quickly accumulate error over long rollouts. Generative models avoid compounding error by sampling states independently, but diffusion and flow-matching methods, while accurate, are limited […]
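The abstract's motivating claim, that learned time-stepping surrogates accumulate error over long rollouts while generative models that sample states independently do not compound error, can be illustrated with a toy dynamical system (the dynamics and the surrogate's bias below are invented purely for illustration):

```python
def true_step(x):
    """Ground-truth dynamics: simple decay toward zero."""
    return 0.9 * x

def learned_step(x, bias=0.01):
    """A surrogate with a small systematic per-step error."""
    return 0.9 * x + bias

def rollout_error(x0, n_steps):
    """Roll both systems forward and measure how far the surrogate drifts."""
    xt, xs = x0, x0
    for _ in range(n_steps):
        xt, xs = true_step(xt), learned_step(xs)
    return abs(xs - xt)
```

Here the drift grows monotonically with rollout length, whereas a model that samples each temporal state independently pays the per-sample error once rather than compounding it.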
What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models
arXiv:2601.06165v2 Announce Type: replace-cross Abstract: Current vision-language benchmarks predominantly feature well-structured questions with clear, explicit prompts. However, real user queries are often informal and underspecified. Users naturally leave much unsaid, relying on images to convey context. We introduce HAERAE-Vision, a benchmark of 653 real-world visual questions from Korean online communities (0.76% survival from 86K candidates), […]
Do Neurons Dream of Primitive Operators? Wake-Sleep Compression Rediscovers Schank’s Event Semantics
arXiv:2603.25975v2 Announce Type: replace-cross Abstract: We show that they do. Roger Schank’s conceptual dependency theory proposed that all human events decompose into primitive operations — ATRANS (transfer of possession), PTRANS (physical movement), MTRANS (information transfer), and others — hand-coded from linguistic intuition. We ask: can the same primitives be discovered automatically through compression pressure alone? […]