March 9, 2026 – Page 9 – dijee Pharma Intelligence

The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR$\rightarrow$LLM Pipelines?

arXiv:2602.17598v2 Announce Type: replace-cross Abstract: Speech LLMs are widely understood to be better than ASR$\rightarrow$LLM cascades since they have access to the audio directly, and not just the transcript. In this paper, we present an evaluation methodology and a mechanistic interpretation of the observed behavior of speech LLMs. First, we introduce matched-backbone testing which separates […]

March 9, 2026

TML-Bench: Benchmark for Data Science Agents on Tabular ML Tasks

arXiv:2603.05764v1 Announce Type: cross Abstract: Autonomous coding agents can produce strong tabular baselines quickly on Kaggle-style tasks. Practical value depends on end-to-end correctness and reliability under time limits. This paper introduces TML-Bench, a tabular benchmark for data science agents on Kaggle-style tasks. This paper evaluates 10 OSS LLMs on four Kaggle competitions and three time […]

March 9, 2026

The World Won’t Stay Still: Programmable Evolution for Agent Benchmarks

arXiv:2603.05910v1 Announce Type: new Abstract: LLM-powered agents fulfill user requests by interacting with environments, querying data, and invoking tools in a multi-turn process. Yet, most existing benchmarks assume static environments with fixed schemas and toolsets, neglecting the evolutionary nature of the real world and agents’ robustness to environmental changes. In this paper, we study a […]

March 9, 2026

PVminerLLM: Structured Extraction of Patient Voice from Patient-Generated Text using Large Language Models

arXiv:2603.05776v1 Announce Type: cross Abstract: Motivation: Patient-generated text contains critical information about patients’ lived experiences, social circumstances, and engagement in care, including factors that strongly influence adherence, care coordination, and health equity. However, these patient voice signals are rarely available in structured form, limiting their use in patient-centered outcomes research and clinical quality improvement. Reliable […]

March 9, 2026

Measuring AI R&D Automation

arXiv:2603.03992v3 Announce Type: replace-cross Abstract: The automation of AI R&D (AIRDA) could have significant implications, but its extent and ultimate effects remain uncertain. We need empirical data to resolve these uncertainties, but existing data (primarily capability benchmarks) may not reflect real-world automation or capture its broader consequences, such as whether AIRDA accelerates capabilities more than […]

March 9, 2026

Human-Data Interaction, Exploration, and Visualization in the AI Era: Challenges and Opportunities

arXiv:2603.05542v1 Announce Type: cross Abstract: The rapid advancement of AI is transforming human-centered systems, with profound implications for human-AI interaction, human-data interaction, and visual analytics. In the AI era, data analysis increasingly involves large-scale, heterogeneous, and multimodal data that is predominantly unstructured, as well as foundation models such as LLMs and VLMs, which introduce additional […]

March 9, 2026

DeepFact: Co-Evolving Benchmarks and Agents for Deep Research Factuality

arXiv:2603.05912v1 Announce Type: new Abstract: Search-augmented LLM agents can produce deep research reports (DRRs), but verifying claim-level factuality remains challenging. Existing fact-checkers are primarily designed for general-domain, factoid-style atomic claims, and there is no benchmark to test whether such verifiers transfer to DRRs. Yet building such a benchmark is itself difficult. We first show that […]

March 9, 2026

Towards Efficient and Stable Ocean State Forecasting: A Continuous-Time Koopman Approach

arXiv:2603.05560v1 Announce Type: cross Abstract: We investigate the Continuous-Time Koopman Autoencoder (CT-KAE) as a lightweight surrogate model for long-horizon ocean state forecasting in a two-layer quasi-geostrophic (QG) system. By projecting nonlinear dynamics into a latent space governed by a linear ordinary differential equation, the model enforces structured and interpretable temporal evolution while enabling temporally resolution-invariant […]

March 9, 2026

LiveSense: A Real-Time Wi-Fi Sensing Platform for Range-Doppler on COTS Laptop

arXiv:2603.06545v1 Announce Type: cross Abstract: We present LiveSense – a cross-platform system that transforms a commercial off-the-shelf (COTS) Wi-Fi Network Interface Card (NIC) on a laptop into a centimeter-level Range-Doppler sensor while preserving simultaneous communication capability. The laptops are equipped with COTS Intel AX211 (Wi-Fi 6E) or Intel BE201 (Wi-Fi 7) NICs. LiveSense can (i) Extract […]

March 9, 2026

When AI Levels the Playing Field: Skill Homogenization, Asset Concentration, and Two Regimes of Inequality

arXiv:2603.05565v1 Announce Type: cross Abstract: Generative AI compresses within-task skill differences while shifting economic value toward concentrated complementary assets, creating an apparent paradox: the technology that equalizes individual performance may widen aggregate inequality. We formalize this tension in a task-based model with endogenous education, employer screening, and heterogeneous firms. The model yields two regimes whose […]

March 9, 2026

An Interactive Multi-Agent System for Evaluation of New Product Concepts

arXiv:2603.05980v1 Announce Type: new Abstract: Product concept evaluation is a critical stage that determines strategic resource allocation and project success in enterprises. However, traditional expert-led approaches face limitations such as subjective bias and high time and cost requirements. To support this process, this study proposes an automated approach utilizing a large language model (LLM)-based multi-agent […]

March 9, 2026

PRISM: Personalized Refinement of Imitation Skills for Manipulation via Human Instructions

arXiv:2603.05574v1 Announce Type: cross Abstract: This paper presents PRISM: an instruction-conditioned refinement method for imitation policies in robotic manipulation. This approach bridges Imitation Learning (IL) and Reinforcement Learning (RL) frameworks into a seamless pipeline, such that an imitation policy on a broad generic task, generated from a set of user-guided demonstrations, can be refined through […]

March 9, 2026
