arXiv:2601.19956v1 Announce Type: cross Abstract: As Speech Language Models (SLMs) transition from personal devices to shared, multi-user environments such as smart homes, a new challenge emerges: the model is expected to distinguish between users to manage information flow appropriately. Without this capability, an SLM could reveal one user’s confidential schedule to another, a privacy failure […]
Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
arXiv:2509.22258v4 Announce Type: replace-cross Abstract: Recent advances in vision-language models (VLMs) have achieved remarkable performance on standard medical benchmarks, yet their true clinical reasoning ability remains unclear. Existing datasets predominantly emphasize classification accuracy, creating an evaluation illusion in which models appear proficient while still failing at high-stakes diagnostic reasoning. We introduce Neural-MedBench, a compact yet […]
MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference
arXiv:2601.19961v1 Announce Type: cross Abstract: We present MeanCache, a training-free caching framework for efficient Flow Matching inference. Existing caching methods reduce redundant computation but typically rely on instantaneous velocity information (e.g., feature caching), which often leads to severe trajectory deviations and error accumulation under high acceleration ratios. MeanCache introduces an average-velocity perspective: by leveraging cached […]
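The truncated abstract contrasts caching instantaneous velocity with an average-velocity view. As a minimal sketch in our own notation (not necessarily the paper's), the average velocity of a flow over an interval $[t, s]$ is

    \bar{v}(x_t, t, s) = \frac{1}{s - t} \int_t^s v(x_\tau, \tau)\, d\tau, \qquad x_s = x_t + (s - t)\,\bar{v}(x_t, t, s),

so a single large step taken with the average velocity follows the true trajectory exactly, whereas reusing a cached instantaneous velocity $v(x_t, t)$ over the same interval is exact only if the trajectory is straight; the gap grows with curvature, which is the error accumulation the abstract attributes to high acceleration ratios.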
Can Continuous-Time Diffusion Models Generate and Solve Globally Constrained Discrete Problems? A Study on Sudoku
arXiv:2601.20363v1 Announce Type: cross Abstract: Can standard continuous-time generative models represent distributions whose support is an extremely sparse, globally constrained discrete set? We study this question using completed Sudoku grids as a controlled testbed, treating them as a subset of a continuous relaxation space. We train flow-matching and score-based models along a Gaussian probability path […]
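The exact encoding is cut off above; as a hedged illustration of one standard continuous relaxation (one-hot digit channels in [0,1]^(9x9x9), rounded back by argmax), consider:

    import numpy as np

    def encode(grid):
        """Embed a 9x9 grid of digits 1..9 as a point in [0,1]^(9x9x9) via one-hot channels."""
        x = np.zeros((9, 9, 9), dtype=np.float64)
        for r in range(9):
            for c in range(9):
                x[r, c, grid[r][c] - 1] = 1.0
        return x

    def decode(x):
        """Round a relaxed sample back to the nearest discrete grid (argmax over channels)."""
        return np.argmax(x, axis=-1) + 1

    def is_valid_sudoku(grid):
        """Global constraints: every row, column and 3x3 box is a permutation of 1..9."""
        full = set(range(1, 10))
        rows = all(set(grid[r, :]) == full for r in range(9))
        cols = all(set(grid[:, c]) == full for c in range(9))
        boxes = all(set(grid[3*i:3*i+3, 3*j:3*j+3].ravel()) == full
                    for i in range(3) for j in range(3))
        return rows and cols and boxes

Of the 9^81 discrete fillings of the cube, only about 6.67 x 10^21 are valid completed Sudoku grids, which is what makes the target support extremely sparse and globally constrained for a flow-matching or score-based model trained on such encodings.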
On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents
arXiv:2601.20404v1 Announce Type: cross Abstract: AI coding agents such as Codex and Claude Code are increasingly used to autonomously contribute to software repositories. However, little is known about how repository-level configuration artifacts affect the operational efficiency of these agents. In this paper, we study the impact of AGENTS.md files on the runtime and token consumption of […]
Bench4HLS: End-to-End Evaluation of LLMs in High-Level Synthesis Code Generation
arXiv:2601.19941v1 Announce Type: cross Abstract: In the last two years, large language models (LLMs) have shown strong capabilities in code generation, including hardware design at register-transfer level (RTL). While their use in high-level synthesis (HLS) remains comparatively less mature, the ratio of HLS- to RTL-focused studies has shifted from 1:10 to 2:10 in the past six […]
Stingy Context: 18:1 Hierarchical Code Compression for LLM Auto-Coding
arXiv:2601.19929v1 Announce Type: cross Abstract: We introduce Stingy Context, a hierarchical tree-based compression scheme achieving 18:1 reduction in LLM context for auto-coding tasks. Using our TREEFRAG exploit decomposition, we reduce a real source code base of 239k tokens to 11k tokens while preserving task fidelity. Empirical results across 12 Frontier models show 94 to 97% […]
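TREEFRAG itself is not described in the truncated abstract. As a generic illustration of hierarchical, tree-based context compression (our sketch, not the paper's method), one can keep only a directory -> file -> symbol skeleton in the prompt and expand individual nodes on demand:

    import ast
    import os
    from pathlib import Path

    def file_skeleton(path):
        """Compress one Python file to its top-level signatures plus first docstring lines."""
        tree = ast.parse(Path(path).read_text(encoding="utf-8"))
        lines = []
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                doc = ast.get_docstring(node)
                summary = f"  # {doc.splitlines()[0]}" if doc else ""
                lines.append(f"{type(node).__name__} {node.name}{summary}")
        return lines

    def repo_skeleton(root):
        """Build a directory -> file -> symbol tree; only this skeleton enters the LLM context."""
        skeleton = {}
        for dirpath, _, files in os.walk(root):
            for name in files:
                if name.endswith(".py"):
                    path = os.path.join(dirpath, name)
                    try:
                        skeleton[os.path.relpath(path, root)] = file_skeleton(path)
                    except SyntaxError:
                        continue
        return skeleton

Expanding a node's full body only when the model requests it is the usual way such schemes trade a cheap extra round trip for an order-of-magnitude smaller prompt.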
Mem2ActBench: A Benchmark for Evaluating Long-Term Memory Utilization in Task-Oriented Autonomous Agents
arXiv:2601.19935v1 Announce Type: cross Abstract: Large Language Model (LLM)-based agents are increasingly deployed for complex, tool-based tasks where long-term memory is critical to driving actions. Existing benchmarks, however, primarily test an agent’s ability to passively retrieve isolated facts in response to explicit questions. They fail to evaluate the more crucial capability of actively applying memory […]
Modeling Next-Token Prediction as Left-Nested Intuitionistic Implication
arXiv:2601.19915v1 Announce Type: cross Abstract: We introduce the Arrow Language Model, a neural architecture derived from an intuitionistic-logic interpretation of next-token prediction. Instead of representing tokens as additive embeddings mixed by attention, we encode a prefix as a left-nested implication chain whose structure preserves order through non-commutative composition. Next-token prediction corresponds to modus ponens, and […]
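For readers unfamiliar with the logical vocabulary: a left-nested implication chain over a prefix $t_1 t_2 \dots t_n$ has the shape

    \bigl(\cdots\bigl((t_1 \to t_2) \to t_3\bigr) \to \cdots\bigr) \to t_n,

and, unlike a commutative sum of embeddings, reordering the $t_i$ yields a different formula, which is how such a chain can preserve word order structurally. Modus ponens is the rule "from $A$ and $A \to B$, conclude $B$"; how the model maps candidate next tokens onto this rule is not recoverable from the truncated abstract, so the formula above should be read as background notation rather than the paper's exact construction.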
Table-BiEval: A Self-Supervised, Dual-Track Framework for Decoupling Structure and Content in LLM Evaluation
arXiv:2601.19923v1 Announce Type: cross Abstract: As Large Language Models (LLMs) evolve into autonomous agents, the capability to faithfully translate natural language into rigorous structured formats (essential for tool invocation) and to convert complex tabular information into machine-readable specifications has become paramount. However, current evaluations lack effective methodologies to measure this structural fidelity without costly human intervention, as […]
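The dual-track idea is only named before the abstract is cut off. As a generic illustration (our own, not Table-BiEval's metric) of scoring structural fidelity and content fidelity as two independent signals for JSON output:

    import json

    def structure_signature(obj):
        """Recursively strip values, keeping only keys and container shapes."""
        if isinstance(obj, dict):
            return {k: structure_signature(v) for k, v in sorted(obj.items())}
        if isinstance(obj, list):
            return [structure_signature(v) for v in obj]
        return type(obj).__name__  # leaf: record only the value's type

    def leaf_values(obj, prefix=""):
        """Flatten an object to {path: value} pairs for content comparison."""
        if isinstance(obj, dict):
            out = {}
            for k, v in obj.items():
                out.update(leaf_values(v, f"{prefix}/{k}"))
            return out
        if isinstance(obj, list):
            out = {}
            for i, v in enumerate(obj):
                out.update(leaf_values(v, f"{prefix}[{i}]"))
            return out
        return {prefix: obj}

    def dual_track_score(predicted_json, reference_json):
        """Return (structure_match, content_accuracy) as two decoupled signals."""
        pred, ref = json.loads(predicted_json), json.loads(reference_json)
        structure_match = structure_signature(pred) == structure_signature(ref)
        ref_leaves, pred_leaves = leaf_values(ref), leaf_values(pred)
        hits = sum(pred_leaves.get(path) == value for path, value in ref_leaves.items())
        return structure_match, hits / max(len(ref_leaves), 1)

Keeping the two scores separate makes it possible to tell a model that emits the right schema with wrong values apart from one that garbles the structure itself.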
Deep Researcher with Sequential Plan Reflection and Candidates Crossover (Deep Researcher Reflect Evolve)
arXiv:2601.20843v1 Announce Type: new Abstract: This paper introduces a novel Deep Researcher architecture designed to generate detailed research reports on complex PhD-level topics by addressing the inherent limitations of the Parallel Scaling paradigm. Our system utilizes two key innovations: Sequential Research Plan Refinement via Reflection and a Candidates Crossover algorithm. The sequential refinement process […]
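Only the component names survive the truncation; a heavily hedged sketch of what a loop combining sequential reflection with candidate crossover could look like (our paraphrase, with a hypothetical llm callable, not the paper's implementation):

    def deep_research(topic, llm, n_candidates=4, n_rounds=3):
        """Hypothetical scaffold: refine a plan by reflection, then cross candidate reports."""
        plan = llm(f"Draft a research plan for: {topic}")
        for _ in range(n_rounds):  # sequential plan refinement via reflection
            critique = llm(f"Critique this plan for gaps and redundancy:\n{plan}")
            plan = llm(f"Revise the plan to address the critique.\nPlan:\n{plan}\nCritique:\n{critique}")
        candidates = [llm(f"Write a report following this plan:\n{plan}") for _ in range(n_candidates)]
        # candidates crossover: merge the strongest sections of different drafts into one report
        return llm("Combine the best-supported sections of these drafts into a single report:\n\n"
                   + "\n\n---\n\n".join(candidates))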
GTAC: A Generative Transformer for Approximate Circuits
arXiv:2601.19906v1 Announce Type: cross Abstract: Targeting error-tolerant applications, approximate circuits introduce controlled errors to significantly improve performance, power, and area (PPA) of circuits. In this work, we introduce GTAC, a novel generative Transformer-based model for producing approximate circuits. By leveraging principles of approximate computing and AI-driven EDA, our model innovatively integrates error thresholds into the […]