arXiv:2604.18177v2 Announce Type: replace-cross Abstract: Benchmarks are often used as a standard to understand LLM capabilities in different domains. However, aggregate benchmark scores provide limited insight into LLMs' compositional skill gaps and how to close them. To make these weaknesses visible, we propose the Scaffolded Task Design (STaD) framework. STaD generates controlled variations of benchmark […]
Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps
arXiv:2604.19533v1 Announce Type: cross Abstract: We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model (LLM) agents perform the core SOC analyst task of threat hunting: given a database of raw Windows event logs with no guided questions or hints, identify the exact timestamps of malicious events. The benchmark wraps […]
Care Trajectories Are Linked to Mental Health and Mortality in Cancer Patients
arXiv:2604.18431v2 Announce Type: replace-cross Abstract: Treatment of cancer involves heterogeneous, complex care pathways. The relationship between these longitudinal trajectories, baseline mental health, and prognostic outcomes remains poorly understood. We introduce an interpretable temporal analysis framework that leverages these dynamics, analyzing care patterns spanning up to 37 years for more than 8,000 patients. Using Dynamic Time Warping (DTW) and […]
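The DTW alignment at the core of this framework can be sketched in a few lines. This is a minimal pure-Python illustration of the standard DTW recurrence, not the paper's implementation; the sequences and cost function (absolute difference) are hypothetical stand-ins for encoded care trajectories.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences.

    Allows non-linear alignment, so trajectories that follow the same
    pattern at different speeds still score as similar.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    # dp[i][j] = cost of the best alignment of a[:i] with b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # step in a only
                                  dp[i][j - 1],      # step in b only
                                  dp[i - 1][j - 1])  # step in both
    return dp[n][m]
```

Pairwise DTW distances like this are what clustering methods then operate on to group patients with similar care patterns.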
Right for the Wrong Reasons: Epistemic Regret Minimization for LLM Causal Reasoning
arXiv:2602.11675v3 Announce Type: replace Abstract: Large language models may answer causal questions correctly for the wrong reasons, substituting associational shortcuts P(Y|X) for the interventional query P(Y|do(X)). Current RL methods reward what the model answers but not why, reinforcing these shortcuts until distribution shift exposes them. We introduce Epistemic Regret Minimization (ERM), a framework that identifies […]
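The gap between P(Y|X) and P(Y|do(X)) that this abstract describes can be made concrete with a toy structural causal model. The sketch below is illustrative only (it is not the ERM method); the variables and probabilities are hypothetical, chosen so that a confounder Z drives both X and Y while Y is independent of X given Z.

```python
import random

random.seed(0)
N = 100_000

def sample(do_x=None):
    """One draw from a toy SCM with confounder Z -> X and Z -> Y."""
    z = random.random() < 0.5
    # do_x=None: X follows Z; otherwise the Z -> X edge is cut
    x = (random.random() < (0.9 if z else 0.1)) if do_x is None else do_x
    y = random.random() < (0.7 if z else 0.2)   # Y depends only on Z
    return x, y

# Observational: condition on X=1, which inherits Z's bias
obs = [y for x, y in (sample() for _ in range(N)) if x]
p_obs = sum(obs) / len(obs)        # estimates P(Y=1 | X=1), ~0.65

# Interventional: force X=1, breaking the Z -> X edge
intv = [y for _, y in (sample(do_x=True) for _ in range(N))]
p_do = sum(intv) / len(intv)       # estimates P(Y=1 | do(X=1)), ~0.45
```

An associational shortcut that learns P(Y|X) here would overstate X's effect by roughly 0.2, exactly the kind of error that only surfaces under distribution shift.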
LEPO: Latent Reasoning Policy Optimization for Large Language Models
arXiv:2604.17892v2 Announce Type: replace-cross Abstract: Recently, latent reasoning has been introduced into large language models (LLMs) to leverage rich information within a continuous space. However, without stochastic sampling, these methods inevitably collapse to deterministic inference, failing to discover diverse reasoning paths. To bridge this gap, we inject controllable stochasticity into latent reasoning via Gumbel-Softmax, restoring […]
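The Gumbel-Softmax trick mentioned here is a standard way to draw differentiable, near-one-hot samples from a categorical distribution. The following is a generic pure-Python sketch of the technique, not LEPO's actual latent-reasoning code; where the logits come from in the model is an assumption not shown here.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0):
    """Sample a relaxed one-hot vector: softmax((logits + Gumbel noise) / tau).

    Lower tau -> closer to a discrete one-hot sample; higher tau -> smoother.
    The noise makes repeated calls stochastic, yet the map stays differentiable.
    """
    # Gumbel(0, 1) noise via inverse transform sampling
    g = [-math.log(-math.log(random.random())) for _ in logits]
    z = [(l + n) / tau for l, n in zip(logits, g)]
    # Numerically stable softmax
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]
```

Because the sample is a smooth function of the logits, gradients can flow through it during policy optimization, which is what makes the injected stochasticity "controllable" rather than a black-box sampler.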
Album: executable building blocks for scientific imaging routines, from sharing to LLM-assisted orchestration
arXiv:2110.00601v2 Announce Type: replace-cross Abstract: Open-source scientific software is a major driver of scientific progress, yet its development and reuse remain difficult in collaborative settings. Researchers repeatedly face four recurring challenges: discovering and reproducing existing routines, adapting them for new use cases, sharing and scaling them across collaborators, and stabilizing them with reproducible execution environments. […]
When Graph Structure Becomes a Liability: A Critical Re-Evaluation of Graph Neural Networks for Bitcoin Fraud Detection under Temporal Distribution Shift
arXiv:2604.19514v1 Announce Type: cross Abstract: The consensus that GCN, GraphSAGE, GAT, and EvolveGCN outperform feature-only baselines on the Elliptic Bitcoin Dataset is widely cited but has not been rigorously stress-tested under a leakage-free evaluation protocol. We perform a seed-matched inductive-versus-transductive comparison and find that this consensus does not hold. Under a strictly inductive protocol, Random […]
Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters
arXiv:2509.18831v2 Announce Type: replace-cross Abstract: Recent advances in diffusion models have significantly improved image and video synthesis. In addition, several concept control methods have been proposed to enable fine-grained, continuous, and flexible control over free-form text prompts. However, these methods not only require intensive training time and GPU memory usage to learn the sliders or […]
DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization
arXiv:2604.17789v2 Announce Type: replace-cross Abstract: The MXFP4 microscaling format, which partitions tensors into blocks of 32 elements sharing an E8M0 scaling factor, has emerged as a promising substrate for efficient LLM inference, backed by native hardware support on NVIDIA Blackwell Tensor Cores. However, activation outliers pose a unique challenge under this format: a single outlier […]
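The outlier sensitivity described in this abstract is easy to reproduce with a toy model of block-wise microscaling. The sketch below is an illustrative simplification, not NVIDIA's or the paper's implementation: the FP4 (E2M1) magnitude grid is standard, but the scale-selection rule is an assumption chosen for clarity.

```python
import math

# FP4 (E2M1) representable magnitudes
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block_mx4(block):
    """Quantize a 32-element block with one shared power-of-two (E8M0) scale,
    then round each element to the nearest FP4 magnitude."""
    assert len(block) == 32
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * 32, 1.0
    # Power-of-two scale that maps the block maximum into FP4's range [0, 6]
    scale = 2.0 ** math.ceil(math.log2(amax / FP4_GRID[-1]))
    q = []
    for v in block:
        mag = min(FP4_GRID, key=lambda g: abs(g - abs(v) / scale))
        q.append(math.copysign(mag * scale, v))
    return q, scale
```

With 31 elements equal to 1.0 and a single outlier of 100.0, the shared scale inflates to 32, every non-outlier rounds to 0.0, and all block information except the outlier is destroyed; this is the failure mode that motivates rotating outliers away before quantization.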
Investigating the structure of emotions by analyzing similarity and association of emotion words
arXiv:2602.06430v2 Announce Type: replace-cross Abstract: In the field of natural language processing, some studies have attempted sentiment analysis on text by handling emotions as explanatory or response variables. One of the most popular emotion models used in this context is the wheel of emotion proposed by Plutchik. This model schematizes human emotions in a circular […]
EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training
arXiv:2604.19485v1 Announce Type: cross Abstract: Reinforcement learning (RL) for LLM post-training faces a fundamental design choice: whether to use a learned critic as a baseline for policy optimization. Classical theory favors critic-based methods such as PPO for variance reduction, yet critic-free alternatives like GRPO have gained widespread adoption due to their simplicity and competitive performance. […]
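The explained-variance statistic in the title is a standard diagnostic of critic quality. Below is a generic pure-Python sketch of that metric, not EVPO's code; how the method adapts critic usage based on this value is not shown in the truncated abstract and is not assumed here.

```python
def explained_variance(returns, values):
    """EV = 1 - Var(returns - values) / Var(returns).

    EV near 1: the critic tracks returns well (a useful baseline).
    EV <= 0: the critic predicts no better than the mean return.
    """
    n = len(returns)
    mean_r = sum(returns) / n
    var_r = sum((r - mean_r) ** 2 for r in returns) / n
    if var_r == 0.0:
        return float("nan")          # returns are constant; EV undefined
    resid = [r - v for r, v in zip(returns, values)]
    mean_e = sum(resid) / n
    var_e = sum((e - mean_e) ** 2 for e in resid) / n
    return 1.0 - var_e / var_r
```

A metric like this gives a concrete handle on the PPO-versus-GRPO design choice the abstract raises: when EV is high the learned baseline reduces variance, and when it is low a critic-free group baseline loses little.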
From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video Agents
arXiv:2603.01455v3 Announce Type: replace-cross Abstract: While multimodal large language models have demonstrated impressive short-term reasoning, they struggle with long-horizon video understanding due to limited context windows and static memory mechanisms that fail to mirror human cognitive efficiency. Existing paradigms typically fall into two extremes: vision-centric methods that incur high latency and redundancy through dense visual […]