May 15, 2026 – Page 8 – dijee Pharma Intelligence

Correctness-Aware Repository Filtering Under Maximum Effective Context Window Constraints

arXiv:2605.14362v1 Announce Type: cross Abstract: Context window efficiency is a practical constraint in large language model (LLM)-based developer tools. Paulsen [12] shows that all tested models degrade in accuracy well before their advertised context limits (the Maximum Effective Context Window, MECW), which makes context construction a quality problem, not just a cost one. Modern software […]

May 15, 2026

MathAtlas: A Benchmark for Autoformalization in the Wild

arXiv:2605.14061v1 Announce Type: new Abstract: Current autoformalization benchmarks are largely focused on olympiad or undergraduate mathematics, while graduate and research-level mathematics remains underexplored. In this paper, we introduce MathAtlas, the first large-scale autoformalization benchmark of in-the-wild graduate-level mathematics, containing ~52k theorems, definitions, exercises, examples, and proofs extracted from 103 graduate mathematics textbooks. MathAtlas […]

May 15, 2026

Gradient Iterated Temporal-Difference Learning

arXiv:2603.07833v2 Announce Type: replace-cross Abstract: Temporal-difference (TD) learning is highly effective at controlling and evaluating an agent’s long-term outcomes. Most approaches in this paradigm implement a semi-gradient update to boost the learning speed, which consists of ignoring the gradient of the bootstrapped estimate. While popular, this type of update is prone to divergence, as Baird’s […]

May 15, 2026

Know When To Fold ‘Em: Token-Efficient LLM Synthetic Data Generation via Multi-Stage In-Flight Rejection

arXiv:2605.14062v1 Announce Type: new Abstract: While synthetic data generation with large language models (LLMs) is widely used in post-training pipelines, existing approaches typically generate full outputs before applying quality filters, leading to substantial token waste on samples that are ultimately discarded. To address this, we propose Multi-Stage In-Flight Rejection (MSIFR), a lightweight, training-free framework that […]

May 15, 2026

The Great Pretender: A Stochasticity Problem in LLM Jailbreak

arXiv:2605.14418v1 Announce Type: cross Abstract: “Oh-Oh, yes, I’m the great pretender. Pretending that I’m doing well. My need is such, I pretend too much…” summarizes the state in the area of jailbreak creation and evaluation. You find this method to generate adversarial attacks proposed by a reputable institution (e.g., BoN from Anthropic or Crescendo from […]

May 15, 2026

Evaluating Adaptive Personalization of Educational Readings with Simulated Learners

arXiv:2604.16744v2 Announce Type: replace-cross Abstract: We present a framework for evaluating adaptive personalization of educational reading materials with theory-grounded simulated learners. The system builds a learning-objective and knowledge-component ontology from open textbooks, curates it in a browser-based Ontology Atlas, labels textbook chunks with ontology entities, and generates aligned reading-assessment pairs. Simulated readers learn from passages […]

May 15, 2026

SparseOIT: Improving Order-Independent Transparency 3DGS via Active Set Method

arXiv:2605.13855v1 Announce Type: cross Abstract: 3D Gaussian Splatting (3DGS) has gained tremendous popularity over the past few years due to its photorealistic visual appearance. However, 3DGS uses volumetric rendering that is not suitable for objects with non-Lambertian or transparent materials. To remedy this issue, a family of Order-Independent Transparency (OIT) rendering methods proposes to remove […]

May 15, 2026

Exploitation of Hidden Context in Dynamic Movement Forecasting: A Neural Network Journey from Recurrent to Graph Neural Networks and General Purpose Transformers

arXiv:2605.14855v1 Announce Type: cross Abstract: Forecasting within signal processing pipelines is crucial for mitigating delays, particularly in predicting the dynamic movements of objects such as NBA players. This task poses significant challenges due to the inherently interactive and unpredictable nature of sports, where abrupt changes in velocity and direction are prevalent. Traditional approaches, including (S)ARIMA(X), […]

May 15, 2026

CUICurate: A GraphRAG-based Framework for Automated Clinical Concept Curation for NLP applications

arXiv:2602.17949v2 Announce Type: replace-cross Abstract: Background: Clinical named entity recognition tools commonly map free text to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs). For many downstream tasks, however, the clinically meaningful unit is not a single CUI but a concept set comprising related synonyms, subtypes, and associated concepts. Constructing these sets is labour-intensive, […]

May 15, 2026

One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy

arXiv:2605.07931v3 Announce Type: replace-cross Abstract: Vision-language-action (VLA) models increasingly rely on auxiliary world modules to plan over long horizons, yet how such modules should be parameterized on top of a pretrained VLA remains an open design question. Existing world-model-augmented VLAs typically pass the per-frame visual stream into the world module at high visual bandwidth and […]

May 15, 2026

BiFedKD: Bidirectional Federated Knowledge Distillation Framework for Non-IID and Long-Tailed ECG Monitoring

arXiv:2605.14886v1 Announce Type: new Abstract: Electrocardiogram (ECG) monitoring in Internet of Medical Things (IoMT) networks is constrained by strict data-sharing regulations and privacy concerns. Federated learning (FL) enables collaborative learning by keeping raw ECG data on devices, but frequent transmissions of high-dimensional model updates incur heavy per-round traffic over bandwidth-limited links. To alleviate this bottleneck, […]

May 15, 2026

Graphs of Research: Citation Evolution Graphs as Supervision for Research Idea Generation

arXiv:2605.14790v1 Announce Type: cross Abstract: Research idea generation is the innovation-driving step of automated scientific research. Recently, large language models (LLMs) have shown potential for automating idea generation at scale. However, existing methods mainly condition LLMs on eliciting idea generation through static retrieval of relevant literature or complex prompt engineering, without discarding the structural relations […]

May 15, 2026
