arXiv:2604.22597v1 Announce Type: new Abstract: Recent advancements in large language models have led to significant improvements across various tasks, including mathematical reasoning, which is widely used to assess models’ logical reasoning and problem-solving abilities. Models are evaluated on mathematical reasoning benchmarks by verifying the correctness of the final answer against a ground-truth answer. A […]
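The final-answer verification this abstract refers to can be sketched as a normalize-and-compare check. This is a minimal illustration, not the paper's actual grader; the normalization rules (whitespace stripping, lowercasing, numeric canonicalization) are assumptions.

```python
def normalize(ans: str) -> str:
    """Canonicalize an answer string: strip whitespace, lowercase,
    and collapse numerically equal forms so '42.0' matches '42'."""
    s = ans.strip().lower().rstrip(".")
    try:
        num = float(s)
        return str(int(num)) if num.is_integer() else str(num)
    except ValueError:
        return s

def is_correct(model_answer: str, ground_truth: str) -> bool:
    """Verify a model's final answer against the ground truth."""
    return normalize(model_answer) == normalize(ground_truth)

print(is_correct(" 42.0 ", "42"))   # True
print(is_correct("x+1", "x + 1"))   # False: purely string-based check
```

The second example hints at why such string matching is brittle for symbolic answers, one of the usual motivations for studying verification more carefully.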
Cross-Stage Coherence in Hierarchical Driving VQA: Explicit Baselines and Learned Gated Context Projectors
arXiv:2604.22560v1 Announce Type: cross Abstract: Graph Visual Question Answering (GVQA) for autonomous driving organizes reasoning into ordered stages, namely Perception, Prediction, and Planning, where planning decisions should remain consistent with the model’s own perception. We present a comparative study of cross-stage context passing on DriveLM-nuScenes using two complementary mechanisms. The explicit variant evaluates three prompt-based […]
Simple sign epistasis and evolutionary detours in fitness landscapes
arXiv:2604.22611v1 Announce Type: new Abstract: In epistatic fitness landscapes, the fitness effect of a mutation depends on the genetic background and may even switch between deleterious and beneficial depending on the presence of another mutation. Epistatic interactions may cause both mutations to change the sign of each other’s fitness effects (reciprocal sign epistasis) or only […]
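The sign-switching behavior described above can be illustrated on a toy two-locus landscape. The fitness values are hypothetical; the point is that mutation A's effect flips sign depending on whether B is present, while B's effect stays positive (simple, rather than reciprocal, sign epistasis).

```python
# Toy two-locus landscape: genotype -> fitness (hypothetical values).
# Lowercase = wild-type allele, uppercase = mutant allele.
fitness = {"ab": 1.0, "Ab": 0.9, "aB": 1.1, "AB": 1.3}

def effect_of_A(background_has_B: bool) -> float:
    """Fitness effect of acquiring mutation A on a given background."""
    if background_has_B:
        return round(fitness["AB"] - fitness["aB"], 3)
    return round(fitness["Ab"] - fitness["ab"], 3)

print(effect_of_A(False))  # -0.1: A is deleterious without B
print(effect_of_A(True))   # 0.2: A is beneficial once B is present
```

On this landscape a population starting at "ab" cannot reach the peak "AB" by strictly fitness-increasing single steps through "Ab", which is exactly the kind of evolutionary detour the abstract alludes to.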
Rethinking XAI Evaluation: A Human-Centered Audit of Shapley Benchmarks in High-Stakes Settings
arXiv:2604.22662v1 Announce Type: cross Abstract: Shapley values are a cornerstone of explainable AI, yet their proliferation into competing formulations has created a fragmented landscape with little consensus on practical deployment. While theoretical differences are well-documented, evaluation remains reliant on quantitative proxies whose alignment with human utility is unverified. In this work, we use a unified […]
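For context, the classical Shapley attribution that these competing formulations build on averages a feature's marginal contribution over all subsets. A brute-force version for small feature sets (illustrative only; unrelated to the paper's specific benchmarks or audit):

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values via the subset formula:
    phi(f) = sum over subsets S not containing f of
             |S|! (n-|S|-1)! / n! * (v(S+{f}) - v(S))."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value_fn(frozenset(S) | {f}) - value_fn(frozenset(S)))
        phi[f] = total
    return phi

# Sanity check on an additive game: v(S) = sum of per-feature payoffs,
# for which the Shapley value of each feature is exactly its payoff.
payoff = {"a": 1.0, "b": 2.0, "c": 3.0}
v = lambda S: sum(payoff[x] for x in S)
print(shapley_values(list(payoff), v))
```

The exponential enumeration is exactly why practical deployments rely on the approximations and variants whose evaluation the abstract questions.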
CRAFT: Clustered Regression for Adaptive Filtering of Training data
arXiv:2604.22693v1 Announce Type: cross Abstract: Selecting a small, high-quality subset from a large corpus for fine-tuning is increasingly important as corpora grow to tens of millions of datapoints, making full fine-tuning expensive and often unnecessary. We propose CRAFT (Clustered Regression for Adaptive Filtering of Training data), a vectorization-agnostic selection method for training sequence-to-sequence models. CRAFT […]
What are the functions of primary visual cortex (V1)?
arXiv:2604.22716v1 Announce Type: new Abstract: Although Hubel and Wiesel established decades ago how individual V1 neurons transform retinal inputs, functions of V1 as a whole are being discovered only recently. First, V1 acts as a motor cortex for exogenously guiding saccades by constructing a bottom-up saliency map of the visual field. Second, V1 initiates a […]
An Undecidability Proof for the Plan Existence Problem
arXiv:2604.22736v1 Announce Type: cross Abstract: The plan existence problem asks, given a goal in the form of a formula in modal logic, an initial epistemic state (a pointed Kripke model), and a set of epistemic actions, whether there exists a sequence of actions that can be applied to reach the goal. We prove that even […]
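The search structure of the plan existence problem (setting aside the epistemic machinery of Kripke models and epistemic actions, which is what makes the general problem undecidable) can be illustrated by breadth-first search over action sequences on a toy finite state space:

```python
from collections import deque

def plan_exists(initial, actions, goal, max_depth=10):
    """BFS over action sequences: is there a sequence of actions
    transforming `initial` into a state satisfying `goal`?
    On a finite state space this terminates; the paper's point is
    that the epistemic generalization admits no such decision procedure."""
    seen = {initial}
    queue = deque([(initial, [])])
    while queue:
        state, plan = queue.popleft()
        if goal(state):
            return plan
        if len(plan) >= max_depth:
            continue
        for name, act in actions.items():
            nxt = act(state)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [name]))
    return None

# Toy example: reach 7 from 1 using "increment" and "double".
actions = {"inc": lambda s: s + 1, "dbl": lambda s: 2 * s}
print(plan_exists(1, actions, lambda s: s == 7))
```

The toy states and actions here are assumptions for illustration; in the paper, states are pointed Kripke models and actions are epistemic actions, whose unbounded model growth defeats this kind of exhaustive search.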
EvoAgent: An Evolvable Agent Framework with Skill Learning and Multi-Agent Delegation
arXiv:2604.20133v2 Announce Type: replace Abstract: This paper proposes EvoAgent – an evolvable large language model (LLM) agent framework that integrates structured skill learning with a hierarchical sub-agent delegation mechanism. EvoAgent models skills as multi-file structured capability units equipped with triggering mechanisms and evolutionary metadata, and enables continuous skill generation and optimization through a user-feedback-driven closed-loop […]
Shard the Gradient, Scale the Model: Serverless Federated Aggregation via Gradient Partitioning
arXiv:2604.22072v1 Announce Type: cross Abstract: Federated learning (FL) aggregation on serverless platforms faces a hard scalability ceiling: existing architectures (lambda-FL, LIFL) partition clients across aggregators, but every aggregator must hold the complete model gradient in memory. When gradients exceed the per-function memory limit (e.g., 10 GB on AWS Lambda), aggregation becomes infeasible regardless of tree […]
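The sharding idea above, where each serverless aggregator holds only a slice of the gradient rather than the full vector, can be sketched as follows. Shard boundaries and plain averaging are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def shard_bounds(dim, n_shards):
    """Split [0, dim) into n_shards contiguous slices."""
    edges = np.linspace(0, dim, n_shards + 1, dtype=int)
    return list(zip(edges[:-1], edges[1:]))

def aggregate_shard(client_grads, lo, hi):
    """One aggregator function: average only the slice [lo, hi) of
    every client gradient, never materializing the full vector."""
    return np.mean([g[lo:hi] for g in client_grads], axis=0)

rng = np.random.default_rng(0)
clients = [rng.standard_normal(10) for _ in range(4)]
parts = [aggregate_shard(clients, lo, hi) for lo, hi in shard_bounds(10, 3)]
full = np.concatenate(parts)
assert np.allclose(full, np.mean(clients, axis=0))  # sharding preserves the mean
```

Because averaging is elementwise, each shard can be aggregated independently within a small memory budget, which is what lifts the per-function memory ceiling the abstract describes.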
Mechanistic Interpretability of Antibody Language Models Using SAEs
arXiv:2512.05794v2 Announce Type: replace-cross Abstract: Sparse autoencoders (SAEs) are a mechanistic interpretability technique that has been used to provide insight into learned concepts within large protein language models. Here, we employ TopK and Ordered SAEs to investigate autoregressive antibody language models, and steer their generation. We show that TopK SAEs can reveal biologically meaningful latent […]
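A TopK SAE, as used above, keeps only the k largest latent activations per input and zeros the rest. A minimal numpy forward pass; the dimensions, initialization, and absence of a pre-activation nonlinearity are illustrative assumptions, not details from the paper.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """TopK SAE: encode, zero all but the k largest pre-activations,
    then decode. Returns (reconstruction, sparse latent code)."""
    z = x @ W_enc + b_enc
    idx = np.argsort(z)[:-k]          # indices of the n_latent - k smallest
    z_sparse = z.copy()
    z_sparse[idx] = 0.0
    return z_sparse @ W_dec + b_dec, z_sparse

rng = np.random.default_rng(1)
d, n_latent, k = 8, 32, 4
W_enc = rng.standard_normal((d, n_latent)) * 0.1
W_dec = rng.standard_normal((n_latent, d)) * 0.1
x = rng.standard_normal(d)
x_hat, z = topk_sae_forward(x, W_enc, np.zeros(n_latent), W_dec, np.zeros(d), k)
assert np.count_nonzero(z) <= k   # at most k active latents per input
```

The hard k-sparsity is what makes the latent code easy to inspect, and clamping individual latents in `z_sparse` before decoding is the basic mechanism behind the steering mentioned in the abstract.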
BLAST: Benchmarking LLMs with ASP-based Structured Testing
arXiv:2604.22306v1 Announce Type: cross Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across a broad spectrum of tasks, including natural language understanding, dialogue systems, and code generation. Despite evident progress, less attention has been paid to date to their effectiveness in handling declarative paradigms such as Answer Set Programming (ASP). In this paper we […]
AdaFair-MARL: Enforcing Adaptive Fairness Constraints in Multi-Agent Reinforcement Learning
arXiv:2511.14135v2 Announce Type: replace-cross Abstract: Fair workload enforcement in heterogeneous multi-agent systems that pursue shared objectives remains challenging. Fixed fairness penalties often introduce inefficiencies, training instability, and conflicting agent incentives. Reward-shaping approaches in fair Multi-Agent Reinforcement Learning (MARL) typically incorporate fairness through heuristic penalties or scalar reward modifications and often rely on post-hoc evaluation. However, […]
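A typical fixed fairness penalty of the kind the abstract critiques subtracts a static workload-dispersion term from the shared team reward. The penalty form (coefficient of variation) and the weight beta are illustrative assumptions, shown as the baseline, not as AdaFair-MARL's adaptive mechanism.

```python
import numpy as np

def shaped_reward(team_reward, workloads, beta=0.5):
    """Fixed-penalty reward shaping: subtract beta times the
    coefficient of variation of per-agent workloads. The fixed
    beta is exactly what the abstract argues can cause
    inefficiency and training instability."""
    w = np.asarray(workloads, dtype=float)
    cv = w.std() / (w.mean() + 1e-8)   # 0 when workloads are equal
    return team_reward - beta * cv

print(shaped_reward(10.0, [3, 3, 3]))   # equal workloads: no penalty
print(shaped_reward(10.0, [9, 0, 0]))   # skewed workloads: penalized
```

Because beta is constant, the same dispersion is penalized identically whether the team is exploring early in training or fine-tuning a near-optimal policy, which motivates making the constraint adaptive instead.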