May 26, 2026 – dijee Pharma Intelligence

One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries

arXiv:2605.14605v2 Announce Type: replace-cross Abstract: Model providers increasingly release open weights or allow users to fine-tune foundation models through APIs. Although these models are safety-aligned before release, their safeguards can often be removed by fine-tuning on harmful data. Recent defenses aim to make models robust to such malicious fine-tuning, but they are largely evaluated only […]

May 26, 2026

AI-Driven Alpha Decay: Algorithmic Homogenization, Reflexive Signal Erosion, and the Paradox of Intelligent Markets

arXiv:2605.23905v1 Announce Type: cross Abstract: We show that AI-driven investment strategies are inherently self-defeating at scale. As AI adoption rises, three mutually reinforcing channels (signal crowding, performative signal erosion, and Red Queen competition) compress excess returns. We derive the alpha half-life $h(\phi) = \ln 2/[\theta + \delta(\phi)]$, where $\theta$ is the natural mean-reversion […]
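
As a rough illustration of the half-life expression quoted in this abstract, the sketch below evaluates $h = \ln 2/(\theta + \delta)$ for two cases. The excerpt is truncated before it defines $\delta(\phi)$, so the function takes the value of $\delta$ directly as a number; the function name, parameter names, and numeric values are hypothetical and chosen only for illustration, not taken from the paper.

```python
import math


def alpha_half_life(theta: float, delta_phi: float) -> float:
    """Return ln 2 / (theta + delta_phi), the half-life form quoted in the abstract.

    theta     -- natural mean-reversion rate (illustrative value, assumed)
    delta_phi -- value of delta(phi); its functional form is not given in the
                 excerpt, so it is passed in as a plain number (assumption)
    """
    rate = theta + delta_phi
    if rate <= 0:
        raise ValueError("theta + delta_phi must be positive")
    return math.log(2) / rate


# Hypothetical numbers: same theta, with and without an extra decay term.
baseline = alpha_half_life(theta=0.10, delta_phi=0.0)
crowded = alpha_half_life(theta=0.10, delta_phi=0.10)

print(f"baseline half-life: {baseline:.3f}")
print(f"crowded half-life:  {crowded:.3f}")
```

Because the half-life is inversely proportional to the total decay rate, doubling that rate (here from 0.10 to 0.20) halves the half-life.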

May 26, 2026

SomaliBench Eval: Measuring English-to-Somali Refusal Gaps in Open-Weight Language Models

arXiv:2605.25420v1 Announce Type: cross Abstract: Large language model safety evaluation remains heavily English-centered, leaving low-resource languages under-measured even when models are deployed globally. We evaluate four open-weight instruction-tuned models on SomaliBench v0, a native-author-verified benchmark of 100 harmful-intent prompts paired across English and Somali. Each of Llama-3.1-8B-Instruct, Gemma-2-9B-Instruct, Qwen-2.5-7B-Instruct, and Aya-23-8B is run locally with […]

May 26, 2026

Don’t Retrain, Just Reuse: Recovering Dual-Target Molecules from Single-Target Diffusion Models

arXiv:2605.25681v1 Announce Type: cross Abstract: Designing a single molecule that modulates two targets is a promising strategy for polypharmacology, but it remains substantially harder than standard single-target generation because one candidate must satisfy two binding requirements while preserving drug-likeness and synthesizability. Existing dual-target generative methods typically introduce dual-target capability by either retraining the generator or […]

May 26, 2026

Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation

arXiv:2605.01284v2 Announce Type: replace-cross Abstract: Iterative Retrieval-Augmented Generation (iRAG) has emerged as a powerful paradigm for answering complex multi-hop questions by progressively retrieving and reasoning over external documents. However, current systems predominantly operate on parsed text, which creates two critical bottlenecks: (1) Coarse-grained attribution, where users are burdened with manually locating evidence within lengthy documents […]

May 26, 2026

BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting

arXiv:2605.17937v2 Announce Type: replace-cross Abstract: Quantitative backtesting is essential for evaluating trading strategies but remains hampered by high technical barriers and limited scalability. While Large Language Models (LLMs) offer a transformative path to automate this complex, interdisciplinary workflow through advanced code generation, tool usage, and agentic planning, the practical realization is significantly challenged by the […]

May 26, 2026

MultiPhishGuard: An Explainable and Adaptive Multi-Agent LLM System for Phishing Email Detection

arXiv:2505.23803v2 Announce Type: replace-cross Abstract: Phishing email detection faces significant challenges due to evolving adversarial tactics and heterogeneous attack patterns. Traditional approaches, such as rule-based filters and denylists, often struggle to keep pace, leading to missed detections and security risks. While machine learning methods have improved detection performance, they remain limited in adapting to novel […]

May 26, 2026

Future-KL Regularized GRPO: Process-Level Credit Assignment from $f$-Divergence Regularization

arXiv:2601.10201v2 Announce Type: replace-cross Abstract: Group Relative Policy Optimization (GRPO) is widely used for critic-free Large Language Model (LLM) post-training, but its KL regularization is usually implemented as a local loss-side token penalty. We show that this misses the policy-gradient signal induced by autoregressive KL regularization. Unlike standard KL-regularized Reinforcement Learning (RL) objectives, GRPO’s group […]

May 26, 2026

Exact Variance and Fano Factor for Arbitrary Level Crossings in Stationary Gaussian Processes

arXiv:2605.25278v1 Announce Type: cross Abstract: Understanding the statistics of level crossings in stochastic processes is crucial across many scientific disciplines. The traditional Kac-Rice formula gives the mean rate of level crossings and has found broad use. However, that mean rate captures only a coarse summary of the crossing process. It depends entirely on local properties […]

May 26, 2026

JEPA-DNA: Grounding Genomic Foundation Models through Joint-Embedding Predictive Architectures

arXiv:2602.17162v2 Announce Type: replace Abstract: Genomic Foundation Models (GFMs) typically rely on Masked Language Modeling (MLM) or Next-Token Prediction (NTP) to learn the “Laws of Nature”. While effective at capturing local syntax, these generative paradigms prioritize token-level reconstruction over high-level functional context. We introduce JEPA-DNA, a model-agnostic continual training framework that integrates a Joint-Embedding Predictive […]

May 26, 2026

Understanding Data Temporality Impact on Large Language Models Pre-training

arXiv:2605.22769v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are typically trained on shuffled corpora, yielding models whose knowledge is frozen at train time and whose temporal grounding remains poorly understood. In this work, we study the impact of pre-training dynamics on the acquisition of time-sensitive factual knowledge, focusing specifically on data ordering. Our main […]

May 26, 2026

The Matching Principle: A Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning

arXiv:2605.22800v2 Announce Type: replace-cross Abstract: Robustness, domain adaptation, photometric/occlusion invariance, sensor drift, and alignment style are treated as separate literatures with separate method families. Under label-preserving deployment shift they share one geometric object: the covariance $\Sigma_{\text{task}} = \mathrm{Cov}_{Q_n}(n)$ of ways inputs can change without changing the label. CORAL, adversarial training, augmentation, metric learning, Jacobian penalties, […]

May 26, 2026
