April 6, 2026 – Page 22 – dijee Pharma Intelligence

Co-Evolution of Policy and Internal Reward for Language Agents

arXiv:2604.03098v1 Announce Type: cross Abstract: Large language model (LLM) agents learn by interacting with environments, but long-horizon training remains fundamentally bottlenecked by sparse and delayed rewards. Existing methods typically address this challenge through post-hoc credit assignment or external reward models, which provide limited guidance at inference time and often separate reward improvement from policy improvement. […]

April 6, 2026

Mission-Aligned Learning-Informed Control of Autonomous Systems: Formulation and Foundations

arXiv:2507.04356v2 Announce Type: replace-cross Abstract: Research, innovation and practical capital investment have been increasing rapidly toward the realization of autonomous physical agents. This includes industrial and service robots, unmanned aerial vehicles, embedded control devices, and a number of other realizations of cybernetic/mechatronic implementations of intelligent autonomous devices. In this paper, we consider a stylized version […]

April 6, 2026

CeRA: Overcoming the Linear Ceiling of Low-Rank Adaptation via Capacity Expansion

arXiv:2602.22911v5 Announce Type: replace-cross Abstract: Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning (PEFT). However, it faces a “linear ceiling”: increasing the rank yields diminishing returns in expressive capacity due to intrinsic linear constraints. We introduce CeRA (Capacity-enhanced Rank Adaptation), a weight-level parallel adapter that injects SiLU gating and dropout to induce non-linear capacity expansion. We demonstrate […]

April 6, 2026

AgenticRed: Evolving Agentic Systems for Red-Teaming

arXiv:2601.13518v3 Announce Type: replace Abstract: While recent automated red-teaming methods show promise for systematically exposing model vulnerabilities, most existing approaches rely on human-specified workflows. This dependence on manually designed workflows suffers from human biases and makes exploring the broader design space expensive. We introduce AgenticRed, an automated pipeline that leverages LLMs’ in-context learning to iteratively […]

April 6, 2026

S$^4$ST: A Strong, Self-transferable, faSt, and Simple Scale Transformation for Transferable Targeted Attack

arXiv:2410.13891v3 Announce Type: replace-cross Abstract: Transferable Targeted Attacks (TTAs) face significant challenges due to severe overfitting to surrogate models. Recent breakthroughs heavily rely on large-scale training data of victim models, while data-free solutions, i.e., image transformation-involved gradient optimization, often depend on black-box feedback for method design and tuning. These dependencies violate black-box transfer settings and […]

April 6, 2026

Beyond Noisy-TVs: Noise-Robust Exploration Via Learning Progress Monitoring

arXiv:2509.25438v2 Announce Type: replace-cross Abstract: When there exists an unlearnable source of randomness (noisy-TV) in the environment, an exploring agent driven naively by intrinsic reward gets stuck at that source of randomness and fails at exploration. Intrinsic reward based on uncertainty estimation or distribution similarity, while it eventually escapes noisy-TVs as time unfolds, suffers from poor sample […]

April 6, 2026

No Universal Hyperbola: A Formal Disproof of the Epistemic Trade-Off Between Certainty and Scope in Symbolic and Generative AI

arXiv:2601.08845v2 Announce Type: replace-cross Abstract: In direct response to requests for a logico-mathematical test of the conjecture, we formally disprove a recently conjectured artificial intelligence trade-off between epistemic certainty and scope in its published universal hyperbolic product form, as introduced in Philosophy and Technology. Certainty is defined as the worst-case correctness probability over the input […]

April 6, 2026

CoDA: Exploring Chain-of-Distribution Attacks and Post-Hoc Token-Space Repair for Medical Vision-Language Models

arXiv:2603.18545v2 Announce Type: replace-cross Abstract: Medical vision–language models (MVLMs) are increasingly used as perceptual backbones in radiology pipelines and as the visual front end of multimodal assistants, yet their reliability under real clinical workflows remains underexplored. Prior robustness evaluations often assume clean, curated inputs or study isolated corruptions, overlooking routine acquisition, reconstruction, display, and delivery […]

April 6, 2026

Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection

arXiv:2512.16300v2 Announce Type: replace Abstract: Existing image forgery detection (IFD) methods either exploit low-level, semantics-agnostic artifacts or rely on multimodal large language models (MLLMs) with high-level semantic knowledge. Although naturally complementary, these two information streams are highly heterogeneous in both paradigm and reasoning, making it difficult for existing methods to unify them or effectively model […]

April 6, 2026

PAPO: Stabilizing Rubric Integration Training via Decoupled Advantage Normalization

arXiv:2603.26535v3 Announce Type: replace Abstract: We propose Process-Aware Policy Optimization (PAPO), a method that integrates process-level evaluation into Group Relative Policy Optimization (GRPO) through decoupled advantage normalization, to address two limitations of existing reward designs. Outcome reward models (ORM) evaluate only final-answer correctness, treating all correct responses identically regardless of reasoning quality, and gradually lose […]

April 6, 2026

Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS

arXiv:2409.18512v2 Announce Type: replace-cross Abstract: Recent advancements in speech synthesis have enabled large language model (LLM)-based systems to perform zero-shot generation with controllable content, timbre, speaker identity, and emotion through input prompts. As a result, these models heavily rely on prompt design to guide the generation process. However, existing prompt selection methods often fail to […]

April 6, 2026

Unified Thinker: A General Reasoning Modular Core for Image Generation

arXiv:2601.03127v2 Announce Type: replace-cross Abstract: Despite impressive progress in high-fidelity image synthesis, generative models still struggle with logic-intensive instruction following, exposing a persistent reasoning–execution gap. Meanwhile, closed-source systems (e.g., Nano Banana) have demonstrated strong reasoning-driven image generation, highlighting a substantial gap to current open-source models. We argue that closing this gap requires not merely better […]

April 6, 2026
