arXiv:2603.10101v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced the reasoning capacity of Large Language Models (LLMs). However, RLVR relies solely on final answers as outcome rewards, neglecting the correctness of intermediate reasoning steps. Training on these process-wrong but outcome-correct rollouts can lead to hallucination and answer-copying, severely undermining the […]
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards
arXiv:2603.09117v1 Announce Type: cross Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) significantly enhances the reasoning of large language models (LLMs) but suffers severely from calibration degeneration, where models become excessively over-confident in incorrect answers. Previous studies have sought to directly incorporate a calibration objective into the existing optimization target. However, our theoretical analysis demonstrates that there exists a fundamental gradient […]
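The gradient tension this abstract alludes to can be made concrete with a toy example. The sketch below is not the paper's analysis; it assumes a combined objective that sums an outcome reward with a Brier-style calibration penalty on a single confidence logit, and shows that for a reinforced-but-wrong answer the two terms push that logit in opposite directions.

```python
import numpy as np

def reward_update(conf_logit):
    # An outcome reward reinforces the sampled answer's logit whenever the
    # rollout was rewarded, regardless of actual correctness (ascent direction).
    return +1.0

def calibration_update(conf_logit, is_correct):
    # Descent direction of a Brier-style penalty (sigmoid(l) - y)^2 w.r.t. l:
    # it pushes confidence toward the empirical correctness label y.
    p = 1.0 / (1.0 + np.exp(-conf_logit))
    y = 1.0 if is_correct else 0.0
    return -2.0 * (p - y) * p * (1.0 - p)

l = 2.0                                    # model is already over-confident
g_r = reward_update(l)                     # > 0: reward inflates confidence
g_c = calibration_update(l, is_correct=False)  # < 0: penalty deflates it
print(g_r * g_c < 0)                       # True: the two terms conflict
```

A decoupled design, as the title suggests, would route these two signals to separate parameters instead of summing them on one.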
AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models
arXiv:2603.10126v1 Announce Type: cross Abstract: We propose a standalone autoregressive (AR) Action Expert that generates actions as a continuous causal sequence while conditioning on refreshable vision-language prefixes. In contrast to existing Vision-Language-Action (VLA) models and diffusion policies that reset temporal context with each new observation and predict actions reactively, our Action Expert maintains its own […]
Neural Field Thermal Tomography: A Differentiable Physics Framework for Non-Destructive Evaluation
arXiv:2603.11045v1 Announce Type: cross Abstract: We propose Neural Field Thermal Tomography (NeFTY), a differentiable physics framework for the quantitative 3D reconstruction of material properties from transient surface temperature measurements. While traditional thermography relies on pixel-wise 1D approximations that neglect lateral diffusion, and soft-constrained Physics-Informed Neural Networks (PINNs) often fail in transient diffusion scenarios due to […]
Social Knowledge for Cross-Domain User Preference Modeling
arXiv:2603.10148v1 Announce Type: cross Abstract: We demonstrate that user preferences can be represented and predicted across topical domains using large-scale social modeling. Given information about popular entities favored by a user, we project the user into a social embedding space learned from a large-scale sample of the Twitter (now X) network. By representing both users […]
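The projection step described above can be sketched minimally. This is an assumed reading, not the paper's pipeline: a user is embedded as the normalized mean of embeddings of popular entities they favor, and cross-domain preference is scored by cosine similarity in that shared social space. The entity names and vectors are made up.

```python
import numpy as np

# Hypothetical pre-trained social embeddings for popular entities.
entity_emb = {
    "entity_a": np.array([0.9, 0.1, 0.0]),
    "entity_b": np.array([0.8, 0.2, 0.1]),
    "entity_c": np.array([0.0, 0.1, 0.9]),
}

def embed_user(favored_entities):
    """Project a user into the social space as the normalized mean of entity vectors."""
    v = np.mean([entity_emb[e] for e in favored_entities], axis=0)
    return v / np.linalg.norm(v)

def preference_score(user_vec, entity):
    """Cosine similarity between the user and a candidate entity (e.g. from another domain)."""
    e = entity_emb[entity]
    return float(user_vec @ e / np.linalg.norm(e))

u = embed_user(["entity_a", "entity_b"])
# A user aligned with a/b scores higher on the nearby entity than the distant one.
print(preference_score(u, "entity_b") > preference_score(u, "entity_c"))  # True
```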
One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis
arXiv:2603.09978v1 Announce Type: cross Abstract: Large language models have recently surpassed specialized systems on code generation, yet their effectiveness on other code-analysis tasks remains less clear. At the same time, multi-task learning offers a way to unify diverse objectives within a single model, but fully fine-tuning LLMs across tasks is computationally prohibitive. Parameter-efficient fine-tuning mitigates […]
Compatibility at a Cost: Systematic Discovery and Exploitation of MCP Clause-Compliance Vulnerabilities
arXiv:2603.10163v1 Announce Type: cross Abstract: The Model Context Protocol (MCP) is a recently proposed interoperability standard that unifies how AI agents connect with external tools and data sources. By defining a set of common client-server message exchange clauses, MCP replaces fragmented integrations with a standardized, plug-and-play framework. However, to be compatible with diverse AI agents, […]
Learning What Reinforcement Learning Can’t: Interleaved Online Fine-Tuning for Hardest Questions
arXiv:2506.07527v3 Announce Type: replace Abstract: Recent advances in large language model (LLM) reasoning have shown that sophisticated behaviors such as planning and self-reflection can emerge through reinforcement learning (RL). However, despite these successes, RL in its current form remains insufficient to induce capabilities that exceed the limitations of the base model, as it is primarily […]
Adaptive Activation Cancellation for Hallucination Mitigation in Large Language Models
arXiv:2603.10195v1 Announce Type: cross Abstract: Large Language Models frequently generate fluent but factually incorrect text. We propose Adaptive Activation Cancellation (AAC), a real-time inference-time framework that treats hallucination-associated neural activations as structured interference within the transformer residual stream, drawing an explicit analogy to classical adaptive noise cancellation from signal processing. The framework identifies Hallucination Nodes […]
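The adaptive-noise-cancellation analogy above admits a simple sketch. This is not the paper's method; it assumes a single known "hallucination direction" in the residual stream and subtracts its projection at inference time, the way a canceller subtracts a filtered noise reference from a corrupted signal.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Hypothetical unit-norm interference direction identified offline.
halluc_dir = rng.normal(size=d_model)
halluc_dir /= np.linalg.norm(halluc_dir)

def cancel(residual, direction, alpha=1.0):
    """Subtract alpha times the projection of `residual` onto `direction`."""
    coef = residual @ direction
    return residual - alpha * coef * direction

clean = rng.normal(size=d_model)
contaminated = clean + 3.0 * halluc_dir      # interference injected into the stream
restored = cancel(contaminated, halluc_dir)

# Full cancellation (alpha=1) leaves no component along the interference direction.
print(abs(restored @ halluc_dir) < 1e-9)  # True
```

An adaptive variant would tune `alpha` online from a detection signal rather than fixing it at 1.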
What We Don’t C: Manifold Disentanglement for Structured Discovery
arXiv:2511.09433v2 Announce Type: replace Abstract: Accessing information in learned representations is critical for annotation, discovery, and data filtering in disciplines where high-dimensional datasets are common. We introduce What We Don’t C, a novel approach based on latent flow matching that disentangles latent subspaces by explicitly removing information included in conditional guidance, resulting in meaningful residual […]
Continuous Diffusion Transformers for Designing Synthetic Regulatory Elements
arXiv:2603.10885v1 Announce Type: cross Abstract: We present a parameter-efficient Diffusion Transformer (DiT) for generating 200bp cell-type-specific regulatory DNA sequences. By replacing the U-Net backbone of DNA-Diffusion with a transformer denoiser equipped with a 2D CNN input encoder, our model matches the U-Net’s best validation loss in 13 epochs (60$\times$ fewer) and converges 39% lower, while […]
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
arXiv:2507.01957v2 Announce Type: replace-cross Abstract: We present Locality-aware Parallel Decoding (LPD) to accelerate autoregressive image generation. Traditional autoregressive image generation relies on next-patch prediction, a memory-bound process that leads to high latency. Existing works have tried to parallelize next-patch prediction by shifting to multi-patch prediction to accelerate the process, but only achieved limited parallelization. To […]
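The locality idea behind parallelizing next-patch prediction can be sketched as follows. This is an assumed scheduler, not the paper's: patches decoded in the same parallel step should be spatially far apart so their mutual dependence is weak, and a strided partition guarantees that any two patches sharing a step are at least `stride` apart along some grid axis.

```python
def parallel_schedule(grid, stride):
    """Partition a grid x grid patch grid into decoding steps.

    Each step is a list of (row, col) patches decoded in parallel; patches in
    the same step share (row % stride, col % stride), so they sit at least
    `stride` apart on the grid.
    """
    steps = {}
    for r in range(grid):
        for c in range(grid):
            key = (r % stride, c % stride)   # same key -> same parallel step
            steps.setdefault(key, []).append((r, c))
    return list(steps.values())

sched = parallel_schedule(grid=8, stride=4)
# 8x8 = 64 patches are decoded in 4*4 = 16 steps of 4 patches each.
print(len(sched), all(len(s) == 64 // 16 for s in sched))  # 16 True
```

Larger strides trade fewer patches per step for weaker intra-step coupling; a learned ordering would tune this trade-off per image region.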