May 7, 2026 – Page 16 – dijee Pharma Intelligence

Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning

arXiv:2605.05123v1 Announce Type: cross Abstract: In offline-to-online reinforcement learning (O2O-RL), policies are first safely trained offline using previously collected datasets and then further fine-tuned for tasks via limited online interactions. In a typical O2O-RL pipeline, candidate policies trained with offline RL are evaluated via either off-policy evaluation (OPE) or online evaluation (OE). The policy with […]

May 7, 2026

Provable Distributional Value Iteration under Partial Observability

arXiv:2505.06518v3 Announce Type: replace Abstract: In many real-world planning tasks, agents must tackle uncertainty about the environment’s state and variability in the outcomes induced by stochastic dynamics and rewards. Motivated by recent progress in world model approaches, where latent models approximate beliefs and support planning, we extend Distributional Reinforcement Learning (DistRL), which models the entire […]

May 7, 2026

Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models

arXiv:2603.25412v2 Announce Type: replace Abstract: Large language models increasingly rely on explicit chain-of-thought reasoning to solve complex tasks, yet the safety of the reasoning process itself remains largely unaddressed. Existing work focuses predominantly on content safety (i.e., detecting harmful, biased, or factually incorrect outputs), while treating the underlying reasoning chain as an opaque intermediate artifact. […]

May 7, 2026

Shadow-Loom: Causal Reasoning over Graphical World Models of Narratives

arXiv:2605.02475v2 Announce Type: replace Abstract: Stories hold a reader’s attention because they have causes, secrets, and consequences. Shadow-Loom is an experimental open-source framework that turns a narrative into a versioned graphical world model and lets two engines act on it: a causal physics grounded in Pearl’s ladder of causation and a recently proposed counterfactual calculus […]

May 7, 2026

Beyond Public Access in LLM Pre-Training Data

arXiv:2505.00020v2 Announce Type: replace-cross Abstract: Using a legally obtained dataset of 34 copyrighted O’Reilly Media books, we apply the DE-COP membership inference attack method to investigate whether OpenAI’s large language models show recognition of copyrighted content. Our results based on this small sample suggest that GPT-4o, OpenAI’s more recent and capable model, exhibits patterns consistent […]

May 7, 2026

VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping

arXiv:2511.13587v3 Announce Type: replace-cross Abstract: Visual autoregressive (AR) generation models have demonstrated strong potential for image generation, yet their next-token-prediction paradigm introduces considerable inference latency. Although speculative decoding (SD) has been proven effective for accelerating visual AR models, its “draft one step, then verify one step” paradigm prevents a direct reduction in the number of […]

May 7, 2026

Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing

arXiv:2602.03452v2 Announce Type: replace-cross Abstract: Reinforcement learning with verifiable rewards (RLVR) is effective for training large language models on deterministic outcome reasoning tasks. Prior work shows RLVR works with few prompts, but prompt selection is often based only on training-accuracy variance, leading to unstable optimization directions and weaker transfer. We revisit prompt selection from a […]

May 7, 2026

Personalized Spiking Neural Networks with Ferroelectric Synapses for EEG Signal Processing

arXiv:2601.00020v3 Announce Type: replace-cross Abstract: Electroencephalography (EEG)-based brain-computer interfaces (BCIs) are strongly affected by non-stationary neural signals that vary across sessions and individuals, limiting the generalization of subject-agnostic models and motivating adaptive and personalized learning on resource-constrained platforms. Programmable memristive hardware offers a promising substrate for such post-deployment adaptation; however, practical realization is challenged by […]

May 7, 2026

SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

arXiv:2602.12783v2 Announce Type: replace-cross Abstract: Spoken query retrieval is an important interaction mode in modern information retrieval. However, existing evaluation datasets are often limited to simple queries under constrained noise conditions, making them inadequate for assessing the robustness of spoken query retrieval systems under complex acoustic perturbations. To address this limitation, we present SQuTR, a […]

May 7, 2026

When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation

arXiv:2604.11840v2 Announce Type: replace-cross Abstract: Behavioral simulation and strategic problem solving are different tasks. Large language models are increasingly explored as agents in policy-facing institutional simulations, but stronger reasoning need not improve behavioral sampling. We study this solver-sampler mismatch in three multi-agent negotiation environments: two trading-limits scenarios with different authority structures and a grid-curtailment case […]

May 7, 2026

AI Alignment via Incentives and Correction

arXiv:2605.01643v2 Announce Type: replace-cross Abstract: We study AI alignment through the lens of law-and-economics models of deterrence and enforcement. In these models, misconduct is not treated as an external failure, but as a strategic response to incentives: an actor weighs the gain from violation against the probability of detection and the severity of punishment. We […]

May 7, 2026

Direct Product Flow Matching: Decoupling Radial and Angular Dynamics for Few-Shot Adaptation

arXiv:2605.05054v1 Announce Type: cross Abstract: Recent flow matching (FM) methods improve the few-shot adaptation of vision-language models by modeling cross-modal alignment as a continuous multi-step flow. In this paper, we argue that existing FM methods are inherently constrained by incompatible geometric priors on pre-trained cross-modal features, resulting in suboptimal adaptation performance. We first analyze these […]

May 7, 2026
