June 2, 2026 – Page 12 – dijee Pharma Intelligence

CAST: Non-Privileged Clipped Asymmetric Self-Teaching with Advantage Flipping for GRPO

arXiv:2606.00172v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR), especially Group Relative Policy Optimization (GRPO), has been widely used to improve reasoning in large language models. However, outcome-level rewards provide only sparse supervision, and group-relative advantages vanish when all sampled trajectories for a prompt are either correct or incorrect. On-Policy Self-Distillation (OPSD) offers […]
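To make the vanishing-advantage problem concrete, here is a minimal sketch of a group-relative advantage as it appears in common GRPO formulations: each reward is standardized against the mean and standard deviation of its own group of samples for one prompt. Using the population standard deviation and an `eps` stabilizer is an assumption, since implementations differ. This illustrates the failure mode the abstract describes. It is not the CAST method.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against its own group (one prompt's samples).

    Assumes population std and a small eps; GRPO implementations vary here.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Binary verifiable rewards for four trajectories sampled from the same prompt.
mixed = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
all_correct = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
all_wrong = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

print(mixed)        # correct samples get positive advantages, incorrect ones negative
print(all_correct)  # every advantage is 0.0: no learning signal for this prompt
print(all_wrong)    # every advantage is 0.0: no learning signal for this prompt
```

When every trajectory in a group earns the same reward, each advantage is zero and the policy-gradient term for that prompt disappears, which is the sparsity problem the abstract attributes to outcome-level rewards.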

June 2, 2026

Improving Visual Representation Alignment Generation with GRPO

arXiv:2606.00583v1 Announce Type: cross Abstract: Recent diffusion transformers have demonstrated strong image synthesis capabilities but remain inefficient to train due to weak alignment between generative and discriminative representations. While representation alignment frameworks such as REPA improve convergence by aligning noisy denoising features with pretrained visual encoders, their externally supervised alignment loss is static and lacks […]
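For readers new to representation alignment, the sketch below shows a REPA-style objective under stated assumptions: the denoiser's patch features are assumed to have already passed through a projection head (omitted here), the targets come from a frozen pretrained encoder, and the loss is the negative mean patch-wise cosine similarity. The function names are hypothetical. Because both the target encoder and the loss form stay fixed throughout training, this is one reading of the "static" loss the abstract refers to.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def repa_style_alignment_loss(projected_tokens, encoder_tokens):
    """Hypothetical REPA-style loss: negative mean patch-wise cosine similarity.

    projected_tokens: denoiser patch features after an (omitted) projection head.
    encoder_tokens: frozen pretrained-encoder features, same patch order and size.
    """
    if len(projected_tokens) != len(encoder_tokens):
        raise ValueError("patch counts must match")
    sims = [cosine_similarity(h, y) for h, y in zip(projected_tokens, encoder_tokens)]
    return -sum(sims) / len(sims)

# Perfectly aligned patches reach the minimum loss of -1.0.
print(repa_style_alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

Cosine similarity ignores feature magnitude, so the loss only rewards matching directions between the two representation spaces.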

June 2, 2026

Reinforcement Learning Position Control of a Quadrotor Using Soft Actor-Critic (SAC)

arXiv:2512.18333v2 Announce Type: replace-cross Abstract: This paper proposes a new Reinforcement Learning (RL) based control architecture for quadrotors. With the literature focusing on controlling the four rotors’ RPMs directly, this paper aims to control the quadrotor’s thrust vector. The RL agent computes the percentage of overall thrust along the quadrotor’s z-axis along with the desired […]

June 2, 2026

The Paradox of Outcome Optimization: A Causal Information-Theoretic Bound on Reasoning Shortcuts in LLMs

arXiv:2606.00674v1 Announce Type: cross Abstract: Large Language Models (LLMs) aligned via outcome-based Reinforcement Learning (RL) frequently exhibit a critical failure mode: they achieve high performance on in-distribution benchmarks while demonstrating brittle reasoning capabilities on out-of-distribution (OOD) tasks. We term this phenomenon Reward-Induced Manifold Collapse. We establish a theoretical framework bridging Structural Causal Models (SCM) and […]

June 2, 2026

Mechanics of Pandemics

arXiv:2606.00192v1 Announce Type: new Abstract: COVID-19 and previous pandemics have shown how diseases can disrupt, threaten, and transform daily life. Since pathogens and societies are continuously evolving, every pandemic is different. However, certain fundamental principles of disease transmission appear to hold true across different outbreaks. These “mechanisms” are grounded in natural laws or the very […]

June 2, 2026

DASH: Dual-Branch Score Distillation for Guidance-Calibrated Compact Diffusion Models

arXiv:2606.00798v1 Announce Type: cross Abstract: Parameter compression of class-conditional diffusion models reveals an underexplored limitation in output-level distillation: the unconditional score branch remains unsupervised, leaving the classifier-free guidance gap underdetermined in the student. This gap, amplified at every denoising step, admits degenerate solutions where both branches collapse toward identical predictions, rendering guidance ineffective despite low […]

June 2, 2026

PETS: A Principled Framework Towards Optimal Trajectory Allocation for Efficient Test-Time Self-Consistency

arXiv:2602.16745v2 Announce Type: replace-cross Abstract: Test-time scaling can improve model performance by aggregating stochastic reasoning trajectories. However, achieving sample-efficient test-time self-consistency under a limited budget remains an open challenge. We introduce PETS (Principled and Efficient Test-Time Self-Consistency), which initiates a principled study of trajectory allocation through an optimization framework. Central to our approach is the self-consistency […]
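Self-consistency, the aggregation PETS builds on, is usually implemented as a majority vote over the final answers of independently sampled trajectories. The sketch below shows that vote plus a naive uniform split of a fixed trajectory budget across questions. The uniform split is only a hypothetical baseline for illustration, not the allocation PETS proposes, and the function names are assumptions.

```python
from collections import Counter

def self_consistency_vote(final_answers):
    """Majority vote over sampled trajectories; returns (answer, vote share).

    Ties go to the answer seen first, following Counter.most_common ordering.
    """
    if not final_answers:
        raise ValueError("need at least one trajectory")
    answer, votes = Counter(final_answers).most_common(1)[0]
    return answer, votes / len(final_answers)

def uniform_allocation(total_budget, num_questions):
    """Hypothetical baseline: split a trajectory budget as evenly as possible."""
    base, extra = divmod(total_budget, num_questions)
    return [base + (1 if i < extra else 0) for i in range(num_questions)]

print(self_consistency_vote(["42", "41", "42", "42", "17"]))  # ('42', 0.6)
print(uniform_allocation(10, 3))                              # [4, 3, 3]
```

A uniform split spends as many samples on questions whose vote is already lopsided as on contested ones; deciding where extra trajectories pay off is the allocation question the abstract frames as an optimization problem.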

June 2, 2026

Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems

arXiv:2606.00925v1 Announce Type: cross Abstract: Open agent platforms allow community contributors to publish reusable skills that agents can invoke at runtime. This extensibility also creates a supply-chain risk: malicious contributors can hide harmful behavior inside skills that appear benign under superficial inspection. However, existing defenses are hard to evaluate because there is no benchmark that […]

June 2, 2026

Evolution of cooperation in the multiplex

arXiv:2606.00196v1 Announce Type: new Abstract: Across biological and social systems, cooperation often depends on phenotypic cues rather than random encounters. To account for real-world interactions unfolding across multiple, simultaneous dimensions, here we develop a general framework for the evolution of cooperation in multiplex networks governed by multi-phenotype homophily. We derive analytical conditions for natural selection […]

June 2, 2026

MedSynapse-V: Bridging Visual Perception and Clinical Intuition via Latent Memory Evolution

arXiv:2604.26283v3 Announce Type: replace-cross Abstract: High-precision medical diagnosis relies not only on static imaging features but also on the implicit diagnostic memory experts instantly invoke during image interpretation. We pinpoint a fundamental cognitive misalignment in medical VLMs caused by discrete tokenization, leading to quantization loss, long-range information dissipation, and missing case-adaptive expertise. To bridge this […]

June 2, 2026

Soft-NBCE: Entropy-Weighted Chunk Fusion for Long-Context

arXiv:2606.01101v1 Announce Type: cross Abstract: The quadratic complexity of self-attention remains a bottleneck for Large Language Models (LLMs) processing ultra-long contexts. The Naive Bayes Cognitive Engine (NBCE) parallelizes long-context inference by chunking documents and routing to the lowest-entropy chunk at each decoding step. This hard-selection strategy causes semantic fragmentation during cross-chunk reasoning, as abrupt routing […]
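The contrast between hard routing and entropy-weighted fusion can be sketched on toy next-token distributions, one per chunk. `hard_select` follows the abstract's description of NBCE (pick the lowest-entropy chunk). `soft_fuse` is a hypothetical soft variant that mixes the chunk distributions with weights proportional to `exp(-entropy / temperature)`; the excerpt does not say how Soft-NBCE actually sets its weights or whether it fuses probabilities or log-probabilities, so both choices here are assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def hard_select(chunk_dists):
    """NBCE-style routing as described: use the lowest-entropy chunk's distribution."""
    return min(chunk_dists, key=entropy)

def soft_fuse(chunk_dists, temperature=1.0):
    """Hypothetical soft variant: mix chunk distributions with
    softmax(-entropy / temperature) weights, so confident chunks dominate
    without fully silencing the others."""
    scores = [-entropy(d) / temperature for d in chunk_dists]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    vocab_size = len(chunk_dists[0])
    return [sum(w * d[i] for w, d in zip(weights, chunk_dists)) for i in range(vocab_size)]

# Two chunks over a 3-token vocabulary: chunk 0 is confident, chunk 1 is not.
chunk_dists = [[0.9, 0.05, 0.05], [0.4, 0.3, 0.3]]
print(hard_select(chunk_dists))  # [0.9, 0.05, 0.05]
print(soft_fuse(chunk_dists))
```

As the temperature approaches zero, the softmax weights concentrate on the lowest-entropy chunk and the soft fusion recovers hard selection; larger temperatures blend chunks more evenly, which is the kind of smoothing that can avoid abrupt switches between chunks from one decoding step to the next.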

June 2, 2026

Consciousness, AI, and the Limits of Scientific Explanation

arXiv:2606.00226v1 Announce Type: new Abstract: Science is constitutively third-personal: its findings are in principle reproducible by any observer, independent of perspective, and answerable to measurement. This is the source of its power and also its limit when it comes to phenomena that are first-personal. While it is obvious that a science of the Meaning of […]

June 2, 2026
