arXiv:2510.22009v1 Announce Type: new Abstract: With the advancement of multimodal large language models (MLLMs), building GUI agent systems has become an increasingly promising direction, especially for mobile platforms, given their rich app ecosystems and intuitive touch interactions. Yet mobile GUI agents face a critical dilemma: truly on-device models (4B or smaller) lack sufficient performance, while capable […]
A Comparison of Conversational Models and Humans in Answering Technical Questions: the Firefox Case
arXiv:2510.21933v1 Announce Type: cross Abstract: The use of Large Language Models (LLMs) to support tasks in software development has steadily increased over recent years, ranging from assisting developers in coding activities to providing conversational agents that answer newcomers’ questions. In collaboration with the Mozilla Foundation, this study evaluates the effectiveness of Retrieval-Augmented Generation (RAG) in assisting […]
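At its core, RAG retrieves the passages most relevant to a question and conditions the model's answer on them. A minimal sketch of that loop, assuming hypothetical `embed_fn` and `generate_fn` stand-ins for an embedding model and an LLM call (these names are illustrative, not from the paper):

```python
# Minimal RAG sketch: score passages against the question by embedding
# similarity, keep the top-k, and prepend them to the prompt.
import numpy as np

def retrieve(question, passages, embed_fn, k=3):
    q = embed_fn(question)                               # query embedding
    scores = [float(np.dot(q, embed_fn(p))) for p in passages]
    top = np.argsort(scores)[::-1][:k]                    # indices of top-k passages
    return [passages[i] for i in top]

def rag_answer(question, passages, embed_fn, generate_fn):
    context = "\n\n".join(retrieve(question, passages, embed_fn))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate_fn(prompt)
```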
Antidistillation Sampling
arXiv:2504.13146v5 Announce Type: replace Abstract: Frontier models that generate extended reasoning traces inadvertently produce rich token sequences that can facilitate model distillation. Recognizing this vulnerability, model owners may seek sampling strategies that limit the effectiveness of distillation without compromising model performance. Antidistillation sampling provides exactly this capability. By strategically modifying a model’s next-token probability distribution, […]
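The snippet above only states that the next-token distribution is strategically modified; the concrete penalty is not given here. A toy sketch of sampling from a perturbed distribution, where `distill_penalty` is a hypothetical per-token score of how useful a token would be to a student model and `alpha` controls the performance/distillability trade-off:

```python
# Toy sketch, not the paper's actual method: subtract a per-token penalty
# from the logits before softmax sampling, pushing down tokens judged most
# "teachable" to a would-be student.
import numpy as np

def antidistill_sample(logits, distill_penalty, alpha=1.0, rng=None):
    rng = rng or np.random.default_rng()
    adjusted = logits - alpha * distill_penalty   # penalize distillation-friendly tokens
    probs = np.exp(adjusted - adjusted.max())
    probs /= probs.sum()                          # softmax over adjusted logits
    return rng.choice(len(probs), p=probs)
```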
Is Temporal Difference Learning the Gold Standard for Stitching in RL?
arXiv:2510.21995v1 Announce Type: cross Abstract: Reinforcement learning (RL) promises to solve long-horizon tasks even when training data contains only short fragments of the behaviors. This experience stitching capability is often viewed as the purview of temporal difference (TD) methods. However, outside of small tabular settings, trajectories never intersect, calling into question this conventional wisdom. Moreover, […]
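For reference, the stitching behavior attributed to TD methods is easiest to see in the tabular case: bootstrapped backups propagate value across trajectory fragments that merely share a state. A self-contained toy example (the environment and fragments are invented for illustration):

```python
# Two disjoint fragments share state s1. Repeated Q-learning backups push the
# reward earned in fragment B back into fragment A, even though no single
# trajectory spans both -- this is "stitching".
from collections import defaultdict

Q = defaultdict(float)
alpha, gamma = 0.5, 0.9

# fragment A: s0 --a--> s1 (reward 0); fragment B: s1 --b--> s2 (reward 1, terminal)
fragments = [[("s0", "a", 0.0, "s1", False)],
             [("s1", "b", 1.0, "s2", True)]]

for _ in range(50):                               # replay the short fragments repeatedly
    for frag in fragments:
        for s, a, r, s_next, done in frag:
            target = r if done else r + gamma * max(Q[(s_next, x)] for x in ("a", "b"))
            Q[(s, a)] += alpha * (target - Q[(s, a)])

print(Q[("s0", "a")])   # approaches gamma * 1.0: the long-horizon value was stitched
```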
LLM-AR: LLM-powered Automated Reasoning Framework
arXiv:2510.22034v1 Announce Type: new Abstract: Large language models (LLMs) can already identify patterns and reason effectively, yet their variable accuracy hampers adoption in high-stakes decision-making applications. In this paper, we study this issue from a venture capital perspective by predicting idea-stage startup success based on founder traits. (i) To build a reliable prediction model, we […]
Online Optimization for Offline Safe Reinforcement Learning
arXiv:2510.22027v1 Announce Type: cross Abstract: We study the problem of Offline Safe Reinforcement Learning (OSRL), where the goal is to learn a reward-maximizing policy from fixed data under a cumulative cost constraint. We propose a novel OSRL approach that frames the problem as a minimax objective and solves it by combining offline RL with online […]
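One standard way to cast such a constrained problem as a minimax objective is a Lagrangian relaxation; the formulation below is generic, not necessarily the paper's exact one:

```latex
\min_{\lambda \ge 0} \; \max_{\pi} \;
\mathbb{E}_{\pi}\!\Big[\textstyle\sum_t \gamma^t r_t\Big]
\;-\; \lambda \Big(\mathbb{E}_{\pi}\!\Big[\textstyle\sum_t \gamma^t c_t\Big] - \kappa\Big)
```

Here κ is the cost budget; the dual variable can be updated online via λ ← max(0, λ + η(J_c(π) − κ)) while the policy maximizes the inner objective, which is where offline RL and online optimization naturally combine.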
Expandable Decision-Making States for Multi-Agent Deep Reinforcement Learning in Soccer Tactical Analysis
arXiv:2510.00480v2 Announce Type: replace Abstract: Invasion team sports such as soccer produce a high-dimensional, strongly coupled state space as many players continuously interact on a shared field, challenging quantitative tactical analysis. Traditional rule-based analyses are intuitive, while modern predictive machine learning models often perform pattern-matching without explicit agent representations. The problem we address is how […]
Agentic Reinforcement Learning for Real-World Code Repair
arXiv:2510.22075v1 Announce Type: cross Abstract: We tackle the challenge of training reliable code-fixing agents in real repositories, where complex builds and shifting dependencies make evaluation unstable. We developed a verifiable pipeline with success defined as post-fix build validation and improved reproducibility across ~1K real issues by pinning dependencies and disabling automatic upgrades. Building on this, […]
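A hedged sketch of the post-fix build-validation step described above: run the project's build in the pinned environment after applying a candidate patch, and count the fix as successful only if the build exits cleanly. The `build_cmd` default and the helper name are placeholders, not the paper's pipeline:

```python
# Post-fix validation sketch: success = build exits with code 0 in the
# repository with dependencies pinned (pinning itself happens upstream).
import subprocess

def validate_fix(repo_dir, build_cmd=("make", "build"), timeout_s=1800):
    result = subprocess.run(build_cmd, cwd=repo_dir,
                            capture_output=True, text=True, timeout=timeout_s)
    return result.returncode == 0, result.stdout + result.stderr
```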
Predictive Coding Enhances Meta-RL To Achieve Interpretable Bayes-Optimal Belief Representation Under Partial Observability
arXiv:2510.22039v1 Announce Type: new Abstract: Learning a compact representation of history is critical for planning and generalization in partially observable environments. While meta-reinforcement learning (RL) agents can attain near Bayes-optimal policies, they often fail to learn the compact, interpretable Bayes-optimal belief states. This representational inefficiency potentially limits the agent’s adaptability and generalization capacity. Inspired by […]
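One way to inject a predictive-coding-style signal into a recurrent meta-RL agent is an auxiliary next-observation prediction loss on the RNN hidden state, i.e. the candidate belief representation. A minimal PyTorch sketch with illustrative dimensions; how this term is weighted against the RL loss is an assumption, not taken from the paper:

```python
# The GRU hidden state serves as the belief candidate; an auxiliary head is
# trained to predict the next observation from it, alongside the usual RL loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BeliefEncoder(nn.Module):
    def __init__(self, obs_dim=8, act_dim=2, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.pred_head = nn.Linear(hidden, obs_dim)       # predicts next observation

    def forward(self, obs, act):
        h, _ = self.rnn(torch.cat([obs, act], dim=-1))    # (B, T, hidden) belief states
        return h, self.pred_head(h)

def predictive_coding_loss(model, obs, act):
    # obs: (B, T, obs_dim), act: (B, T, act_dim); predict o_{t+1} from history up to t
    _, pred_next = model(obs[:, :-1], act[:, :-1])
    return F.mse_loss(pred_next, obs[:, 1:])              # auxiliary term added to RL loss
```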
When UAV Swarm Meets IRS: Collaborative Secure Communications in Low-altitude Wireless Networks
arXiv:2510.22117v1 Announce Type: cross Abstract: Low-altitude wireless networks (LAWNs) represent a promising architecture that integrates unmanned aerial vehicles (UAVs) as aerial nodes to provide enhanced coverage, reliability, and throughput for diverse applications. However, these networks face significant security vulnerabilities from both known and potential unknown eavesdroppers, which may threaten data confidentiality and system integrity. To […]
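For context, secure-communication designs of this kind are typically evaluated through a secrecy rate: the positive part of the gap between the legitimate link's capacity and the eavesdropper's. The expression below is the generic textbook form, not necessarily the paper's exact system model:

```latex
R_{\mathrm{sec}} \;=\; \Big[\log_2\!\big(1+\gamma_{\mathrm{B}}\big) \;-\; \log_2\!\big(1+\gamma_{\mathrm{E}}\big)\Big]^{+}
```

where γ_B and γ_E are the receive SINRs at the legitimate receiver and the eavesdropper, both shaped in this setting by the UAV swarm's trajectories and the IRS phase shifts.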