arXiv:2604.05591v1 Announce Type: cross Abstract: This work introduces a modular platform that brings together six AI services, automatic speech recognition via OpenAI Whisper, multilingual translation through Meta NLLB, speech synthesis using AWS Polly, emotion classification with RoBERTa, dialogue summarisation via flan t5 base samsum, and International Sign (IS) rendering through Google MediaPipe. A corpus of […]
TDA-RC: Task-Driven Alignment for Knowledge-Based Reasoning Chains in Large Language Models
arXiv:2604.04942v1 Announce Type: cross Abstract: Enhancing the reasoning capability of large language models (LLMs) remains a core challenge in natural language processing. The Chain-of-Thought (CoT) paradigm dominates practical applications for its single-round efficiency, yet its reasoning chains often exhibit logical gaps. While multi-round paradigms like Graph-of-Thoughts (GoT), Tree-of-Thoughts (ToT), and Atom of Thought (AoT) achieve […]
Breakthrough the Suboptimal Stable Point in Value-Factorization-Based Multi-Agent Reinforcement Learning
arXiv:2604.05297v1 Announce Type: new Abstract: Value factorization, a popular paradigm in MARL, faces significant theoretical and algorithmic bottlenecks: its tendency to converge to suboptimal solutions remains poorly understood and unsolved. Theoretically, existing analyses fail to explain this due to their primary focus on the optimal case. To bridge this gap, we introduce a novel theoretical […]
Rectified Schr”odinger Bridge Matching for Few-Step Visual Navigation
arXiv:2604.05673v1 Announce Type: cross Abstract: Visual navigation is a core challenge in Embodied AI, requiring autonomous agents to translate high-dimensional sensory observations into continuous, long-horizon action trajectories. While generative policies based on diffusion models and Schr”odinger Bridges (SB) effectively capture multimodal action distributions, they require dozens of integration steps due to high-variance stochastic transport, posing […]
From PDF to RAG-Ready: Evaluating Document Conversion Frameworks for Domain-Specific Question Answering
arXiv:2604.04948v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems depend critically on the quality of document preprocessing, yet no prior study has evaluated PDF processing frameworks by their impact on downstream question-answering accuracy. We address this gap through a systematic comparison of four open-source PDF-to-Markdown conversion frameworks, Docling, MinerU, Marker, and DeepSeek OCR, across 19 […]
Synthetic Trust Attacks: Modeling How Generative AI Manipulates Human Decisions in Social Engineering Fraud
arXiv:2604.04951v1 Announce Type: cross Abstract: Imagine receiving a video call from your CFO, surrounded by colleagues, asking you to urgently authorise a confidential transfer. You comply. Every person on that call was fake, and you just lost $25 million. This is not a hypothetical. It happened in Hong Kong in January 2024, and it is […]
PhageBench: Can LLMs Understand Raw Bacteriophage Genomes?
arXiv:2604.05775v1 Announce Type: cross Abstract: Bacteriophages, often referred to as the dark matter of the biosphere, play a critical role in regulating microbial ecosystems and in antibiotic alternatives. Thus, accurate interpretation of their genomes holds significant scientific and practical value. While general-purpose Large Language Models (LLMs) excel at understanding biological texts, their ability to directly […]
The Planetary Cost of AI Acceleration, Part II: The 10th Planetary Boundary and the 6.5-Year Countdown
arXiv:2604.04956v1 Announce Type: cross Abstract: The recent, super-exponential scaling of autonomous Large Language Model (LLM) agents signals a broader, fundamental paradigm shift from machines replacing the human hands (manual labor and mechanical processing) to machines delegating for the human minds (thinking, reasoning, and intention). This uncontrolled offloading and scaling of “thinking” itself has profound consequences […]
TRACE: Capability-Targeted Agentic Training
arXiv:2604.05336v1 Announce Type: new Abstract: Large Language Models (LLMs) deployed in agentic environments must exercise multiple capabilities across different task instances, where a capability is performing one or more actions in a trajectory that are necessary for successfully solving a subset of tasks in the environment. Many existing approaches either rely on synthetic training data […]
Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code’s Auto Mode
arXiv:2604.04978v1 Announce Type: cross Abstract: Claude Code’s auto mode is the first deployed permission system for AI coding agents, using a two-stage transcript classifier to gate dangerous tool calls. Anthropic reports a 0.4% false positive rate and 17% false negative rate on production traffic. We present the first independent evaluation of this system on deliberately […]
Automatic dental superimposition of 3D intraorals and 2D photographs for human identification
arXiv:2604.05877v1 Announce Type: cross Abstract: Dental comparison is considered a primary identification method, at the level of fingerprints and DNA profiling. One crucial but time-consuming step of this method is the morphological comparison. One of the main challenges to apply this method is the lack of ante-mortem medical records, specially on scenarios such as migrant […]
CURE:Circuit-Aware Unlearning for LLM-based Recommendation
arXiv:2604.04982v1 Announce Type: cross Abstract: Recent advances in large language models (LLMs) have opened new opportunities for recommender systems by enabling rich semantic understanding and reasoning about user interests and item attributes. However, as privacy regulations tighten, incorporating user data into LLM-based recommendation (LLMRec) introduces significant privacy risks, making unlearning algorithms increasingly crucial for practical […]