arXiv:2603.23966v1 Announce Type: cross Abstract: With frequently evolving Advanced Persistent Threats (APTs) in cyberspace, traditional security approaches have become inadequate for threat hunting in organizations. Moreover, Security Operations Center (SOC) analysts are often overwhelmed and struggle to analyze the huge volume of logs received from diverse devices in organizations. To address these challenges, we […]
Schema on the Inside: A Two-Phase Fine-Tuning Method for High-Efficiency Text-to-SQL at Scale
arXiv:2603.24023v1 Announce Type: cross Abstract: Applying large, proprietary API-based language models to text-to-SQL tasks poses a significant industry challenge: reliance on massive, schema-heavy prompts results in prohibitive per-token API costs and high latency, hindering scalable production deployment. We present a specialized, self-hosted 8B-parameter model designed for a conversational bot in CriQ, a sister app to […]
The Alignment Tax: Response Homogenization in Aligned LLMs and Its Implications for Uncertainty Estimation
arXiv:2603.24124v1 Announce Type: cross Abstract: RLHF-aligned language models exhibit response homogenization: on TruthfulQA (n=790), 40-79% of questions produce a single semantic cluster across 10 i.i.d. samples. On affected questions, sampling-based uncertainty methods have zero discriminative power (AUROC=0.500), while free token entropy retains signal (0.603). This alignment tax is task-dependent: on GSM8K (n=500), token entropy achieves […]
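A minimal sketch of why a sampling-based uncertainty score loses discriminative power under homogenization, assuming the score is the entropy over semantic clusters of the sampled responses. The abstract does not specify the exact methods evaluated, so the helper names (`cluster_entropy`, `auroc`) and the toy data are hypothetical illustrations, not the paper's implementation. When every question collapses to a single cluster, every score is identical, and a tie-aware AUROC is 0.5 by construction.

```python
import math
from collections import Counter


def cluster_entropy(cluster_ids):
    """Shannon entropy (nats) of the empirical distribution over semantic clusters."""
    counts = Counter(cluster_ids)
    n = len(cluster_ids)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


def auroc(scores, labels):
    """Chance that a random positive outscores a random negative; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four questions, 10 samples each, all in one cluster.
homogenized = [[0] * 10 for _ in range(4)]
is_wrong = [1, 0, 1, 0]  # made-up correctness labels (1 = answer incorrect)

scores = [cluster_entropy(ids) for ids in homogenized]
chance_level = auroc(scores, is_wrong)  # all scores tie, so this is 0.5
```

A question whose samples split across clusters would get a positive entropy, which is the only case in which this kind of score can rank questions at all.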
Who Benefits from RAG? The Role of Exposure, Utility and Attribution Bias
arXiv:2603.24218v1 Announce Type: cross Abstract: Large Language Models (LLMs) enhanced with Retrieval-Augmented Generation (RAG) have achieved substantial improvements in accuracy by grounding their responses in external documents that are relevant to the user’s query. However, relatively little work has investigated the impact of RAG in terms of fairness. Particularly, it is not yet known if […]
Toward Generalist Neural Motion Planners for Robotic Manipulators: Challenges and Opportunities
arXiv:2603.24318v1 Announce Type: cross Abstract: State-of-the-art generalist manipulation policies have enabled the deployment of robotic manipulators in unstructured human environments. However, these frameworks struggle in cluttered environments primarily because they utilize auxiliary modules for low-level motion planning and control. Motion planning remains challenging due to the high dimensionality of the robot’s configuration space and the […]
Assessment Design in the AI Era: A Method for Identifying Items Functioning Differentially for Humans and Chatbots
arXiv:2603.23682v1 Announce Type: cross Abstract: The rapid adoption of large language models (LLMs) in education raises profound challenges for assessment design. To adapt assessments to the presence of LLM-based tools, it is crucial to characterize the strengths and weaknesses of LLMs in a generalizable, valid and reliable manner. However, current LLM evaluations often rely on […]
Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning
arXiv:2512.16917v3 Announce Type: replace Abstract: Large language models (LLMs) with explicit reasoning capabilities excel at mathematical reasoning yet still commit process errors, such as incorrect calculations, brittle logic, and superficially plausible but invalid steps. In this paper, we introduce Generative Adversarial Reasoner, an on-policy joint training framework designed to enhance reasoning by co-evolving an LLM […]
When AI output tips to bad but nobody notices: Legal implications of AI’s mistakes
arXiv:2603.23857v1 Announce Type: new Abstract: The adoption of generative AI across commercial and legal professions offers dramatic efficiency gains — yet for law in particular, it introduces a perilous failure mode in which the AI fabricates fictitious case law, statutes, and judicial holdings that appear entirely authentic. Attorneys who unknowingly file such fabrications face professional […]
A Comprehensive Survey on Enterprise Financial Risk Analysis from Big Data and LLMs Perspective
arXiv:2211.14997v5 Announce Type: replace-cross Abstract: Enterprise financial risk analysis aims at predicting the future financial risk of enterprises. Due to its wide and significant application, enterprise financial risk analysis has always been the core research topic in the fields of Finance and Management. Based on advanced computer science and artificial intelligence technologies, enterprise risk analysis […]
The DeepXube Software Package for Solving Pathfinding Problems with Learned Heuristic Functions and Search
arXiv:2603.23873v1 Announce Type: new Abstract: DeepXube is a free and open-source Python package and command-line tool that seeks to automate the solution of pathfinding problems by using machine learning to learn heuristic functions that guide heuristic search algorithms tailored to deep neural networks (DNNs). DeepXube comprises the latest advances in deep reinforcement learning, […]
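To illustrate the general idea of a heuristic function guiding search, here is a minimal A* sketch with a pluggable heuristic. It is not DeepXube's API: `a_star`, the toy number-line problem, and the hand-written `heuristic` (standing in for what would be a trained DNN) are all assumptions made for illustration.

```python
import heapq


def a_star(start, is_goal, neighbors, heuristic):
    """Generic A*: expands states in order of path cost plus heuristic estimate."""
    frontier = [(heuristic(start), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, cost, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        if cost > best_cost.get(state, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for nxt, step_cost in neighbors(state):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                priority = new_cost + heuristic(nxt)
                heapq.heappush(frontier, (priority, new_cost, nxt, path + [nxt]))
    return None


# Hypothetical toy problem: walk along the integers from 0 to 5 in unit steps.
GOAL = 5


def neighbors(state):
    return [(state - 1, 1), (state + 1, 1)]


def heuristic(state):
    # Stand-in for a learned cost-to-go estimate; here it is the exact distance.
    return abs(GOAL - state)


path = a_star(0, lambda s: s == GOAL, neighbors, heuristic)
```

In a learned-heuristic setting, only `heuristic` would change: it would call a trained model to estimate the remaining cost, while the search loop stays the same.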
Wideband RF Radiance Field Modeling Using Frequency-embedded 3D Gaussian Splatting
arXiv:2505.20714v3 Announce Type: replace-cross Abstract: Indoor environments typically contain diverse RF signals distributed across multiple frequency bands, including NB-IoT, Wi-Fi, and millimeter-wave. Consequently, wideband RF modeling is essential for practical applications such as joint deployment of heterogeneous RF systems, cross-band communication, and distributed RF sensing. Although 3D Gaussian Splatting (3DGS) techniques effectively reconstruct RF radiance […]
DUPLEX: Agentic Dual-System Planning via LLM-Driven Information Extraction
arXiv:2603.23909v1 Announce Type: new Abstract: While Large Language Models (LLMs) provide semantic flexibility for robotic task planning, their susceptibility to hallucination and logical inconsistency limits their reliability in long-horizon domains. To bridge the gap between unstructured environments and rigorous plan synthesis, we propose DUPLEX, an agentic dual-system neuro-symbolic architecture that strictly confines the LLM to […]