arXiv:2604.18873v1 Announce Type: new Abstract: Large language models (LLMs) are highly capable at language generation, but they remain unreliable when reasoning requires explicit symbolic structure, multi-step inference, and interpretable uncertainty. This paper presents a neuro-symbolic framework for translating natural-language reasoning problems into executable formal representations using first-order logic (FOL) and Narsese, the language of the […]
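As a generic illustration of the autoformalization target described above (not this paper's pipeline, and omitting the Narsese side), a hand-translated premise set and a single forward-chaining step over its FOL form might look like the following; all names and the data layout are hypothetical:

```python
# Illustrative sketch only: natural-language premises hand-translated into a
# tiny first-order-logic representation, then executed by forward chaining.
# "All humans are mortal" becomes the rule forall x. Human(x) -> Mortal(x);
# "Socrates is a human" becomes the ground fact Human(socrates).

# Ground facts: unary predicates applied to constants.
facts = {("Human", "socrates")}

# Universally quantified rules, stored as (antecedent, consequent) predicates.
rules = [("Human", "Mortal")]

def forward_chain(facts, rules):
    """Apply universal instantiation + modus ponens until a fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for ante, cons in rules:
            for pred, arg in list(derived):
                if pred == ante and (cons, arg) not in derived:
                    derived.add((cons, arg))
                    changed = True
    return derived

print(forward_chain(facts, rules))  # derives ("Mortal", "socrates")
```

The point of executing the formal representation, rather than asking the LLM for the conclusion directly, is that each derived atom is justified by an explicit, inspectable inference step.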
Gated Memory Policy
arXiv:2604.18933v1 Announce Type: cross Abstract: Robotic manipulation tasks exhibit varying memory requirements, ranging from Markovian tasks that require no memory to non-Markovian tasks that depend on historical information spanning single or multiple interaction trials. Surprisingly, simply extending observation histories of a visuomotor policy often leads to a significant performance drop due to distribution shift and […]
Generative Models and Connected and Automated Vehicles: A Survey in Exploring the Intersection of Transportation and AI
arXiv:2403.10559v4 Announce Type: replace-cross Abstract: This report investigates the history and impact of Generative Models and Connected and Automated Vehicles (CAVs), two groundbreaking forces driving progress in technology and transportation. By focusing on the application of generative models within the context of CAVs, the study aims to unravel how this integration could enhance predictive modeling, […]
Self-Improving Tabular Language Models via Iterative Group Alignment
arXiv:2604.18966v1 Announce Type: cross Abstract: While language models have been adapted for tabular data generation, two fundamental limitations remain: (1) static fine-tuning produces models that cannot learn from their own generated samples and adapt to self-correct, and (2) autoregressive objectives preserve local token coherence but neglect global statistical properties, degrading tabular quality. Reinforcement learning offers […]
How Adversarial Environments Mislead Agentic AI?
arXiv:2604.18874v1 Announce Type: new Abstract: Tool-integrated agents are deployed on the premise that external tools ground their outputs in reality. Yet this very reliance creates a critical attack surface. Current evaluations benchmark capability in benign settings, asking “can the agent use tools correctly” but never “what if the tools lie”. We identify this Trust Gap: […]
Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees
arXiv:2604.19000v1 Announce Type: cross Abstract: Statement autoformalization acts as a critical bridge between human mathematics and formal mathematics by translating natural language problems into formal language. While prior works have focused on data synthesis and diverse training paradigms to optimize end-to-end Large Language Models (LLMs), they typically treat formal code as flat sequences, neglecting the […]
Benchmarking Misuse Mitigation Against Covert Adversaries
arXiv:2506.06414v2 Announce Type: replace-cross Abstract: Existing language model safety evaluations focus on overt attacks and low-stakes tasks. In reality, an attacker can easily subvert existing safeguards by requesting help on small, benign-seeming tasks across many independent queries. Because the individual queries do not appear harmful, the attack is hard to detect. However, when combined, these […]
RARE: Redundancy-Aware Retrieval Evaluation Framework for High-Similarity Corpora
arXiv:2604.19047v1 Announce Type: cross Abstract: Existing QA benchmarks typically assume distinct documents with minimal overlap, yet real-world retrieval-augmented generation (RAG) systems operate on corpora such as financial reports, legal codes, and patents, where information is highly redundant and documents exhibit strong inter-document similarity. This mismatch undermines evaluation validity: retrievers can be unfairly undervalued even when […]
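As a toy illustration of why exact-match metrics understate retriever quality on redundant corpora (the metric names and the duplicate-cluster construction here are hypothetical, not RARE's actual protocol):

```python
def recall_at_k(retrieved, gold_ids, k):
    """Standard recall@k: a hit only if an exact gold document id is retrieved."""
    return len(set(retrieved[:k]) & gold_ids) / len(gold_ids)

def redundancy_aware_recall_at_k(retrieved, gold_clusters, k):
    """Sketch: credit a hit if ANY member of a relevance-equivalent duplicate
    cluster is retrieved, so a near-duplicate of the gold document is not
    scored as a miss. How clusters are built is left unspecified here."""
    top = set(retrieved[:k])
    hits = sum(1 for cluster in gold_clusters if top & cluster)
    return hits / len(gold_clusters)

# Example: document "a" and its near-duplicate "a2" carry the same answer.
print(recall_at_k(["a2", "x"], {"a"}, k=2))                           # 0.0
print(redundancy_aware_recall_at_k(["a2", "x"], [{"a", "a2"}], k=2))  # 1.0
```

The exact-match metric penalizes the retriever for returning "a2" even though it contains the needed information, which is precisely the evaluation-validity failure the abstract describes.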
Formally Verified Patent Analysis via Dependent Type Theory: Machine-Checkable Certificates from a Hybrid AI + Lean 4 Pipeline
arXiv:2604.18882v1 Announce Type: new Abstract: We present a formally verified framework for patent analysis as a hybrid AI + Lean 4 pipeline. The DAG-coverage core (Algorithm 1b) is fully machine-verified once bounded match scores are fixed. Freedom-to-operate, claim-construction sensitivity, cross-claim consistency, and doctrine-of-equivalents analyses are formalized at the specification level with kernel-checked candidate certificates. Existing […]
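As a minimal illustration of what a kernel-checked certificate means in Lean 4 (a toy bound on a clamped score, not the paper's DAG-coverage formalization; all names are hypothetical):

```lean
-- Illustrative only: a toy "certificate" that a bounded match score stays
-- within [0, 1] after clamping, checked by the Lean 4 kernel.
def clampScore (x : Nat) : Nat := min x 1

theorem clampScore_le_one (x : Nat) : clampScore x ≤ 1 :=
  Nat.min_le_right x 1
```

Once the kernel accepts such a theorem, the bound holds for every input, which is the sense in which the abstract's certificates are "machine-checkable" rather than empirically tested.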
S2MAM: Semi-supervised Meta Additive Model for Robust Estimation and Variable Selection
arXiv:2604.19072v1 Announce Type: cross Abstract: Semi-supervised learning with manifold regularization is a classical framework for jointly learning from both labeled and unlabeled data, where the key requirement is that the support of the unknown marginal distribution has the geometric structure of a Riemannian manifold. Typically, the Laplace-Beltrami operator-based manifold regularization can be approximated empirically by […]
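The truncated sentence points at the standard empirical surrogate for the Laplace-Beltrami operator: a graph Laplacian built from the sample. The sketch below is a generic k-NN construction with Gaussian edge weights, not S2MAM's estimator; k and sigma are hypothetical choices:

```python
import numpy as np

def knn_graph_laplacian(X, k=3, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W on a symmetrized k-NN graph
    with Gaussian edge weights: the usual empirical approximation of the
    Laplace-Beltrami operator on the data manifold."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]  # k nearest neighbors, skipping self
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)  # symmetrize the neighborhood relation
    return np.diag(W.sum(axis=1)) - W

def manifold_penalty(f, L):
    """Smoothness penalty f^T L f = 0.5 * sum_ij W_ij (f_i - f_j)^2:
    small when f varies slowly along the graph (i.e., along the manifold)."""
    return float(f @ L @ f)
```

In semi-supervised training, this penalty computed on both labeled and unlabeled points is added to the supervised loss, which is how the unlabeled data's geometry enters the objective.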
Towards Generalization of Graph Neural Networks for AC Optimal Power Flow
arXiv:2510.06860v2 Announce Type: replace-cross Abstract: AC Optimal Power Flow (ACOPF) is computationally intensive for large-scale grids, often requiring prohibitive solution times with conventional solvers. Machine learning offers significant speedups, but existing models struggle with scalability and topology flexibility. To address these challenges, we propose a Hybrid Heterogeneous Message Passing Neural Network (HH-MPNN) that integrates a […]
Multi-modal Test-time Adaptation via Adaptive Probabilistic Gaussian Calibration
arXiv:2604.19093v1 Announce Type: cross Abstract: Multi-modal test-time adaptation (TTA) enhances the resilience of benchmark multi-modal models against distribution shifts by leveraging unlabeled target data during inference. Despite documented successes, the advancement of multi-modal TTA methodologies has been impeded by a persistent limitation: the lack of explicit modeling of category-conditional distributions, which is […]