arXiv:2603.19715v2 Announce Type: replace Abstract: Formal verification via interactive theorem proving is increasingly used to ensure the correctness of critical systems, yet constructing large proof scripts remains highly manual and limits scalability. Advances in large language models (LLMs), especially in mathematical reasoning, make their integration into software verification increasingly promising. This paper introduces a neuro-symbolic […]
LicenseGPT: A Fine-tuned Foundation Model for Publicly Available Dataset License Compliance
arXiv:2501.00106v2 Announce Type: replace-cross Abstract: Dataset license compliance is a critical yet complex aspect of developing commercial AI products, particularly with the increasing use of publicly available datasets. Ambiguities in dataset licenses pose significant legal risks, making it challenging even for software IP lawyers to accurately interpret rights and obligations. In this paper, we introduce […]
Mapping Human Anti-collusion Mechanisms to Multi-agent AI Systems
arXiv:2601.00360v2 Announce Type: replace-cross Abstract: As multi-agent AI systems become increasingly autonomous, evidence shows they can develop collusive strategies similar to those long observed in human markets and institutions. While human domains have accumulated centuries of anti-collusion mechanisms, it remains unclear how these can be adapted to AI settings. This paper addresses that gap by […]
Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
arXiv:2602.15827v2 Announce Type: replace-cross Abstract: While recent advances in humanoid locomotion have achieved stable walking on varied terrains, capturing the agility and adaptivity of highly dynamic human motions remains an open challenge. In particular, agile parkour in complex environments demands not only low-level robustness, but also human-like motion expressiveness, long-horizon skill composition, and perception-driven decision-making. […]
SSMamba: A Self-Supervised Hybrid State Space Model for Pathological Image Classification
arXiv:2604.15711v2 Announce Type: replace-cross Abstract: Pathological diagnosis is highly reliant on image analysis, where Regions of Interest (ROIs) serve as the primary basis for diagnostic evidence, while whole-slide image (WSI)-level tasks primarily capture aggregated patterns. To extract these critical morphological features, ROI-level Foundation Models (FMs) based on Vision Transformers (ViTs) and large-scale self-supervised learning (SSL) […]
Benchmarking PNW Model for MedMNIST to 100% Accuracy
arXiv:2604.18916v4 Announce Type: replace Abstract: In this paper, we introduce a new concept called Artificial Special Intelligence by which Machine Learning models for the classification problem can be trained error-free, thus acquiring the capability of not making repeated mistakes. The method is applied to 18 MedMNIST biomedical datasets. Except for three datasets, which suffer from […]
Disentangled Generative Graph Representation Learning
arXiv:2408.13471v2 Announce Type: replace-cross Abstract: Recently, generative graph models have shown promising results in learning graph representations through self-supervised methods. However, most existing generative graph representation learning (GRL) approaches rely on random masking across the entire graph, which overlooks the entanglement of learned representations. This oversight results in non-robustness and a lack of explainability. Furthermore, […]
A Survey of Personalized Federated Foundation Models for Privacy-Preserving Recommendation
arXiv:2506.11563v2 Announce Type: replace-cross Abstract: Integrating Foundation Models (FMs) into recommendation systems is an emerging and promising research direction. However, centralized paradigms face growing pressure from privacy concerns and strict regulatory requirements. Federated learning offers a viable solution that enables collaborative model refinement while keeping raw user data on local devices or organizational silos. Yet, […]
Cataract-LMM: Large-Scale Multi-Source Multi-Task Benchmark for Deep Learning in Surgical Video Analysis
arXiv:2510.16371v3 Announce Type: replace-cross Abstract: Computer-assisted surgery research requires large, deeply annotated video datasets that capture clinical and technical variability. Existing cataract surgery resources lack the diversity and annotation depth required to train generalizable deep-learning models. To address this gap, we present a dataset of 3,000 phacoemulsification cataract surgery videos acquired at two surgical centers […]
LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning
arXiv:2601.20375v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) can be fine-tuned on domain-specific data to enhance their performance in specialized fields. However, such data often contains numerous low-quality samples, necessitating effective data processing (DP). In practice, DP strategies are typically developed through iterative manual analysis and trial-and-error adjustment. These processes inevitably incur high labor […]
Parity, Sensitivity, and Transformers
arXiv:2602.05896v2 Announce Type: replace-cross Abstract: Understanding what neural architectures can and cannot compute is a central challenge in the theory of AI. One of the fundamental problems in this context is the PARITY task, which asks whether the number of 1s in a binary input sequence is even or odd. PARITY is one of the […]
PulseLM: A Foundation Dataset and Benchmark for PPG-Text Learning
arXiv:2603.03331v2 Announce Type: replace-cross Abstract: Photoplethysmography (PPG) is a widely used non-invasive sensing modality for continuous cardiovascular and physiological monitoring across clinical, laboratory, and wearable settings. While existing PPG datasets support a broad range of downstream tasks, they typically provide supervision in the form of numerical measurements or task-specific labels, limiting their compatibility with language-based […]