arXiv:2601.17777v1 Announce Type: cross Abstract: Supervised fine-tuning (SFT) is a crucial step for adapting large language models (LLMs) to downstream tasks. However, conflicting objectives across heterogeneous SFT tasks often induce the “seesaw effect”: optimizing for one task may degrade performance on others, particularly when model parameters are updated indiscriminately. In this paper, we propose a […]
From Reasoning LLMs to BERT: A Two-Stage Distillation Framework for Search Relevance
arXiv:2510.11056v3 Announce Type: replace-cross Abstract: Query-service relevance prediction in e-commerce search systems faces strict latency requirements that prevent the direct application of Large Language Models (LLMs). To bridge this gap, we propose a two-stage reasoning distillation framework to transfer reasoning capabilities from a powerful teacher LLM to a lightweight, deployment-friendly student model. In the first […]
E2PL: Effective and Efficient Prompt Learning for Incomplete Multi-view Multi-Label Class Incremental Learning
arXiv:2601.17076v1 Announce Type: cross Abstract: Multi-view multi-label classification (MvMLC) is indispensable for modern web applications aggregating information from diverse sources. However, real-world web-scale settings are rife with missing views and continuously emerging classes, which pose significant obstacles to robust learning. Prevailing methods are ill-equipped for this reality, as they either lack adaptability to new classes […]
Unifying VXAI: A Systematic Review and Framework for the Evaluation of Explainable AI
arXiv:2506.15408v2 Announce Type: replace-cross Abstract: Modern AI systems frequently rely on opaque black-box models, most notably Deep Neural Networks, whose performance stems from complex architectures with millions of learned parameters. While powerful, their complexity poses a major challenge to trustworthiness, particularly due to a lack of transparency. Explainable AI (XAI) addresses this issue by providing […]
MPCI-Bench: A Benchmark for Multimodal Pairwise Contextual Integrity Evaluation of Language Model Agents
arXiv:2601.08235v3 Announce Type: replace Abstract: As language-model agents evolve from passive chatbots into proactive assistants that handle personal data, evaluating their adherence to social norms becomes increasingly critical, often through the lens of Contextual Integrity (CI). However, existing CI benchmarks are largely text-centric and primarily emphasize negative refusal scenarios, overlooking multimodal privacy risks and the […]
Pretrain Value, Not Reward: Decoupled Value Policy Optimization
arXiv:2502.16944v2 Announce Type: replace-cross Abstract: In this paper, we explore how directly pretraining a value model simplifies and stabilizes reinforcement learning from human feedback (RLHF). In reinforcement learning, value estimation is the key to policy optimization, distinct from reward supervision. The value function predicts the emphreturn-to-go of a partial answer, that is, how promising the […]
From Fuzzy to Exact: The Halo Architecture for Infinite-Depth Reasoning via Rational Arithmetic
arXiv:2601.18702v1 Announce Type: cross Abstract: Current paradigms in Deep Learning prioritize computational throughput over numerical precision, relying on the assumption that intelligence emerges from statistical correlation at scale. In this paper, we challenge this orthodoxy. We propose the Exactness Hypothesis: that General Intelligence (AGI), specifically high-order causal inference, requires a computational substrate capable of Arbitrary […]
HeartLLM: Discretized ECG Tokenization for LLM-Based Diagnostic Reasoning
arXiv:2508.15338v2 Announce Type: replace Abstract: Electrocardiography (ECG) plays a central role in cardiovascular diagnostics, yet existing automated approaches often struggle to generalize across clinical tasks and offer limited support for open-ended reasoning. We present HeartLLM, a novel framework that integrates time-series (TS) and language modeling by enabling large language models (LLMs) to process 12-lead ECG […]
Saving SWE-Bench: A Benchmark Mutation Approach for Realistic Agent Evaluation
arXiv:2510.08996v4 Announce Type: replace-cross Abstract: Current benchmarks for evaluating software engineering agents, such as SWE-Bench Verified, are predominantly derived from GitHub issues and fail to accurately reflect how developers interact with chat-based coding assistants in integrated development environments (IDEs). We posit that this mismatch leads to a systematic overestimation of agent’s capabilities in real-world scenarios, […]
RAG-GFM: Overcoming In-Memory Bottlenecks in Graph Foundation Models via Retrieval-Augmented Generation
arXiv:2601.15124v2 Announce Type: replace-cross Abstract: Graph Foundation Models (GFMs) have emerged as a frontier in graph learning, which are expected to deliver transferable representations across diverse tasks. However, GFMs remain constrained by in-memory bottlenecks: they attempt to encode knowledge into model parameters, which limits semantic capacity, introduces heavy lossy compression with conflicts, and entangles graph […]
Privy: Envisioning and Mitigating Privacy Risks for Consumer-facing AI Product Concepts
arXiv:2509.23525v2 Announce Type: replace-cross Abstract: AI creates and exacerbates privacy risks, yet practitioners lack effective resources to identify and mitigate these risks. We present Privy, a tool that guides practitioners without privacy expertise through structured privacy impact assessments to: (i) identify relevant risks in novel AI product concepts, and (ii) propose appropriate mitigations. Privy was […]
MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling
arXiv:2508.08487v5 Announce Type: replace-cross Abstract: Despite recent advances, long-sequence video generation frameworks still suffer from significant limitations: poor assistive capability, suboptimal visual quality, and limited expressiveness. To mitigate these limitations, we propose MAViS, a multi-agent collaborative framework designed to assist in long-sequence video storytelling by efficiently translating ideas into visual narratives. MAViS orchestrates specialized agents […]