arXiv:2606.06614v1 Announce Type: cross Abstract: Despite growing interest, most evaluations of large language models’ (LLMs’) personalization abilities have relied on synthetic data. It remains unclear how well current personalization systems work for real users. In this paper, we study the gap in LLM personalization performance when using synthetic versus human data. We collect human conversations […]
CaliPPer: quantifying, predicting and improving AI model performance for binding prediction
arXiv:2606.07258v1 Announce Type: cross Abstract: Binding prediction models accelerate therapeutic antibody and TCR discovery, but their performance on new datasets is unpredictable, often leading to low discovery rates. Density-ratio methods (PAPE, M-CBPE) provide label-free performance estimation for binary classification, but their assumptions and aggregate-only outputs limit binding prediction on neoepitopes, antigen variants and chemical scaffolds. […]
RAVEN: Retrieval-Augmented Vulnerability Exploration Network for Memory Corruption Analysis in User Code and Binary Programs
arXiv:2604.17948v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various cybersecurity tasks, including vulnerability classification, detection, and patching. However, their potential in automated vulnerability report documentation and analysis remains underexplored. We present RAVEN (Retrieval Augmented Vulnerability Exploration Network), a framework leveraging LLM agents and Retrieval Augmented Generation (RAG) to synthesize […]
MVCL-DAF++: Enhancing Multimodal Intent Recognition via Prototype-Aware Contrastive Alignment and Coarse-to-Fine Dynamic Attention Fusion
arXiv:2509.17446v3 Announce Type: replace-cross Abstract: Multimodal intent recognition (MMIR) suffers from weak semantic grounding and poor robustness under noisy or rare-class conditions. We propose MVCL-DAF++, which extends MVCL-DAF with two key modules: (1) Prototype-aware contrastive alignment, aligning instances to class-level prototypes to enhance semantic consistency; and (2) Coarse-to-fine attention fusion, integrating global modality summaries with […]
E2Former-V2: On-the-Fly Equivariant Attention with Linear Activation Memory
arXiv:2601.16622v2 Announce Type: replace-cross Abstract: Equivariant Graph Neural Networks (EGNNs) have become a widely used approach for modeling 3D atomistic systems. However, mainstream architectures face critical scalability bottlenecks due to the explicit construction of geometric features or dense tensor products on textitevery edge. To overcome this, we introduce textbfE2Former-V2, a scalable architecture that integrates algebraic […]
TRUE: A Trustworthy Unified Explanation Framework for Large Language Model Reasoning
arXiv:2602.18905v2 Announce Type: replace-cross Abstract: Large language models (LLMs) have demonstrated strong capabilities in complex reasoning tasks, yet their decision-making processes remain difficult to interpret. Existing explanation methods often lack trustworthy structural insight and are limited to single-instance analysis, failing to reveal reasoning stability and systematic failure mechanisms. To address these limitations, we propose the […]
On the importance of multiple training seeds for evaluating machine unlearning
arXiv:2510.26714v5 Announce Type: replace-cross Abstract: Machine unlearning aims to remove the influence of certain data points from a trained model without costly retraining. Most practical unlearning algorithms are only approximate and their performance can only be assessed empirically. Common practice is to run unlearning algorithms multiple times independently (i.e., using multiple unlearning seeds) starting from […]
UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding
arXiv:2606.07167v1 Announce Type: cross Abstract: Meaningful multilingual evaluation must test models in the target language and educational context. Urdu, spoken by more than 230 million people, lacks a broad MMLU-style benchmark built from native educational sources. We introduce UrduMMLU, a benchmark of 26,431 Urdu MCQs across 26 subjects and five domains, collected from native Urdu […]
MorphoQuant: Modality-Aware Quantization for Omni-modal Large Language Models
arXiv:2606.04349v2 Announce Type: replace-cross Abstract: Conventional Post-Training Quantization (PTQ) methods struggle with 4-bit Omni-modal Large Language Models (OLLMs) due to the extreme distribution heterogeneity and disparate outlier patterns across modalities. To address this, we propose MorphoQuant, a modality-aware PTQ framework engineered to preserve cross-modal morphology and mitigate outlier loss. Specifically, we introduce Distribution-Aware Bias Compensation […]
CAF-Gen: A Multi-Agent System for Enriching Argumentation Structures
arXiv:2606.06646v1 Announce Type: cross Abstract: Formalizing complex reasoning from natural text is one of the central challenges in computational linguistics. It requires systems to understand not just keywords but also the context and complex reasoning embedded in a text. Current Argument Mining (AM) techniques identify basic claims and premises, yet they often struggle to capture […]
NTILC: Neural Tool Invocation via Learned Compression
arXiv:2606.06566v1 Announce Type: cross Abstract: Agentic tool-calling language models depend on large registries of callable APIs, functions, and local actions. Placing full tool specifications directly in the prompt incurs a cost that scales linearly with the size of the tool registry, rapidly consuming the context budget. As the registry grows, this leads to higher latency […]
Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning
arXiv:2604.17433v2 Announce Type: replace-cross Abstract: Self-consistency (SC) is a popular technique for improving the reasoning accuracy of large language models by aggregating multiple sampled outputs, but it comes at a high computational cost due to extensive sampling. We introduce a hybrid ensembling approach that leverages the complementary strengths of two distinct modes of reasoning: Chain-of-Thought […]