arXiv:2605.00119v1 Announce Type: cross Abstract: There is a significant gap in evaluating cultural reasoning in LLMs using conversational datasets that capture culturally rich and dialectal contexts. Most Arabic benchmarks focus on short text snippets in Modern Standard Arabic (MSA), overlooking the cultural nuances that naturally arise in dialogues. To address this gap, we introduce ArabCulture-Dialogue, […]
ViLegalNLI: Natural Language Inference for Vietnamese Legal Texts
arXiv:2605.00116v1 Announce Type: cross Abstract: In this article, we introduce ViLegalNLI, the first large-scale Vietnamese Natural Language Inference (NLI) dataset specifically constructed for the legal domain. The dataset consists of 42,012 premise-hypothesis pairs derived from official statutory documents and annotated with binary inference labels (Entailment and Non-entailment). It covers multiple legal domains and reflects realistic […]
Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech
arXiv:2603.15988v2 Announce Type: replace-cross Abstract: Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that leverages unlabeled dysarthric speech and large-scale typical speech […]
Vibe Coding in Product Teams: Reconfiguring AI-Assisted Workflows, Prototyping, and Collaboration
arXiv:2509.10652v3 Announce Type: replace-cross Abstract: Generative AI is reshaping product design practices through “vibe coding,” where product team members express intent in natural language and AI translates it into functional prototypes and code. Despite rapid adoption, little research has examined how vibe coding reconfigures product development workflows and collaboration. Drawing on interviews with 22 product […]
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models
arXiv:2509.26388v4 Announce Type: replace-cross Abstract: Conversational Spoken Language Models (SLMs) are emerging as a promising paradigm for real-time speech interaction. However, their capacity for temporal dynamics, including the ability to manage timing, tempo and simultaneous speaking, remains a critical and unevaluated challenge for conversational fluency. To address this gap, we introduce the Game-Time Benchmark, a […]
Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing for Weakly-Supervised Camouflaged Object Detection with Scribble Annotations
arXiv:2512.20260v5 Announce Type: replace-cross Abstract: Weakly-Supervised Camouflaged Object Detection (WSCOD) aims to locate and segment objects that are visually concealed within their surrounding scenes, relying solely on sparse supervision such as scribble annotations. Despite recent progress, existing WSCOD methods still lag far behind fully supervised ones due to two major limitations: (1) the pseudo masks […]
Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation
arXiv:2510.19897v2 Announce Type: replace-cross Abstract: We investigate how agents built on pretrained large language models (LLMs) can learn target classification functions from labeled examples without parameter updates. While conventional approaches like fine-tuning are often costly, inflexible, and opaque, we propose a memory-augmented framework that leverages LLM-generated critiques grounded in labeled data. Our framework uses episodic […]
TimesNet-Gen: Deep Learning-based Site Specific Strong Motion Generation
arXiv:2512.04694v3 Announce Type: replace-cross Abstract: Effective earthquake risk reduction relies on accurate site-specific evaluations, which require models capable of representing the influence of local site conditions on ground motion characteristics. We address strong ground motion generation from time-domain accelerometer records and introduce TimesNet-Gen, a deep generative framework. In this framework, site-specific generation is directly […]
VecSet-Edit: Unleashing Pre-trained LRM for Mesh Editing from Single Image
arXiv:2602.04349v2 Announce Type: replace-cross Abstract: 3D editing has emerged as a critical research area to provide users with flexible control over 3D assets. While current editing approaches predominantly focus on 3D Gaussian Splatting or multi-view images, the direct editing of 3D meshes remains underexplored. Prior attempts, such as VoxHammer, rely on voxel-based representations that suffer […]
Smart Profit-Aware Crop Advisory System: Kisan AI
arXiv:2605.00133v1 Announce Type: cross Abstract: Modern crop advisory systems exhibit a critical limitation termed economic blindness. These systems primarily optimize for biological yield, often overlooking market price, which can lead farmers toward agronomically sound yet financially unviable decisions. In this paper, we develop Kisan AI, a smart profit-aware crop advisory system that resolves the above-mentioned […]
EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure
arXiv:2605.00733v1 Announce Type: cross Abstract: Federated Multimodal Learning (FML) trains multimodal models across decentralized clients while keeping their image-text pairs private. However, joint embedding training entangles forgotten knowledge across both modalities and client gradient subspaces, hindering federated unlearning. Previous federated unlearning approaches neither sever the cross-modal reconstruction channel mediated by bilinear coupling nor separate forget-exclusive […]
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows
arXiv:2604.28139v2 Announce Type: replace-cross Abstract: LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces. Yet many agent benchmarks freeze a curated task set at release time and grade mainly the final response, making it difficult to evaluate agents against evolving workflow demand or verify whether a […]