arXiv:2601.19903v1 Announce Type: cross Abstract: Formal Verification (FV) relies on high-quality SystemVerilog Assertions (SVAs), but the manual writing process is slow and error-prone. Existing LLM-based approaches either generate assertions from scratch or ignore structural patterns in hardware designs and expert-crafted assertions. This paper presents STELLAR, the first framework that guides LLM-based SVA generation with structural […]
From Intuition to Expertise: Rubric-Based Cognitive Calibration for Human Detection of LLM-Generated Korean Text
arXiv:2601.19913v1 Announce Type: cross Abstract: Distinguishing human-written Korean text from fluent LLM outputs remains difficult even for linguistically trained readers, who can over-trust surface well-formedness. We study whether expert detection can be treated as a learnable skill and improved through structured calibration. We introduce LREAD, a rubric derived from national Korean writing standards and adapted […]
Demystifying Multi-Agent Debate: The Role of Confidence and Diversity
arXiv:2601.19921v1 Announce Type: cross Abstract: Multi-agent debate (MAD) is widely used to improve large language model (LLM) performance through test-time scaling, yet recent work shows that vanilla MAD often underperforms simple majority vote despite higher computational cost. Studies show that, under homogeneous agents and uniform belief updates, debate preserves expected correctness and therefore cannot reliably […]
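The majority-vote baseline that the abstract says vanilla MAD often underperforms can be sketched in a few lines; this is a generic illustration (function name and inputs are hypothetical, not from the paper):

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer among independent agent samples."""
    return Counter(answers).most_common(1)[0][0]

# Three homogeneous agents answering the same multiple-choice question:
# no debate rounds, just one aggregation step over independent samples.
winner = majority_vote(["B", "A", "B"])
print(winner)  # → B
```

The comparison in the abstract is that debate with homogeneous agents and uniform belief updates cannot reliably beat this one-shot aggregation, despite costing extra rounds of generation.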
Evaluating Large Language Models for Abstract Evaluation Tasks: An Empirical Study
arXiv:2601.19925v1 Announce Type: cross Abstract: Introduction: Large language models (LLMs) can process requests and generate texts, but their feasibility for assessing complex academic content needs further investigation. To explore the potential of LLMs to assist scientific review, this study examined the consistency and reliability of ChatGPT-5, Gemini-3-Pro, and Claude-Sonnet-4.5 in evaluating abstracts, compared to one another and to human […]
Text-to-State Mapping for Non-Resolution Reasoning: The Contradiction-Preservation Principle
arXiv:2601.19933v1 Announce Type: cross Abstract: Non-Resolution Reasoning (NRR) provides a formal framework for maintaining semantic ambiguity rather than forcing premature interpretation collapse. While the foundational architecture establishes state spaces and operators for ambiguity-preserving computation, the critical question of how natural language maps to these mathematical structures remains open. This paper introduces the text-to-state mapping function […]
DecHW: Heterogeneous Decentralized Federated Learning Exploiting Second-Order Information
arXiv:2601.19938v1 Announce Type: cross Abstract: Decentralized Federated Learning (DFL) is a serverless collaborative machine learning paradigm where devices collaborate directly with neighbouring devices to exchange model information for learning a generalized model. However, variations in individual experiences and different levels of device interactions lead to data and model initialization heterogeneities across devices. Such heterogeneities leave […]
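The truncated abstract does not show DecHW's actual mechanism, but the general idea of exploiting second-order information in decentralized aggregation can be sketched generically: weight each neighbour's parameters per-coordinate by a diagonal curvature estimate (e.g. a Fisher or Hessian diagonal), so that coordinates a device is more certain about count more. All names below are illustrative assumptions, not the paper's method:

```python
def curvature_weighted_average(params, curvatures):
    """Per-coordinate average of neighbour parameter vectors, weighted by
    each neighbour's diagonal curvature estimate (higher = more certain)."""
    dim = len(params[0])
    avg = []
    for j in range(dim):
        num = sum(c[j] * p[j] for p, c in zip(params, curvatures))
        den = sum(c[j] for c in curvatures)
        avg.append(num / den)
    return avg

# Two heterogeneous neighbours: each dominates the coordinates where its
# curvature estimate is larger, instead of a uniform 50/50 average.
print(curvature_weighted_average([[1.0, 0.0], [0.0, 1.0]],
                                 [[3.0, 1.0], [1.0, 3.0]]))  # → [0.75, 0.75]
```

A plain decentralized average would return [0.5, 0.5] here; the curvature weighting shifts each coordinate toward the device with the sharper local loss landscape.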
MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents
arXiv:2601.20831v1 Announce Type: new Abstract: Foundation models rely on in-context learning for personalized decision making. The limited size of this context window necessitates memory compression and retrieval systems like RAG. However, these systems often treat memory as large offline storage spaces, which is unfavorable for embodied agents that are expected to operate under strict memory […]
SokoBench: Evaluating Long-Horizon Planning and Reasoning in Large Language Models
arXiv:2601.20856v1 Announce Type: new Abstract: Although the capabilities of large language models have been increasingly tested on complex reasoning tasks, their long-horizon planning abilities have not yet been extensively investigated. In this work, we provide a systematic assessment of the planning and long-horizon reasoning capabilities of state-of-the-art Large Reasoning Models (LRMs). We propose a novel […]
DABench-LLM: Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators for LLMs
arXiv:2601.19904v1 Announce Type: cross Abstract: The exponential growth of large language models has outpaced the capabilities of traditional CPU and GPU architectures due to the slowdown of Moore’s Law. Dataflow AI accelerators present a promising alternative; however, there remains a lack of in-depth performance analysis and standardized benchmarking methodologies for LLM training. We introduce DABench-LLM, […]
Analysis of LLM Vulnerability to GPU Soft Errors: An Instruction-Level Fault Injection Study
arXiv:2601.19912v1 Announce Type: cross Abstract: Large language models (LLMs) are highly compute- and memory-intensive, posing significant demands on high-performance GPUs. At the same time, advances in GPU technology driven by shrinking transistor sizes and lower operating voltages have made these devices increasingly susceptible to soft errors. While prior work has examined GPU reliability, most studies […]
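The kind of soft error studied here, a single-event upset flipping one bit, can be emulated on a float32 value in a few lines. This is a minimal illustration of why such faults matter for LLM weights, not the paper's injection tool:

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Flip one bit (0 = LSB of mantissa, 31 = sign) of a float32 value,
    emulating a single-event upset in GPU memory or a register."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits ^= 1 << bit
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y

w = 0.5
# A flip in a low mantissa bit barely perturbs the weight...
print(flip_bit(w, 3))
# ...but a flip in the top exponent bit changes its magnitude by ~38
# orders of magnitude, easily corrupting a forward pass.
print(flip_bit(w, 30))
```

The asymmetry between mantissa and exponent bits is exactly why instruction-level studies distinguish which bit positions and which instructions a fault lands in.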
Simulating Complex Multi-Turn Tool Calling Interactions in Stateless Execution Environments
arXiv:2601.19914v1 Announce Type: cross Abstract: Synthetic data has proven to be a valuable resource for tuning smaller, cost-effective language models to handle the complexities of multi-turn tool calling conversations. While many frameworks and systems for producing synthetic multi-turn tool calling data have been proposed, prior works have frequently assumed that any tool calling interactions […]
FastWhisper: Adaptive Self-knowledge Distillation for Real-time Automatic Speech Recognition
arXiv:2601.19919v1 Announce Type: cross Abstract: Knowledge distillation is one of the most effective methods for model compression. Previous studies have focused on having the student model effectively learn the predictive distribution of the teacher model. However, during training, the student model may inherit the shortcomings of the teacher model, which can lead to a decline in […]
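The distillation setup the abstract builds on is the classic temperature-softened objective, where the student matches the teacher's softened predictive distribution; a minimal sketch follows (the paper's adaptive self-distillation variant is not shown in the truncated abstract, so this is only the standard baseline):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Soft-label KD loss: T^2 * KL(teacher_T || student_T).
    The T^2 factor keeps gradient scale comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# The loss is zero when the student already matches the teacher,
# and positive otherwise.
print(distill_loss([3.0, 1.0, 0.2], [2.5, 1.2, 0.3]))
```

The failure mode the abstract points at is visible in this objective: the student is pulled toward the teacher's distribution even on examples where the teacher itself is wrong, which motivates making the distillation adaptive.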