arXiv:2601.20041v1 Announce Type: cross Abstract: Personalized virtual assistants powered by large language models (LLMs) on edge devices are attracting growing attention, with Retrieval-Augmented Generation (RAG) emerging as a key method for personalization by retrieving relevant profile data and generating tailored responses. However, deploying RAG on edge devices faces efficiency hurdles due to the rapid growth […]
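Since the abstract is truncated, here is a minimal, generic sketch of the RAG personalization loop it refers to: retrieve the profile snippets most similar to the query, then splice them into the prompt given to the LLM. The bag-of-words cosine scorer, the `retrieve`/`build_prompt` names, and the sample profile are illustrative assumptions, not the paper's system.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, profile: list[str], k: int = 2) -> list[str]:
    """Return the k profile snippets most similar to the query."""
    q = Counter(query.lower().split())
    scored = sorted(profile,
                    key=lambda s: cosine(q, Counter(s.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, profile: list[str]) -> str:
    """Assemble a personalized prompt: retrieved context + user query."""
    context = "\n".join(f"- {s}" for s in retrieve(query, profile))
    return f"User profile facts:\n{context}\n\nQuestion: {query}"

# Hypothetical on-device profile data for illustration only.
profile = [
    "prefers vegetarian restaurants",
    "commutes by bicycle on weekdays",
    "allergic to peanuts",
]
print(build_prompt("which restaurants suit my diet", profile))
```

A production edge deployment would replace the toy scorer with a compact embedding model and an approximate-nearest-neighbor index, which is where the efficiency hurdles the abstract mentions arise.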
How Much Progress Has There Been in NVIDIA Datacenter GPUs?
arXiv:2601.20115v1 Announce Type: cross Abstract: Graphics Processing Units (GPUs) are the state-of-the-art architecture for essential tasks, ranging from rendering 2D/3D graphics to accelerating workloads in supercomputing centers and, of course, Artificial Intelligence (AI). As GPUs continue improving to satisfy ever-increasing performance demands, analyzing past and current progress becomes paramount in determining future constraints on scientific […]
BengaliSent140: A Large-Scale Bengali Binary Sentiment Dataset for Hate and Non-Hate Speech Classification
arXiv:2601.20129v1 Announce Type: cross Abstract: Sentiment analysis for the Bengali language has attracted increasing research interest in recent years. However, progress remains constrained by the scarcity of large-scale and diverse annotated datasets. Although several Bengali sentiment and hate speech datasets are publicly available, most are limited in size or confined to a single domain, such […]
What’s the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering
arXiv:2601.20164v1 Announce Type: cross Abstract: Prior work suggests that language models, while trained on next-token prediction, show implicit planning behavior: they may select the next token in preparation for a predicted future token, such as a likely rhyming word, as supported by a prior qualitative study of Claude 3.5 Haiku using a cross-layer transcoder. […]
ProFlow: Zero-Shot Physics-Consistent Sampling via Proximal Flow Guidance
arXiv:2601.20227v1 Announce Type: cross Abstract: Inferring physical fields from sparse observations while strictly satisfying partial differential equations (PDEs) is a fundamental challenge in computational physics. Deep generative models have recently emerged as powerful data-driven priors for such inverse problems, yet existing methods struggle to enforce hard physical constraints without costly retraining or disrupting the learned generative prior. […]
How AI Impacts Skill Formation
arXiv:2601.20245v1 Announce Type: cross Abstract: AI assistance produces significant productivity gains across professional domains, particularly for novice workers. Yet how this assistance affects the development of skills required to effectively supervise AI remains unclear. Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We […]
Eliciting Least-to-Most Reasoning for Phishing URL Detection
arXiv:2601.20270v1 Announce Type: cross Abstract: Phishing continues to be one of the most prevalent attack vectors, making accurate classification of phishing URLs essential. Recently, large language models (LLMs) have demonstrated promising results in phishing URL detection. However, the reasoning capabilities underlying this performance remain underexplored. To this end, in this paper, we propose a […]
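As a rough illustration of least-to-most decomposition applied to URL classification: answer simple subquestions first, then aggregate the answers into a final verdict. The hand-written heuristics and threshold below are hypothetical stand-ins for the LLM-elicited reasoning steps the paper studies.

```python
import re
from urllib.parse import urlparse

def subquestions(url: str) -> dict[str, bool]:
    """Least-to-most style decomposition: solve easy subproblems first.
    These heuristics are illustrative, not the paper's prompts."""
    host = urlparse(url).hostname or ""
    return {
        "host_is_ip": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "has_at_sign": "@" in url,
        "many_dots_in_host": host.count(".") >= 3,
        "suspicious_keyword": any(w in url.lower()
                                  for w in ("login", "verify", "update")),
    }

def classify(url: str, threshold: int = 2) -> str:
    """Aggregate subquestion answers into the final judgment."""
    answers = subquestions(url)
    return "phishing" if sum(answers.values()) >= threshold else "benign"
```

In the prompting setting, each subquestion would be posed to the LLM in sequence, with earlier answers included in later prompts; the code above only mirrors that structure.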
Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction
arXiv:2601.20299v1 Announce Type: cross Abstract: The evaluation and post-training of large language models (LLMs) rely on supervision, but strong supervision for difficult tasks is often unavailable, especially when evaluating frontier models. In such cases, models have been shown to exploit evaluations built on imperfect supervision, producing deceptive results. However, underutilized in LLM research, a […]
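For background, peer prediction scores an answer by its agreement with a peer's answers on the same questions, corrected so that uninformative (constant or random) answers earn nothing. A minimal output-agreement variant, an assumption-laden sketch rather than the paper's mechanism, might look like:

```python
import random

def peer_prediction_score(answers_a: list, answers_b: list,
                          trials: int = 100, seed: int = 0) -> float:
    """Output-agreement peer prediction sketch: reward agreement on
    matched questions minus expected agreement on mismatched
    (permuted) ones, so constant answers score zero. Illustrative
    only; the paper's mechanism is not specified in this excerpt."""
    n = len(answers_a)
    matched = sum(a == b for a, b in zip(answers_a, answers_b)) / n
    rng = random.Random(seed)
    perm_total = 0.0
    for _ in range(trials):
        shuffled = answers_b[:]
        rng.shuffle(shuffled)  # break question alignment
        perm_total += sum(a == b for a, b in zip(answers_a, shuffled)) / n
    return matched - perm_total / trials
```

The permutation baseline is what removes the need for ground-truth labels: agreement is only rewarded beyond what chance alignment would produce.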
SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips
arXiv:2601.20309v1 Announce Type: cross Abstract: Large Language Model (LLM) serving faces a fundamental tension between stringent latency Service Level Objectives (SLOs) and limited GPU memory capacity. When high request rates exhaust the KV cache budget, existing LLM inference systems often suffer severe head-of-line (HOL) blocking. While prior work explored PCIe-based offloading, these approaches cannot sustain […]
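The truncated abstract does not describe rotary scheduling itself, so as a baseline illustration of SLO-aware request ordering, an earliest-deadline-first queue can be sketched as follows (class and parameter names are hypothetical):

```python
import heapq

class SLOQueue:
    """Earliest-deadline-first request queue: a generic baseline for
    SLO-aware batching, not the paper's rotary scheduler."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so the heap never compares payloads

    def submit(self, request_id: str, arrival_ms: float, slo_ms: float):
        """Deadline = arrival time plus the latency SLO."""
        deadline = arrival_ms + slo_ms
        heapq.heappush(self._heap, (deadline, self._seq, request_id))
        self._seq += 1

    def next_batch(self, size: int) -> list[str]:
        """Pop up to `size` requests, tightest deadlines first."""
        batch = []
        while self._heap and len(batch) < size:
            _, _, rid = heapq.heappop(self._heap)
            batch.append(rid)
        return batch
```

A pure EDF policy like this is exactly what suffers head-of-line blocking once the KV cache is exhausted, which is the failure mode the abstract targets.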
CEI: A Clonal Expansion Identifier for T-cell receptor clones following SARS-CoV-2 vaccination
arXiv:2601.20343v1 Announce Type: new Abstract: Each T cell typically carries a specific T-cell receptor (TCR) that determines its specificity against an epitope presented by the HLA complex on a target cell. Antigenic challenge triggers the expansion of reactive cells within a diverse pool of T cells with randomly generated receptors, a process that results in […]
IoT Device Identification with Machine Learning: Common Pitfalls and Best Practices
arXiv:2601.20548v1 Announce Type: cross Abstract: This paper critically examines the device identification process using machine learning, addressing common pitfalls in existing literature. We analyze the trade-offs between identification methods (unique vs. class based), data heterogeneity, feature extraction challenges, and evaluation metrics. By highlighting specific errors, such as improper data augmentation and misleading session identifiers, we […]
Membership Inference Attacks Against Fine-tuned Diffusion Language Models
arXiv:2601.20125v1 Announce Type: cross Abstract: Diffusion Language Models (DLMs) represent a promising alternative to autoregressive language models, using bidirectional masked token prediction. Yet their susceptibility to privacy leakage via Membership Inference Attacks (MIA) remains critically underexplored. This paper presents the first systematic investigation of MIA vulnerabilities in DLMs. Unlike the autoregressive models’ single fixed prediction […]
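For context, the simplest membership-inference baseline thresholds the model's per-example loss: training members tend to have lower loss than unseen examples. The DLM-specific attacks the paper investigates are presumably more elaborate; this sketch shows only the core signal, with hypothetical names throughout.

```python
def loss_threshold_mia(losses: dict[str, float], tau: float) -> dict[str, bool]:
    """Loss-threshold membership inference baseline: flag an example
    as a training member when the model's loss on it falls below tau.
    Illustrative only; not the paper's DLM-specific attack."""
    return {example: loss < tau for example, loss in losses.items()}

# Hypothetical per-example losses from a fine-tuned model.
losses = {"seen_doc": 0.4, "unseen_doc": 2.1}
print(loss_threshold_mia(losses, tau=1.0))
```

In practice tau is calibrated on held-out data, and stronger attacks replace the raw loss with calibrated statistics such as the loss gap against a reference model.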