arXiv:2511.00230v1 Announce Type: cross Abstract: Millions of users now design personalized LLM-based chatbots that shape their daily interactions, yet they can only loosely anticipate how their design choices will manifest as behaviors in deployment. This opacity is consequential: seemingly innocuous prompts can trigger excessive sycophancy, toxicity, or inconsistency, degrading utility and raising safety concerns. To […]
MalDataGen: A Modular Framework for Synthetic Tabular Data Generation in Malware Detection
arXiv:2511.00361v1 Announce Type: cross Abstract: The scarcity of high-quality data hinders malware detection and limits ML performance. We introduce MalDataGen, an open-source modular framework for generating high-fidelity synthetic tabular data with deep generative models (e.g., WGAN-GP, VQ-VAE). Evaluated via dual validation (TR-TS/TS-TR), seven classifiers, and utility metrics, MalDataGen outperforms benchmarks such as SDV while preserving data utility. Its […]
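The dual-validation protocol mentioned in the abstract (TR-TS/TS-TR, i.e., train-on-real/test-on-synthetic and train-on-synthetic/test-on-real) is not spelled out in the truncated text; the sketch below illustrates the general idea with scikit-learn. The arrays real_X/real_y and synth_X/synth_y and the single RandomForest classifier are illustrative assumptions, not MalDataGen's actual pipeline of seven classifiers.

```python
# Hypothetical sketch of TR-TS / TS-TR dual validation for synthetic tabular data.
# Assumes real_X, real_y (real malware features/labels) and synth_X, synth_y
# (generator output) are NumPy arrays; the classifier choice is illustrative.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def dual_validation(real_X, real_y, synth_X, synth_y, seed=0):
    # Hold out part of the real data so both directions are scored on unseen samples.
    real_X_tr, real_X_te, real_y_tr, real_y_te = train_test_split(
        real_X, real_y, test_size=0.3, random_state=seed, stratify=real_y)

    # TR-TS: train on real data, test on synthetic data.
    clf_real = RandomForestClassifier(random_state=seed).fit(real_X_tr, real_y_tr)
    tr_ts = roc_auc_score(synth_y, clf_real.predict_proba(synth_X)[:, 1])

    # TS-TR: train on synthetic data, test on held-out real data.
    clf_synth = RandomForestClassifier(random_state=seed).fit(synth_X, synth_y)
    ts_tr = roc_auc_score(real_y_te, clf_synth.predict_proba(real_X_te)[:, 1])

    return {"TR-TS AUC": tr_ts, "TS-TR AUC": ts_tr}
```

Scores in both directions close to the real-train/real-test baseline are the usual signal that the synthetic data preserves downstream utility.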
PrefixNLI: Detecting Factual Inconsistencies as Soon as They Arise
arXiv:2511.01359v1 Announce Type: cross Abstract: Natural Language Inference (NLI) models have been used in various ways to improve the factuality of LLM outputs. This is typically done by applying an NLI model to judge whether the model output is entailed by the supporting evidence, triggering corrective actions such as beam reranking at inference time […]
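For readers unfamiliar with NLI-based factuality checking, the following is a minimal sketch of the kind of entailment judgment the abstract refers to. The checkpoint (roberta-large-mnli) and the 0.5 threshold are illustrative assumptions, not PrefixNLI's own models or its decoding-time integration.

```python
# Minimal sketch: score whether a generated claim is entailed by its evidence
# using an off-the-shelf MNLI classifier. Checkpoint and threshold are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # any MNLI-style sequence classifier works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME).eval()

def entailment_prob(evidence: str, claim: str) -> float:
    """Probability that `claim` (hypothesis) is entailed by `evidence` (premise)."""
    inputs = tokenizer(evidence, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    # Look up the entailment class from the model config instead of hard-coding it.
    entail_idx = next(i for i, lbl in model.config.id2label.items()
                      if lbl.lower() == "entailment")
    return probs[entail_idx].item()

evidence = "The report was published in 2021 by the WHO."
claim = "The WHO published the report in 2021."
if entailment_prob(evidence, claim) < 0.5:   # threshold is an assumption
    print("possible factual inconsistency")
```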
QuantumBench: A Benchmark for Quantum Problem Solving
arXiv:2511.00092v1 Announce Type: new Abstract: Large language models are now integrated into many scientific workflows, accelerating data analysis, hypothesis generation, and design space exploration. Alongside this growth, there is an increasing need to carefully evaluate whether models accurately capture domain-specific knowledge and notation, since general-purpose benchmarks rarely reflect these requirements. This gap is […]
Survey Transfer Learning: Recycling Data with Silicon Responses
arXiv:2501.06577v2 Announce Type: replace Abstract: As researchers increasingly turn to large language models (LLMs) to generate synthetic survey data, less attention has been paid to alternative AI paradigms, despite the environmental costs of LLMs. This paper introduces Survey Transfer Learning (STL), which adapts transfer learning paradigms from computer science to survey research in order to recycle existing survey […]
GeneFlow: Translation of Single-cell Gene Expression to Histopathological Images via Rectified Flow
arXiv:2511.00119v1 Announce Type: new Abstract: Spatial transcriptomics (ST) technologies can be used to align transcriptomes with histopathological morphology, presenting exciting new opportunities for biomolecular discovery. Using ST data, we construct a novel framework, GeneFlow, to map transcriptomics onto paired cellular images. By combining an attention-based RNA encoder with a conditional UNet guided by rectified flow, […]
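As a rough illustration of the rectified-flow component (not GeneFlow's actual architecture or training recipe), the sketch below shows the standard rectified-flow objective: sample a straight-line interpolation between noise and the target image and regress its constant velocity, here conditioned on a hypothetical RNA embedding. `velocity_net` and `rna_encoder` are placeholders standing in for the paper's conditional UNet and attention-based RNA encoder.

```python
# Schematic rectified-flow training step, conditioned on an RNA embedding.
# `velocity_net` and `rna_encoder` are placeholder modules (assumptions).
import torch

def rectified_flow_step(velocity_net, rna_encoder, expr, image, optimizer):
    cond = rna_encoder(expr)                      # gene expression -> conditioning vector
    x0 = torch.randn_like(image)                  # noise endpoint
    x1 = image                                    # data endpoint
    t = torch.rand(image.size(0), 1, 1, 1, device=image.device)
    xt = (1 - t) * x0 + t * x1                    # straight-line interpolation
    target_v = x1 - x0                            # constant velocity along the line
    pred_v = velocity_net(xt, t.flatten(), cond)  # predicted velocity at (x_t, t)
    loss = torch.nn.functional.mse_loss(pred_v, target_v)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```

At inference time, images would be generated by integrating the learned velocity field from noise, e.g., with a few Euler steps.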
A Proof of Learning Rate Transfer under $\mu$P
arXiv:2511.01734v1 Announce Type: cross Abstract: We provide the first proof of learning rate transfer with width in a linear multi-layer perceptron (MLP) parametrized with $\mu$P, a neural network parameterization designed to “maximize” feature learning in the infinite-width limit. We show that under $\mu$P, the optimal learning rate converges to a non-zero constant as width […]
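Schematically, the transfer claim in the abstract can be stated as follows; the notation is ours, not the paper's.

```latex
% Learning-rate transfer under $\mu$P (schematic restatement of the abstract's claim).
% For a width-$n$ linear MLP trained under $\mu$P with width-$n$ training loss
% $\mathcal{L}_n(\eta)$, the width-dependent optimal learning rate does not vanish:
\[
  \eta^*(n) \in \arg\min_{\eta}\, \mathcal{L}_n(\eta)
  \qquad\text{with}\qquad
  \lim_{n \to \infty} \eta^*(n) \;=\; \eta^*_\infty \;>\; 0 ,
\]
% so a learning rate tuned on a narrow proxy model can be reused at larger widths,
% whereas under standard parameterizations the optimal rate typically shrinks with $n$.
```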
DeepHQ: Learned Hierarchical Quantizer for Progressive Deep Image Coding
arXiv:2408.12150v2 Announce Type: replace-cross Abstract: Unlike fixed- or variable-rate image coding, progressive image coding (PIC) aims to compress various qualities of images into a single bitstream, increasing the versatility of bitstream utilization and providing high compression efficiency compared to simulcast compression. Research on neural network (NN)-based PIC is in its early stages, mainly focusing on […]
GTAlign: Game-Theoretic Alignment of LLM Assistants for Social Welfare
arXiv:2510.08872v3 Announce Type: replace Abstract: Large Language Models (LLMs) have achieved remarkable progress in reasoning, yet sometimes produce responses that are suboptimal for users in tasks such as writing, information seeking, or providing practical guidance. Conventional alignment practices typically assume that maximizing model reward also maximizes user welfare, but this assumption frequently fails in practice: […]
AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence
arXiv:2511.01144v1 Announce Type: cross Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in natural language reasoning, yet their application to Cyber Threat Intelligence (CTI) remains limited. CTI analysis involves distilling large volumes of unstructured reports into actionable knowledge, a process where LLMs could substantially reduce analyst workload. CTIBench introduced a comprehensive benchmark for evaluating […]