arXiv:2604.26073v1 Announce Type: cross Abstract: Industrial chemical plants often operate under strict data confidentiality constraints, making centralized data-driven process modeling difficult. Federated learning (FL) provides a promising solution by enabling collaborative model training across distributed facilities without sharing raw operational data. This paper proposes a privacy-preserving federated learning framework for distributed chemical process optimization using […]
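Federated learning is often illustrated with federated averaging (FedAvg). In FedAvg, a server combines parameters trained locally at each site and weights each site by its sample count, so raw data never leave the site. The sketch below shows only that aggregation step, with plain Python lists standing in for model parameters. The function name `fedavg` and the toy plant data are hypothetical, and this is a generic FedAvg illustration, not the framework proposed in the paper.

```python
from typing import List, Sequence

def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of client parameter vectors (FedAvg aggregation).

    Only parameters are sent to the server; each site's raw data stay local.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        weight = n / total  # larger sites contribute proportionally more
        for i, p in enumerate(params):
            global_params[i] += weight * p
    return global_params

# Hypothetical example: three plants holding different amounts of local data.
plant_params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
plant_sizes = [100, 300, 600]
print(fedavg(plant_params, plant_sizes))
```

In a full FedAvg round, each site would first run a few local training steps starting from the current global parameters. The server would then apply this weighted average and send the result back to the sites.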
Calibrated Surprise: An Information-Theoretic Account of Creative Quality
arXiv:2604.26269v1 Announce Type: cross Abstract: The essence of good creative writing is calibrated surprise: when constraints from all relevant dimensions act together, the feasible solution space collapses into a narrow region, and the surviving choices look least predictable from an unconstrained view. “Calibrated” has a precise meaning: the author’s intent, the reader’s reasonable expectation, and […]
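One way to read this claim is through surprisal, -log2 p. A choice that survives several joint constraints can be rare, and therefore surprising, under an unconstrained distribution. The same choice becomes unremarkable once the constraints have narrowed the space. The toy sketch below assumes an invented next-word distribution and an invented constraint. It does not model the paper's precise definition of "calibrated", which is truncated above.

```python
import math
from typing import Callable, Dict

def surprisal_bits(p: float) -> float:
    """Surprisal (self-information) of an outcome with probability p, in bits."""
    return -math.log2(p)

def constrain(dist: Dict[str, float], ok: Callable[[str], bool]) -> Dict[str, float]:
    """Keep only candidates that satisfy the constraint, then renormalize."""
    kept = {w: p for w, p in dist.items() if ok(w)}
    z = sum(kept.values())
    if z == 0:
        raise ValueError("constraint eliminates every candidate")
    return {w: p / z for w, p in kept.items()}

# Hypothetical unconstrained distribution over a dialogue verb.
unconstrained = {"said": 0.5, "replied": 0.25, "whispered": 0.125, "hissed": 0.125}
# Hypothetical joint constraint (scene, tone, character): only quiet verbs fit.
constrained = constrain(unconstrained, lambda w: w in {"whispered", "hissed"})

for word in constrained:
    print(word,
          surprisal_bits(unconstrained[word]),  # surprise seen from the unconstrained view
          surprisal_bits(constrained[word]))    # surprise once the constraints apply
```

In this toy case the surviving verbs carry 3 bits of surprisal under the unconstrained distribution, against 1 bit for "said". Under the constrained distribution they carry only 1 bit each.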
The Surprising Effectiveness of Canonical Knowledge Distillation for Semantic Segmentation
arXiv:2604.25530v2 Announce Type: replace-cross Abstract: Recent knowledge distillation (KD) methods for semantic segmentation introduce increasingly complex hand-crafted objectives, yet are typically evaluated under fixed iteration schedules. These objectives substantially increase per-iteration cost, meaning equal iteration counts do not correspond to equal training budgets. It is therefore unclear whether reported gains reflect stronger distillation signals or […]
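The budget argument is simple arithmetic. If one method costs more per iteration, running both for the same number of iterations gives that method more compute. The per-iteration costs below are hypothetical round numbers, not measurements from the paper. The sketch counts how many iterations a cheaper canonical KD setup could run in the wall-clock time that the costlier method spends on a fixed schedule.

```python
def iterations_for_budget(budget_ms: int, ms_per_iter: int) -> int:
    """Whole training iterations that fit into a fixed wall-clock budget."""
    if ms_per_iter <= 0:
        raise ValueError("per-iteration cost must be positive")
    return budget_ms // ms_per_iter

# Hypothetical per-iteration costs in milliseconds (not measurements from the paper).
canonical_kd_ms = 500   # teacher forward + student update with the standard KD loss
complex_kd_ms = 800     # same, plus an extra hand-crafted distillation objective

schedule = 40_000                       # fixed iteration schedule used for both methods
budget_ms = schedule * complex_kd_ms    # time the complex method actually consumes
print(iterations_for_budget(budget_ms, canonical_kd_ms))  # → 64000
```

With these assumed costs, matching wall-clock time would let canonical KD train for 64,000 iterations instead of 40,000. This is why a comparison at equal iteration counts can confound the distillation signal with extra compute.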
Robust Clustering Analysis of Genes Related to Age-related Macular Degeneration using RNA-Seq
arXiv:2604.25986v1 Announce Type: new Abstract: Identifying genes associated with diseases is crucial to understanding disease mechanisms and developing therapies. However, identification of individual genes associated with a disease often needs to be supplemented with clustering analysis to understand the relationships between genes and identify gene modules beyond individual gene-level relationships. Gene co-expression networks are widely […]
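A gene co-expression network is commonly built by correlating expression profiles across samples and linking strongly correlated gene pairs. Modules are then groups of tightly connected genes. The sketch below makes three assumptions: an unsigned hard threshold on |Pearson r|, connected components as modules, and invented gene names and values. Connected components are a deliberately crude stand-in, not the robust clustering procedure the paper proposes.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two expression profiles measured on the same samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation undefined, treated as no link
    return cov / (sx * sy)

def coexpression_edges(expr: Dict[str, List[float]], threshold: float) -> List[Tuple[str, str]]:
    """Link two genes when |r| across samples is at least `threshold` (unsigned network)."""
    return [(g, h) for g, h in combinations(sorted(expr), 2)
            if abs(pearson(expr[g], expr[h])) >= threshold]

def modules(genes: Iterable[str], edges: List[Tuple[str, str]]) -> List[Set[str]]:
    """Connected components of the thresholded network, used here as crude gene modules."""
    adj: Dict[str, Set[str]] = {g: set() for g in genes}
    for g, h in edges:
        adj[g].add(h)
        adj[h].add(g)
    seen: Set[str] = set()
    found: List[Set[str]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        comp: Set[str] = set()
        stack = [start]
        while stack:  # iterative depth-first search
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        found.append(comp)
    return found

# Invented expression values: 4 genes measured in 4 samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
    "geneD": [1.0, 3.0, 1.0, 3.0],   # only weakly related to the others
}
print(modules(expr, coexpression_edges(expr, threshold=0.9)))
```

Hard thresholds and connected components are sensitive to noise and to the chosen cutoff. That sensitivity is part of the motivation for more robust clustering of such networks.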
A Toolkit for Detecting Spurious Correlations in Speech Datasets
arXiv:2604.26676v1 Announce Type: cross Abstract: We introduce a toolkit for uncovering spurious correlations between recording characteristics and target class in speech datasets. Spurious correlations may arise due to heterogeneous recording conditions, a common scenario for health-related datasets. When present both in the training and test data, these correlations result in an overestimation of the system […]
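One simple probe for such a correlation uses a recording attribute alone, such as sample rate, to predict the class. It assigns each attribute value the majority label of its group and compares the result with always guessing the majority class. The function name and data below are hypothetical and are not the toolkit's interface. The score is computed in-sample, so it is optimistic; a real check would use held-out data.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Hashable, Sequence

def attribute_only_accuracy(attr: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Accuracy of predicting each label from the majority label of its attribute group."""
    groups: DefaultDict[Hashable, Counter] = defaultdict(Counter)
    for a, y in zip(attr, labels):
        groups[a][y] += 1
    # Each group predicts its most common label; count how many labels that gets right.
    correct = sum(c.most_common(1)[0][1] for c in groups.values())
    return correct / len(labels)

# Hypothetical dataset: patients recorded at one site (16 kHz), controls at another (44.1 kHz).
sample_rates = [16000, 16000, 16000, 44100, 44100, 44100, 16000, 44100]
diagnosis = ["patient", "patient", "patient", "control",
             "control", "control", "patient", "control"]
majority_baseline = max(Counter(diagnosis).values()) / len(diagnosis)
print(attribute_only_accuracy(sample_rates, diagnosis), majority_baseline)  # → 1.0 0.5
```

In this example the sample rate alone predicts the class perfectly. A classifier could therefore score well on matching test data without learning anything about the condition itself.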
When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models
arXiv:2604.26649v1 Announce Type: cross Abstract: Large reasoning models such as DeepSeek-R1 and OpenAI o1 generate extended chains of thought spanning thousands of tokens, yet their integration with retrieval-augmented generation (RAG) remains fundamentally misaligned. Current RAG systems optimize for providing context before reasoning begins, while reasoning models require evidence injection during multi-step inference chains. We introduce […]
Glance-or-Gaze: Incentivizing LMMs to Adaptively Focus Search via Reinforcement Learning
arXiv:2601.13942v2 Announce Type: replace-cross Abstract: Large Multimodal Models (LMMs) have achieved remarkable success in visual understanding, yet they struggle with knowledge-intensive queries involving long-tail entities or evolving information due to static parametric knowledge. Recent search-augmented approaches attempt to address this limitation, but existing methods rely on indiscriminate whole-image retrieval that introduces substantial visual redundancy and […]
Causal Learning with Neural Assemblies
arXiv:2604.26919v1 Announce Type: cross Abstract: Can Neural Assemblies — groups of neurons that fire together and strengthen through co-activation — learn the direction of causal influence between variables? While established as a computationally general substrate for classification, parsing, and planning, neural assemblies have not yet been shown to internalize causal directionality. We demonstrate that the […]
Data Balancing Strategies: A Systematic Survey of Resampling and Augmentation Methods
arXiv:2505.13518v2 Announce Type: replace-cross Abstract: Imbalanced datasets, where one class significantly outnumbers others, remain a persistent challenge in machine learning, often biasing predictions toward the majority class and degrading classifier performance. This paper provides a comprehensive, systematic review of data balancing methods, extending beyond foundational oversampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE) […]
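SMOTE creates a synthetic minority sample in three steps. It picks a minority sample x, picks one of its k nearest minority-class neighbours x_nn, and returns x + λ(x_nn - x) for a random λ in [0, 1]. The minimal pure-Python sketch below follows that interpolation rule. The default k = 5 is a common choice, and the function names and toy points are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = List[float]

def k_nearest(points: Sequence[Point], idx: int, k: int) -> List[int]:
    """Indices of the k nearest minority neighbours of points[idx] (Euclidean distance)."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]

def smote(minority: Sequence[Point], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Generate synthetic minority samples by interpolating toward minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Point] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # base minority sample
        j = rng.choice(neighbours[i])      # one of its k nearest minority neighbours
        lam = rng.random()                 # interpolation factor in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([b + lam * (v - b) for b, v in zip(base, nb)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2, seed=42)
print(len(new_points))  # → 6
```

Each synthetic point lies on the segment between two existing minority samples. This is why SMOTE tends to fill in the minority region rather than duplicating points as random oversampling does.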
PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
arXiv:2509.23410v4 Announce Type: replace-cross Abstract: Large language models (LLMs) deliver impressive performance but incur prohibitive memory and compute costs at deployment. Model pruning is an effective way to reduce these overheads, yet existing approaches face challenges: unstructured sparsity, where nonzeros can appear anywhere, preserves accuracy but yields irregular access patterns that prevent GPU acceleration, while […]
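Unstructured sparsity can be illustrated with global magnitude pruning, which zeroes the smallest-magnitude weights wherever they occur. The sketch below uses hypothetical names and toy weights and is not PATCH's tile-level method. It shows why the surviving nonzeros form an irregular pattern that differs from row to row.

```python
from typing import List

def unstructured_magnitude_mask(w: List[List[float]], sparsity: float) -> List[List[int]]:
    """Return a 0/1 mask that prunes the smallest-|w| fraction `sparsity` of entries.

    Pruning is global over the matrix, so kept entries can sit anywhere (unstructured).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    # (magnitude, row, col) triples sorted ascending; ties are broken by position.
    ranked = sorted((abs(v), r, c) for r, row in enumerate(w) for c, v in enumerate(row))
    n_prune = int(len(ranked) * sparsity)
    mask = [[1] * len(row) for row in w]
    for _, r, c in ranked[:n_prune]:
        mask[r][c] = 0
    return mask

weights = [[0.9, -0.1, 0.3, 0.05],
           [0.2, -0.8, 0.01, 0.4]]
print(unstructured_magnitude_mask(weights, sparsity=0.5))
# → [[1, 0, 1, 0], [0, 1, 0, 1]]
```

At 50% sparsity, row 0 keeps columns 0 and 2 while row 1 keeps columns 1 and 3. This per-row irregularity is what makes such masks hard to accelerate with dense GPU kernels.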
Deterministic Legal Agents: A Canonical Primitive API for Auditable Reasoning over Temporal Knowledge Graphs
arXiv:2510.06002v3 Announce Type: replace Abstract: In high-stakes legal domains, retrieval must preserve not only semantic relevance, but also the hierarchy, temporality, and causal provenance of legal norms. Standard Retrieval-Augmented Generation (RAG), based mainly on semantic similarity over text fragments, cannot reliably provide this level of control. Prior work on SAT-Graph RAG addressed the representation problem […]
A Dual Perspective on Synthetic Trajectory Generators: Utility Framework and Privacy Vulnerabilities
arXiv:2604.19653v2 Announce Type: replace Abstract: Human mobility data are used in numerous applications, ranging from public health to urban planning. Human mobility is inherently sensitive, as it can contain information such as religious beliefs and political affiliations. Historically, it has been proposed to modify the information using techniques such as aggregation, obfuscation, or noise addition, […]