arXiv:2510.25947v1 Announce Type: cross Abstract: The impact of different multilingual data mixtures in pretraining large language models (LLMs) has been a topic of ongoing debate, often raising concerns about potential trade-offs between language coverage and model performance (i.e., the curse of multilinguality). In this work, we investigate these assumptions by training 1.1B and 3B parameter […]
FinOps Agent — A Use-Case for IT Infrastructure and Cost Optimization
arXiv:2510.25914v1 Announce Type: new Abstract: FinOps (Finance + Operations) represents an operational framework and cultural practice which maximizes cloud business value through collaborative financial accountability across engineering, finance, and business teams. FinOps practitioners face a fundamental challenge: billing data arrives in heterogeneous formats, taxonomies, and metrics from multiple cloud providers and internal systems which eventually […]
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
arXiv:2510.25992v1 Announce Type: cross Abstract: Large Language Models (LLMs) often struggle with problems that require multi-step reasoning. For small-scale open-source models, Reinforcement Learning with Verifiable Rewards (RLVR) fails when correct solutions are rarely sampled even after many attempts, while Supervised Fine-Tuning (SFT) tends to overfit long demonstrations through rigid token-by-token imitation. To address this gap, […]
Modeling Neural Activity with Conditionally Linear Dynamical Systems
arXiv:2502.18347v2 Announce Type: replace Abstract: Neural population activity exhibits complex, nonlinear dynamics, varying in time, over trials, and across experimental conditions. Here, we develop Conditionally Linear Dynamical System (CLDS) models as a general-purpose method to characterize these dynamics. These models use Gaussian Process (GP) priors to capture the nonlinear dependence of circuit dynamics on task […]
Climate Adaptation-Aware Flood Prediction for Coastal Cities Using Deep Learning
arXiv:2510.26017v1 Announce Type: cross Abstract: Climate change and sea-level rise (SLR) pose escalating threats to coastal cities, intensifying the need for efficient and accurate methods to predict potential flood hazards. Traditional physics-based hydrodynamic simulators, although precise, are computationally expensive and impractical for city-scale coastal planning applications. Deep Learning (DL) techniques offer promising alternatives; however, they […]
Humains-Junior: A 3.8B Language Model Achieving GPT-4o-Level Factual Accuracy by Directed Exoskeleton Reasoning
arXiv:2510.25933v1 Announce Type: new Abstract: We introduce Humains-Junior, a 3.8B model that matches GPT-4o on the FACTS Grounding public subset within a $\pm 5$ pp equivalence margin. Results. On Q1–Q500 under identical judges, GPT-4o scores 73.5% (95% CI 69.5–77.2) and Humains-Junior 72.7% (95% CI 68.7–76.5); the paired difference is 0.8 pp (bootstrap 95% CI $-3.1$ […]
Artificial Intelligence-Enabled Analysis of Radiology Reports: Epidemiology and Consequences of Incidental Thyroid Findings
arXiv:2510.26032v1 Announce Type: cross Abstract: Importance: Incidental thyroid findings (ITFs) are increasingly detected on imaging performed for non-thyroid indications. Their prevalence, features, and clinical consequences remain undefined. Objective: To develop, validate, and deploy a natural language processing (NLP) pipeline to identify ITFs in radiology reports and assess their prevalence, features, and clinical outcomes. Design, Setting, […]
Demystifying the Roles of LLM Layers in Retrieval, Knowledge, and Reasoning
arXiv:2510.02091v2 Announce Type: replace Abstract: Recent studies suggest that the deeper layers of Large Language Models (LLMs) contribute little to representation learning and can often be removed without significant performance loss. However, such claims are typically drawn from narrow evaluations and may overlook important aspects of model behavior. In this work, we present a systematic […]
Data-driven Projection Generation for Efficiently Solving Heterogeneous Quadratic Programming Problems
arXiv:2510.26061v1 Announce Type: cross Abstract: We propose a data-driven framework for efficiently solving quadratic programming (QP) problems by reducing the number of variables in high-dimensional QPs using instance-specific projection. A graph neural network-based model is designed to generate projections tailored to each QP instance, enabling us to produce high-quality solutions even for previously unseen problems. […]
InputDSA: Demixing then Comparing Recurrent and Externally Driven Dynamics
arXiv:2510.25943v1 Announce Type: new Abstract: In control problems and basic scientific modeling, it is important to compare observations with dynamical simulations. For example, comparing two neural systems can shed light on the nature of emergent computations in the brain and deep neural networks. Recently, Ostrow et al. (2023) introduced Dynamical Similarity Analysis (DSA), a method […]