arXiv:2502.05568v3 Announce Type: replace-cross Abstract: In this survey, we systematically analyze techniques used to adapt large multimodal models (LMMs) for low-resource (LR) languages, examining approaches ranging from visual enhancement and data creation to cross-modal transfer and fusion strategies. Through a comprehensive analysis of 117 studies across 96 LR languages, we identify key patterns in how […]
CP Loss: Channel-wise Perceptual Loss for Time Series Forecasting
arXiv:2601.18829v1 Announce Type: cross Abstract: Multi-channel time-series data, prevalent across diverse applications, is characterized by significant heterogeneity in its different channels. However, existing forecasting models are typically guided by channel-agnostic loss functions like MSE, which apply a uniform metric across all channels. This often fails to capture channel-specific dynamics such as sharp fluctuations […]
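The abstract is truncated before CP Loss itself is defined, so the following is only a minimal sketch of the contrast it draws: a channel-agnostic MSE versus a hypothetical per-channel weighted variant. The function names and the weighting scheme are assumptions for illustration, not the paper's loss.

```python
import torch

def mse_loss(pred, target):
    # Channel-agnostic baseline: one uniform squared-error metric over all channels.
    return ((pred - target) ** 2).mean()

def channel_weighted_loss(pred, target, channel_weights):
    # Hypothetical channel-aware variant (not the paper's CP Loss):
    # aggregate errors within each channel first, then combine with
    # channel-specific weights, so channels with sharp fluctuations
    # are not averaged away by smoother ones.
    # pred, target: (batch, time, channels); channel_weights: (channels,)
    per_channel = ((pred - target) ** 2).mean(dim=(0, 1))  # shape: (channels,)
    return (channel_weights * per_channel).sum()
```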
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
arXiv:2505.14140v3 Announce Type: replace Abstract: Despite rapid advancements in large language models (LLMs), their token-level autoregressive nature constrains their complex reasoning capabilities. To enhance LLM reasoning, inference-time techniques, including Chain/Tree/Graph-of-Thought(s), successfully improve performance, and they are fairly cost-effective because they guide reasoning through sophisticated logical structures without modifying the LLMs’ parameters. However, these manually predefined, task-agnostic […]
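The truncation cuts off before the method is described; the sketch below only illustrates the general idea named in the title, an inference-time RL policy choosing among reasoning operations instead of following one fixed Chain/Tree/Graph structure. Every name here (the action set, `policy.select`, `llm.apply`, `llm.is_final`) is a hypothetical stand-in, not the paper's API.

```python
# Hypothetical set of reasoning operations an inference-time policy could
# choose among, rather than following one manually predefined structure.
ACTIONS = ["decompose", "reason_one_step", "verify", "backtrack", "summarize"]

def navigate(question, policy, llm, max_steps=8):
    """Sketch: a learned policy picks the next reasoning operation given the
    current partial solution; the LLM's own parameters are never updated."""
    state = question
    for _ in range(max_steps):
        action = policy.select(state, ACTIONS)  # hypothetical RL policy
        state = llm.apply(action, state)        # hypothetical prompt-template call
        if llm.is_final(state):                 # hypothetical stopping check
            break
    return state
```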
Reducing False Positives in Static Bug Detection with LLMs: An Empirical Study in Industry
arXiv:2601.18844v1 Announce Type: cross Abstract: Static analysis tools (SATs) are widely adopted in both academia and industry for improving software quality, yet their practical use is often hindered by high false positive rates, especially in large-scale enterprise systems. These false alarms demand substantial manual inspection, creating severe inefficiencies in industrial code review. While recent work […]
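The abstract does not reach the paper's pipeline, so this is only a minimal sketch of one common setup for the problem it states: passing each static-analysis warning plus its code context to an LLM and keeping only warnings judged to be real defects. The `ask_llm` callable and the prompt wording are assumptions.

```python
def filter_warnings(warnings, ask_llm):
    """Sketch: triage static-analysis warnings with an LLM, keeping only
    those the model judges to be real defects.

    warnings: iterable of (rule_id, message, code_snippet) tuples.
    ask_llm: stand-in callable returning the model's text response.
    """
    kept = []
    for rule_id, message, snippet in warnings:
        prompt = (
            f"Static analyzer rule {rule_id} reported: {message}\n"
            f"Code:\n{snippet}\n"
            "Is this a real defect or a false positive? "
            "Answer with exactly REAL or FALSE_POSITIVE."
        )
        if ask_llm(prompt).strip().upper().startswith("REAL"):
            kept.append((rule_id, message, snippet))
    return kept
```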
Demystifying the Roles of LLM Layers in Retrieval, Knowledge, and Reasoning
arXiv:2510.02091v4 Announce Type: replace Abstract: Recent studies suggest that the deeper layers of Large Language Models (LLMs) contribute little to representation learning and can often be removed without significant performance loss. However, such claims are typically drawn from narrow evaluations and may overlook important aspects of model behavior. In this work, we present a systematic […]
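The layer-removal claim the abstract responds to can be probed with a simple experiment: delete the deepest k transformer blocks and re-measure perplexity. The sketch below assumes the Hugging Face transformers GPT-2 implementation, where blocks live in an nn.ModuleList; the paper's own evaluation is presumably broader than this single probe.

```python
import copy
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def perplexity(model, tokenizer, text):
    # Next-token perplexity on a single string.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")
text = "The capital of France is Paris."

base = perplexity(model, tok, text)
for k in (2, 4, 6):
    pruned = copy.deepcopy(model)
    # GPT-2 stores its blocks in an nn.ModuleList; slicing drops the deepest k.
    pruned.transformer.h = pruned.transformer.h[: len(pruned.transformer.h) - k]
    pruned.config.n_layer = len(pruned.transformer.h)
    print(f"removed {k} layers: ppl {perplexity(pruned, tok, text):.2f} vs base {base:.2f}")
```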
SICL-AT: Another way to adapt Auditory LLM to low-resource task
arXiv:2601.18904v1 Announce Type: cross Abstract: Auditory Large Language Models (LLMs) have demonstrated strong performance across a wide range of speech and audio understanding tasks. Nevertheless, they often struggle when applied to low-resource or unfamiliar tasks. When labeled in-domain data is scarce or mismatched to the true test distribution, direct fine-tuning can be brittle. […]
Accepted with Minor Revisions: Value of AI-Assisted Scientific Writing
arXiv:2511.12529v2 Announce Type: replace-cross Abstract: Large Language Models have seen expanding application across domains, yet their effectiveness as assistive tools for scientific writing – an endeavor requiring precision, multimodal synthesis, and domain expertise – remains insufficiently understood. We examine the potential of LLMs to support domain experts in scientific writing, with a focus on abstract […]
Configurable p-Neurons Using Modular p-Bits
arXiv:2601.18943v1 Announce Type: cross Abstract: Probabilistic bits (p-bits) have recently been employed in neural networks (NNs) as stochastic neurons with sigmoidal probabilistic activation functions. Nonetheless, there remains a wealth of other probabilistic activation functions that are yet to be explored. Here we re-engineer the p-bit by decoupling its stochastic signal path from its input data […]
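For context, the standard p-bit update already gives the sigmoidal case the abstract mentions: the bit fires +1 when tanh of the input exceeds a uniform random draw, so P(+1) = (1 + tanh I)/2. The sketch below shows that baseline with a swappable activation hook; the hook only loosely mimics the paper's configurable design, whose actual mechanism is truncated away.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_bit(inp, activation=np.tanh, rng=rng):
    """Standard p-bit update: a binary stochastic neuron with
    P(output = +1) = (1 + activation(inp)) / 2, sigmoidal for tanh.
    Swapping `activation` loosely mimics a configurable p-neuron."""
    r = rng.uniform(-1.0, 1.0)
    return 1 if activation(inp) > r else -1

# Empirical check: the firing rate should approach (1 + tanh(I)) / 2.
I = 0.5
samples = [p_bit(I) for _ in range(100_000)]
print(sum(s == 1 for s in samples) / len(samples), (1 + np.tanh(I)) / 2)
```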
daVinci-Dev: Agent-native Mid-training for Software Engineering
arXiv:2601.18418v2 Announce Type: replace-cross Abstract: Recently, the frontier of Large Language Model (LLM) capabilities has shifted from single-turn code generation to agentic software engineering, a paradigm where models autonomously navigate, edit, and test complex repositories. While post-training methods have become the de facto approach for code agents, agentic mid-training, i.e., mid-training (MT) on large-scale data that mirrors authentic […]
When Does Adaptation Win? Scaling Laws for Meta-Learning in Quantum Control
arXiv:2601.18973v1 Announce Type: cross Abstract: Quantum hardware suffers from intrinsic device heterogeneity and environmental drift, forcing practitioners to choose between suboptimal non-adaptive controllers or costly per-device recalibration. We derive a scaling law lower bound for meta-learning showing that the adaptation gain (expected fidelity improvement from task-specific gradient steps) saturates exponentially with gradient steps and scales […]
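The exact bound is cut off by the truncation; the form below is only an illustrative expression consistent with "saturates exponentially with gradient steps", with every symbol assumed rather than taken from the paper.

```latex
% Illustrative form only: G(k) is the adaptation gain after k task-specific
% gradient steps (expected fidelity improvement over the non-adapted
% controller), G_\infty an assumed saturation value, \lambda an assumed rate.
G(k) \;=\; \mathbb{E}\!\left[F(\theta_k) - F(\theta_0)\right]
\;\approx\; G_{\infty}\left(1 - e^{-\lambda k}\right)
```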
Schema-based active inference supports rapid generalization of experience and frontal cortical coding of abstract structure
arXiv:2601.18946v1 Announce Type: new Abstract: Schemas — abstract relational structures that capture the commonalities across experiences — are thought to underlie humans’ and animals’ ability to rapidly generalize knowledge, rebind new experiences to existing structures, and flexibly adapt behavior across contexts. Despite their central role in cognition, the computational principles and neural mechanisms supporting schema […]
Learning Adaptive Parallel Execution for Efficient Code Localization
arXiv:2601.19568v1 Announce Type: new Abstract: Code localization constitutes a key bottleneck in automated software development pipelines. While concurrent tool execution can enhance discovery speed, current agents demonstrate a 34.9% redundant invocation rate, which negates parallelism benefits. We propose FuseSearch, reformulating parallel code localization as a joint quality-efficiency optimization task. By defining tool efficiency — the […]
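The abstract's definition of tool efficiency is truncated, so the sketch below only shows one plausible operationalization of the 34.9% figure it cites: the fraction of tool invocations that repeat a (tool, arguments) pair already issued in the episode. This metric is an assumption, not the paper's.

```python
from collections import Counter

def redundant_invocation_rate(calls):
    """Assumed metric: fraction of tool invocations that duplicate an
    earlier (tool, arguments) pair within the same episode."""
    counts = Counter(calls)
    duplicates = sum(c - 1 for c in counts.values())
    return duplicates / len(calls) if calls else 0.0

calls = [("grep", "foo"), ("grep", "foo"), ("read_file", "a.py"), ("grep", "bar")]
print(redundant_invocation_rate(calls))  # 0.25: one of four calls is redundant
```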