arXiv:2604.26700v1 Announce Type: new Abstract: Boolean networks are powerful mathematical tools for modeling the qualitative dynamics of genetic regulation. Yet inferred models often generate spurious attractors that lack biological viability. In this paper, we propose a parsimonious computational framework to systematically refine Boolean network models by eliminating these non-biological asymptotic behaviors while strictly preserving known, […]
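For readers unfamiliar with the attractors the abstract refers to: the asymptotic behaviors of a Boolean network are its attractors (fixed points and limit cycles) under repeated state updates. The sketch below is not the paper's refinement framework; it only illustrates, for a hypothetical 3-gene network with made-up update rules, how attractors can be enumerated exhaustively under synchronous updates.

```python
# Illustrative only: enumerate attractors of a tiny 3-gene Boolean network
# by simulating synchronous updates from every initial state.
from itertools import product

def step(state):
    a, b, c = state
    return (b and not c,  # A activated by B, repressed by C (hypothetical rule)
            a,            # B follows A
            a or b)       # C activated by A or B

def attractors():
    found = set()
    for start in product([0, 1], repeat=3):
        seen, s = [], tuple(start)
        while s not in seen:          # iterate until a state repeats
            seen.append(s)
            s = tuple(int(x) for x in step(s))
        cycle = tuple(seen[seen.index(s):])  # the attractor that was reached
        # Canonicalize the cycle's rotation so equal attractors compare equal.
        rotations = [cycle[i:] + cycle[:i] for i in range(len(cycle))]
        found.add(min(rotations))
    return found

print(attractors())  # this toy network has a single fixed point: (0, 0, 0)
```

Exhaustive enumeration is only feasible for small networks (2^n states); real inference pipelines rely on symbolic or SAT-based methods instead.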
Learning to Ask: When LLM Agents Meet Unclear Instruction
arXiv:2409.00557v4 Announce Type: replace-cross Abstract: Equipped with the capability to call functions, modern large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone. However, the effective execution of these tools relies heavily not just on the advanced capabilities of LLMs but also on precise user instructions, […]
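One concrete failure mode behind imprecise instructions is a tool call with missing required arguments. The snippet below is an assumed sketch, not the paper's agent: a dispatcher (hypothetical `dispatch` helper and `book_flight` tool) that asks the user a clarifying question instead of executing a call with guessed values.

```python
# Hypothetical tool registry: each tool declares its required arguments.
TOOLS = {
    "book_flight": {"required": ["origin", "destination", "date"]},
}

def dispatch(name, args):
    """Ask for clarification when required arguments are missing,
    rather than executing the tool with guessed values."""
    missing = [p for p in TOOLS[name]["required"] if p not in args]
    if missing:
        return {"ask_user": f"Please provide: {', '.join(missing)}"}
    return {"call": name, "args": args}

print(dispatch("book_flight", {"origin": "SFO", "destination": "NRT"}))
# -> {'ask_user': 'Please provide: date'}
```

"Learning to ask" in this setting means deciding *when* a clarification is worth its cost; the hard part the paper studies is that decision, not the plumbing shown here.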
FutureWorld: A Live Environment for Training Predictive Agents with Real-World Outcome Rewards
arXiv:2604.26733v1 Announce Type: new Abstract: Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from the real world. Just as interactive environments have often driven progress in agents, […]
OT Score: An OT based Confidence Score for Prototype-Assisted Source Free Unsupervised Domain Adaptation
arXiv:2505.11669v3 Announce Type: replace-cross Abstract: We address the computational and theoretical limitations of current distributional alignment methods for source-free unsupervised domain adaptation (SFUDA) using source class-mean features. In particular, we focus on estimating classification performance and confidence in the absence of target labels. Current theoretical frameworks for these methods often yield computationally intractable quantities and […]
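To give intuition for an OT-based confidence signal (this is a minimal sketch under simplifying assumptions, not the paper's OT Score): in one dimension, the Wasserstein-1 distance between two equal-sized empirical samples has a closed form as the mean absolute difference of the sorted samples, and a lower transport cost from unlabeled target features to a source class can be read as higher confidence in that class.

```python
import numpy as np

rng = np.random.default_rng(0)

def wasserstein_1d(x, y):
    # Closed form for W1 between equal-sized 1-D empirical distributions.
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

# Hypothetical 1-D features: two source classes and an unlabeled target batch.
source = {0: rng.normal(0.0, 1.0, 256), 1: rng.normal(5.0, 1.0, 256)}
target = rng.normal(0.3, 1.0, 256)  # a drifted version of class 0

# Score each class by transport cost; lower cost -> higher confidence.
costs = np.array([wasserstein_1d(target, source[k]) for k in (0, 1)])
conf = np.exp(-costs) / np.exp(-costs).sum()
print(conf.argmax())  # -> 0 (target is closest to class 0)
```

In the SFUDA setting the abstract describes, only class-mean features (not raw source data) would be available, and features are high-dimensional, so the actual score must be computed differently; this sketch conveys the cost-to-confidence idea only.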
Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
arXiv:2604.26805v1 Announce Type: new Abstract: Operating and maintaining (O&M) large-scale online engine systems (search, recommendation, advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. While LLM-based agents are a natural fit for these tasks, the deployment bottleneck is not reasoning capability but orchestration: selecting, for each operational event, the relevant […]
Risk Reporting for Developers’ Internal AI Model Use
arXiv:2604.24966v1 Announce Type: cross Abstract: Frontier AI companies first deploy their most advanced models internally, for weeks or months of safety testing, evaluation, and iteration, before a possible public release. For example, Anthropic recently developed a new class of model with advanced cyberoffense-relevant capabilities, Mythos Preview, which was available internally for at least six weeks […]
Analysing Lightweight Large Language Models for Biomedical Named Entity Recognition on Diverse Output Formats
arXiv:2604.25920v1 Announce Type: cross Abstract: Despite their strong linguistic capabilities, Large Language Models (LLMs) are computationally demanding and require substantial resources for fine-tuning, which is ill-suited to the privacy and budget constraints of many healthcare settings. To address this, we present an experimental analysis of Biomedical Named Entity Recognition using lightweight LLMs; we evaluate the […]
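Two common NER output formats an evaluation like this might compare are token-level BIO tags and span-level annotations. The converter below is illustrative only (the tokens and labels are invented), showing why format choice matters: span output must be reconstructed consistently from tag sequences.

```python
# Illustrative only: convert BIO-tagged tokens to (label, span-text) pairs.
def bio_to_spans(tokens, tags):
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        boundary = tag.startswith("B-") or tag == "O" or (
            tag.startswith("I-") and label != tag[2:])
        if boundary and start is not None:
            spans.append((label, " ".join(tokens[start:i])))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            start, label = i, tag[2:]  # tolerate I- without a preceding B-
    return spans

toks = ["EGFR", "mutations", "drive", "lung", "cancer"]
tags = ["B-GENE", "O", "O", "B-DISEASE", "I-DISEASE"]
print(bio_to_spans(toks, tags))
# -> [('GENE', 'EGFR'), ('DISEASE', 'lung cancer')]
```

Generative LLMs typically emit span-style or JSON output directly, so evaluations have to normalize such formats before scoring against BIO-style gold annotations.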
FedPF: Accurate Target Privacy Preserving Federated Learning Balancing Fairness and Utility
arXiv:2510.26841v2 Announce Type: replace-cross Abstract: Federated Learning (FL) enables collaborative model training without data sharing, yet participants face a fundamental challenge: simultaneously ensuring fairness across demographic groups while protecting sensitive client data. We introduce a differentially private fair FL algorithm (FedPF) that transforms this multi-objective optimization into a zero-sum game where fairness and privacy […]
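For background on the privacy side of such a trade-off (this is the standard Gaussian mechanism for client updates, not FedPF itself): each client's model update is clipped to bound its L2 sensitivity, then Gaussian noise calibrated to the clipping norm is added before aggregation.

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, noise_mult=1.1, rng=None):
    """Clip a client update to L2 norm <= clip_norm, then add Gaussian
    noise scaled to the clipping norm (standard Gaussian mechanism)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_mult * clip_norm, size=update.shape)
    return clipped + noise

update = np.array([3.0, 4.0])   # raw update, L2 norm 5.0
private = dp_sanitize(update, rng=np.random.default_rng(0))
```

The noise that protects privacy also perturbs the statistics a fairness constraint depends on, which is the tension the abstract's zero-sum-game formulation addresses.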
Retrieval-Augmented LLMs for Evidence Localization in Clinical Trial Recruitment from Longitudinal EHR Narratives
arXiv:2604.05190v2 Announce Type: replace-cross Abstract: Screening patients for enrollment is a well-known, labor-intensive bottleneck that leads to under-enrollment and, ultimately, trial failures. Recent breakthroughs in large language models (LLMs) offer a promising opportunity to use artificial intelligence to improve screening. This study systematically explored both encoder- and decoder-based generative LLMs for screening clinical narratives to […]
Woosh: A Sound Effects Foundation Model
arXiv:2604.01929v3 Announce Type: replace-cross Abstract: The audio research community depends on open generative models as foundational tools for building novel approaches and establishing baselines. In this report, we present Woosh, Sony AI’s publicly released sound effect foundation model, detailing its architecture, training process, and an evaluation against other popular open models. Being optimized for sound […]
Tatemae: Detecting Alignment Faking via Tool Selection in LLMs
arXiv:2604.26511v1 Announce Type: cross Abstract: Alignment faking (AF) occurs when an LLM strategically complies with training objectives to avoid value modification, reverting to prior preferences once monitoring is lifted. Current detection methods focus on conversational settings and rely primarily on Chain-of-Thought (CoT) analysis, which provides a reliable signal when strategic reasoning surfaces, but cannot distinguish […]
Graph Construction and Matching for Imperative Programs using Neural and Structural Methods
arXiv:2604.26578v1 Announce Type: cross Abstract: Reusing verification artefacts requires identifying structural and semantic similarities across programs and their specifications. In this paper, we focus on graph construction as a foundational step toward this goal. We present a pipeline that converts imperative programs and their annotations into typed, attributed graphs. Our experiments cover datasets including C […]