How Does Alignment Enhance LLMs’ Multilingual Capabilities? A Language Neurons Perspective

arXiv:2505.21505v3 Announce Type: replace-cross Abstract: Multilingual Alignment is an effective and representative paradigm to enhance LLMs’ multilingual capabilities, which transfers the capabilities from the high-resource languages to the low-resource languages. Meanwhile, some research on language-specific neurons provides a new perspective to analyze and understand LLMs’ mechanisms. However, we find that there are many neurons that […]

Lumos: Let there be Language Model System Certification

arXiv:2512.02966v2 Announce Type: replace-cross Abstract: We introduce the first principled framework, Lumos, for specifying and formally certifying Language Model System (LMS) behaviors. Lumos is an imperative probabilistic programming DSL over graphs, with constructs to generate independent and identically distributed prompts for LMS. It offers a structured view of prompt distributions via graphs, forming random prompts […]

VeriAct: Beyond Verifiability — Agentic Synthesis of Correct and Complete Formal Specifications

arXiv:2604.00280v1 Announce Type: cross Abstract: Formal specifications play a central role in ensuring software reliability and correctness. However, automatically synthesizing high-quality formal specifications remains a challenging task, often requiring domain expertise. Recent work has applied large language models to generate specifications in Java Modeling Language (JML), reporting high verification pass rates. But does passing a […]

EvolveTool-Bench: Evaluating the Quality of LLM-Generated Tool Libraries as Software Artifacts

arXiv:2604.00392v1 Announce Type: cross Abstract: Modern LLM agents increasingly create their own tools at runtime — from Python functions to API clients — yet existing benchmarks evaluate them almost exclusively by downstream task completion. This is analogous to judging a software engineer only by whether their code runs, ignoring redundancy, regression, and safety. We introduce […]

Explainable AI for Blind and Low-Vision Users: Navigating Trust, Modality, and Interpretability in the Agentic Era

arXiv:2604.00187v1 Announce Type: cross Abstract: Explainable Artificial Intelligence (XAI) is critical for ensuring trust and accountability, yet its development remains predominantly visual. For blind and low-vision (BLV) users, the lack of accessible explanations creates a fundamental barrier to the independent use of AI-driven assistive technologies. This problem intensifies as AI systems shift from single-query tools […]

Harmonization mitigates diffusion MRI scanner effects in infancy: insights from the HEALthy Brain and Childhood Development (HBCD) study

arXiv:2604.00246v1 Announce Type: cross Abstract: The HEALthy Brain and Childhood Development (HBCD) Study is an ongoing longitudinal initiative to understand population-level brain maturation; however, large-scale studies must overcome site-related variance and preserve biologically relevant signal. In addition to diffusion-weighted magnetic resonance imaging images, the HBCD dataset offers analysis-ready derivatives for scientists to conduct their analysis, […]

From Domain Understanding to Design Readiness: a playbook for GenAI-supported learning in Software Engineering

arXiv:2604.00120v1 Announce Type: cross Abstract: Software engineering courses often require rapid upskilling in supporting knowledge areas such as domain understanding and modeling methods. We report an experience from a two-week milestone in a master’s course where 29 students used a customized ChatGPT (GPT-3.5) tutor grounded in a curated course knowledge base to learn cryptocurrency-finance basics […]

WHBench: Evaluating Frontier LLMs with Expert-in-the-Loop Validation on Women’s Health Topics

arXiv:2604.00024v1 Announce Type: cross Abstract: Large language models are increasingly used for medical guidance, but women’s health remains under-evaluated in benchmark design. We present the Women’s Health Benchmark (WHBench), a targeted evaluation suite of 47 expert-crafted scenarios across 10 women’s health topics, designed to expose clinically meaningful failure modes including outdated guidelines, unsafe omissions, dosing […]

Not My Truce: Personality Differences in AI-Mediated Workplace Negotiation

arXiv:2604.00464v1 Announce Type: cross Abstract: AI-driven conversational coaching is increasingly used to support workplace negotiation, yet prior work assumes uniform effectiveness across users. We challenge this assumption by examining how individual differences, particularly personality traits, moderate coaching outcomes. We conducted a between-subjects experiment (N=267) comparing theory-driven AI (Trucey), general-purpose AI (Control-AI), and a traditional negotiation […]

Whittaker-Henderson smoother for long satellite image time series interpolation

arXiv:2604.00048v1 Announce Type: cross Abstract: The Whittaker smoother is a widely adopted solution for pre-processing satellite image time series. Yet, two key limitations remain: the smoothing parameter must be tuned individually for each pixel, and the standard formulation assumes homoscedastic noise, imposing uniform smoothing across the temporal dimension. This paper addresses both limitations by casting the […]

Neural-Assisted in-Motion Self-Heading Alignment

arXiv:2604.00168v1 Announce Type: cross Abstract: Autonomous platforms operating in the oceans require accurate navigation to successfully complete their mission. In this regard, the initial heading estimation accuracy and the time required to achieve it play a critical role. The initial heading is traditionally estimated by model-based approaches employing orientation decomposition. However, methods such as the […]


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.