SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science

arXiv:2605.18630v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed as scientific AI assistants, and a growing body of benchmarks evaluates their capabilities across knowledge retrieval, reasoning, code generation, and tool use. These evaluations, however, typically assume the scientific problem is already well-posed, whereas practical scientific assistance often begins with an ill-posed […]

ChartDesign: Towards LLM Designer of Data Visualization

arXiv:2605.16274v1 Announce Type: cross Abstract: Charts are the dominant medium for visualizing data, discovering patterns and trends, and communicating data-driven insights, yet designing them still requires expensive human effort and expertise, such as selecting appropriate chart types, axis orientations, font sizes, and layouts. Most automatic visualization systems rely on handcrafted heuristics or simple rule […]

Beyond the Cartesian Illusion: Testing Two-Stage Multi-Modal Theory of Mind under Perceptual Bottlenecks

arXiv:2605.18194v1 Announce Type: new Abstract: While Multi-Modal Large Language Models (MLLMs) demonstrate impressive capabilities in general reasoning, their embodied spatial intelligence remains hampered by a “Cartesian Illusion” – a reliance on text-based probability distributions that lack grounded, 3D topological understanding. This limitation is starkly exposed in multi-agent environments, which demand more than just scene perception; […]

AgentWall: A Runtime Safety Layer for Local AI Agents

arXiv:2605.16265v1 Announce Type: new Abstract: The safety of autonomous AI agents is increasingly recognized as a critical open problem. As agents transition from passive text generators to active actors capable of executing shell commands, modifying files, calling APIs, and browsing the web, the consequences of unsafe or adversarially manipulated behavior become immediate and tangible. Existing […]

TeleCom-Bench: How Far Are Large Language Models from Industrial Telecommunication Applications?

arXiv:2605.18025v1 Announce Type: new Abstract: While Large Language Models have achieved remarkable integration in various vertical scenarios, their deployment in the telecommunications domain remains exploratory due to the lack of a standardized evaluation framework. Current telecom benchmarks primarily focus on static, foundational knowledge and isolated atomic skills, neglecting the equipment-specific documentation and end-to-end industrial workflows […]

POST: Prior-Observation Adversarial Learning of Spatio-Temporal Associations for Multivariate Time Series Anomaly Detection

arXiv:2605.18128v1 Announce Type: new Abstract: Existing Multivariate Time Series Anomaly Detection (MTSAD) frameworks increasingly rely on integrating Graph Neural Networks (GNNs) with sequence models to capture complex spatio-temporal dependencies. However, less attention is paid to the spatial over-generalization problem, where unconstrained structural modeling indiscriminately reconstructs anomalies, inevitably degrading detection recall. To tackle this problem, we […]

AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

arXiv:2605.18529v1 Announce Type: new Abstract: The alignment of Large Language Models (LLMs) for complex reasoning heavily relies on Reinforcement Learning with Verifiable Rewards (RLVR). However, standard algorithms like GRPO apply sequence-level rewards uniformly to all tokens, creating a severe credit-assignment bottleneck. While on-policy self-distillation attempts to resolve this by conditioning a self-teacher on privileged contexts, […]
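To make the credit-assignment bottleneck concrete, here is a minimal sketch of how a GRPO-style update spreads one sequence-level reward over tokens: rewards for a group of sampled responses are normalized within the group, and the resulting scalar advantage is copied to every token of that response. The function name grpo_token_advantages and the use of the population standard deviation are illustrative assumptions, not the AMR-SD authors' code; implementations differ in such details.

```python
import statistics


def grpo_token_advantages(rewards, lengths, eps=1e-6):
    """Hypothetical helper: group-relative advantages broadcast over tokens.

    rewards: one scalar (verifiable) reward per sampled response in the group.
    lengths: number of tokens in each response.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std; some code uses sample std
    advantages = [(r - mean) / (std + eps) for r in rewards]
    # Every token of response i receives the same advantage, regardless of
    # whether that token helped or hurt: this is the uniform credit assignment.
    return [[adv] * n for adv, n in zip(advantages, lengths)]


if __name__ == "__main__":
    per_token = grpo_token_advantages([1.0, 0.0, 0.0, 1.0], [3, 2, 4, 1])
    for i, row in enumerate(per_token):
        print(i, [round(a, 3) for a in row])
```

Because each row holds a single repeated value, a correct final answer rewards filler tokens as much as the decisive step, which is the gap token-level methods such as self-distillation aim to close.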

SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents

arXiv:2605.18693v1 Announce Type: new Abstract: As LLM agents are increasingly built around reusable skills, a central challenge is no longer only whether agents can use provided skills, but whether they can generate correct, reusable, and executable skills from repositories and documents. Existing benchmarks primarily evaluate the efficacy of given skills or the ability of agents […]

UVTran: Accurate Hole-Filling Parameterization with Transformers

arXiv:2605.16306v1 Announce Type: cross Abstract: In industrial design, N-sided hole filling is typically formulated as the construction of a single trimmed B-spline surface by minimizing a fairness energy subject to geometric boundary constraints. This formulation requires an accurate parameter-space representation of the trimming curve on the filling surface. Most existing methods project the hole boundary […]

Augmenting Human Evaluation with LLM Judges: How Many Human Reviews Do You Need?

arXiv:2605.16354v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used as automated evaluators of AI systems, including in high-stakes applications. In this role, LLMs are used to generate judgments about the quality, appropriateness, or even safety of model outputs. This approach is motivated by practical constraints. Expert human ratings are costly and difficult […]

Resource-Element Energy Difference for Noncoherent Over-the-Air Federated Learning

arXiv:2605.07263v2 Announce Type: replace-cross Abstract: Over-the-air federated learning (OTA-FL) reduces uplink latency by aggregating client updates directly over the wireless multiple-access channel. Coherent analog aggregation realizes this idea by aligning the phases and amplitudes of simultaneously transmitted waveforms, which typically requires synchronization, instantaneous channel-state information (CSI), phase compensation, and power control. Noncoherent energy detection removes […]
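The coherent baseline that this abstract contrasts with can be sketched in a few lines: each client pre-multiplies its update by the inverse of its complex channel gain (phase compensation plus amplitude alignment, which presumes perfect CSI), the multiple-access channel adds the transmitted signals, and the server reads off the sum. This is a minimal illustration under idealized assumptions (perfect CSI, no transmit-power limit, no deep-fade truncation); coherent_ota_aggregate is a hypothetical name, and the sketch does not represent the paper's noncoherent energy-difference scheme.

```python
import random


def coherent_ota_aggregate(updates, channels, noise_std=0.0, seed=0):
    """Hypothetical sketch of ideal coherent over-the-air averaging.

    updates:  list of real-valued client update vectors (equal length).
    channels: one complex channel gain h_k per client, assumed perfectly known.
    """
    rng = random.Random(seed)
    dim = len(updates[0])
    received = [0j] * dim
    for x, h in zip(updates, channels):
        precoder = 1 / h  # channel inversion: undo phase and amplitude (assumes perfect CSI, no power cap)
        for i in range(dim):
            # All clients transmit at once; the channel superposes h * (precoder * x).
            received[i] += h * precoder * x[i]
    # Additive complex Gaussian receiver noise.
    received = [y + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std)) for y in received]
    # The server keeps the real part and divides by the number of clients to get an average.
    return [y.real / len(updates) for y in received]


if __name__ == "__main__":
    avg = coherent_ota_aggregate([[1.0, 2.0], [3.0, 4.0]], [0.5 + 0.5j, 2 - 1j])
    print(avg)
```

With noise_std=0 the output matches the ordinary average of the updates; every step before the superposition depends on knowing h_k, which is the synchronization and CSI burden that noncoherent energy detection is meant to remove.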

Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective

arXiv:2605.17967v1 Announce Type: new Abstract: This paper explores a scientific question in supervised fine-tuning (SFT): why SFT is broadly effective for small-scale deep neural networks, yet can produce inconsistent or even detrimental effects when applied to large language models (LLMs). Recent advances in interaction-based explanations suggest that interactions between words/tokens provide a faithful metric for […]


Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844