WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents

arXiv:2601.21872v2 Announce Type: replace Abstract: Web agents hold great potential for automating complex computer tasks, yet their interactions involve long-horizon, sequential decision-making with irreversible actions. In such settings, outcome-based supervision is sparse and delayed, often rewarding incorrect trajectories and failing to support inference-time scaling. This motivates the use of Process Reward Models (WebPRMs) for web […]

Tractable Uncertainty-Aware Meta-Learning

arXiv:2210.01881v2 Announce Type: replace-cross Abstract: Meta-learning is a popular approach for learning new tasks with limited data by leveraging the commonalities among different tasks. However, meta-learned models can perform poorly when context data is too limited, or when data is drawn from an out-of-distribution (OoD) task. Especially in safety-critical settings, this necessitates an uncertainty-aware approach […]

Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding

arXiv:2503.10183v4 Announce Type: replace-cross Abstract: Existing vision-language models (VLMs) often suffer from visual hallucination, where the generated responses contain inaccuracies that are not grounded in the visual input. Efforts to address this issue without model finetuning primarily mitigate hallucination by contrastively reducing language biases or amplifying the weights of visual embedding during decoding. However, these […]

“I Said Things I Needed to Hear Myself”: Peer Support as an Emotional, Organisational, and Sociotechnical Practice in Singapore

arXiv:2506.09362v2 Announce Type: replace-cross Abstract: Peer support plays a vital role in expanding access to mental health care by providing empathetic, community-based support outside formal clinical systems. As digital platforms increasingly mediate such support, the design and impact of these technologies remain under-examined, particularly in Asian contexts. This paper presents findings from an interview study […]

Do AI Models Dream of Faster Code? An Empirical Study on LLM-Proposed Performance Improvements in Real-World Software

arXiv:2510.15494v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) can generate code, but can they generate fast code for complex, real-world software systems? In this study, we investigate this question using a dataset of 65 tasks mined from performance-critical open-source Java projects. Unlike prior studies, which focused on algorithmic puzzles, we conduct experiments on actual […]

M-ArtAgent: Evidence-Based Multimodal Agent for Implicit Art Influence Discovery

arXiv:2604.07468v1 Announce Type: new Abstract: Implicit artistic influence, although visually plausible, is often undocumented and thus poses a historically constrained attribution problem: resemblance is necessary but not sufficient evidence. Most prior systems reduce influence discovery to embedding similarity or label-driven graph completion, while recent multimodal large language models (LLMs) remain vulnerable to temporal inconsistency and […]

Munkres’ General Topology Autoformalized in Isabelle/HOL

arXiv:2604.07455v1 Announce Type: new Abstract: We describe an experiment in LLM-assisted autoformalization that produced over 85,000 lines of Isabelle/HOL code covering all 39 sections of Munkres’ Topology (general topology, Chapters 2–8), from topological spaces through dimension theory. The LLM-based coding agents (initially ChatGPT 5.2 and then Claude Opus 4.6) needed 24 active days to do so. […]

Position Paper: From Edge AI to Adaptive Edge AI

arXiv:2604.07360v1 Announce Type: cross Abstract: Edge AI is often framed as model compression and deployment under tight constraints. We argue a stronger operational thesis: Edge AI in realistic deployments is necessarily adaptive. In long-horizon operation, a fixed (non-adaptive) configuration faces a fundamental failure mode: as data and operating conditions evolve and change in time, it […]

Adversarial Evasion Attacks on Computer Vision using SHAP Values

arXiv:2601.10587v2 Announce Type: replace-cross Abstract: The paper introduces a white-box attack on computer vision models using SHAP values. It demonstrates how adversarial evasion attacks can compromise the performance of deep learning models by reducing output confidence or inducing misclassifications. Such attacks are particularly insidious as they can deceive the perception of an algorithm while eluding […]

How Much LLM Does a Self-Revising Agent Actually Need?

arXiv:2604.07236v2 Announce Type: replace Abstract: Recent LLM-based agents often place world modeling, planning, and reflection inside a single language model loop. This can produce capable behavior, but it makes a basic scientific question difficult to answer: which part of the agent’s competence actually comes from the LLM, and which part comes from explicit structure around […]

Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test

arXiv:2506.06975v5 Announce Type: replace-cross Abstract: As API access becomes a primary interface to large language models (LLMs), users often interact with black-box systems that offer little transparency into the deployed model. To reduce costs or maliciously alter model behaviors, API providers may discreetly serve quantized or fine-tuned variants, which can degrade performance and compromise safety. […]

Transforming the Voice of the Customer: Large Language Models for Identifying Customer Needs

arXiv:2503.01870v2 Announce Type: replace-cross Abstract: Identifying customer needs (CNs) is fundamental to product innovation and marketing strategy. Yet for over thirty years, Voice-of-the-Customer (VOC) applications have relied on professional analysts to manually interpret qualitative data and formulate “jobs to be done.” This task is cognitively demanding, time-consuming, and difficult to scale. While current practice uses […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK; registration number 16808844.