PixelArena: A benchmark for Pixel-Precision Visual Intelligence

arXiv:2512.16303v1 Announce Type: cross Abstract: Multi-modal large language models that have image output are emerging. Many image generation benchmarks focus on aesthetics instead of fine-grained generation capabilities. In PixelArena, we propose using semantic segmentation tasks to objectively examine their fine-grained generative intelligence with pixel precision. We find the latest Gemini 3 Pro Image has emergent […]

Robust Finetuning of Vision-Language-Action Robot Policies via Parameter Merging

arXiv:2512.08333v2 Announce Type: replace-cross Abstract: Generalist robot policies, trained on large and diverse datasets, have demonstrated the ability to generalize across a wide spectrum of behaviors, enabling a single policy to act in varied real-world environments. However, they still fall short on new tasks not covered in the training data. When finetuned on limited demonstrations […]

TOGGLE: Temporal Logic-Guided Large Language Model Compression for Edge

arXiv:2512.16855v1 Announce Type: new Abstract: Large Language Models (LLMs) deliver exceptional performance across natural language tasks but demand substantial computational resources, limiting their deployment on resource-constrained edge devices. Existing compression techniques, such as quantization and pruning, often degrade critical linguistic properties and lack formal guarantees for preserving model behavior. We propose Temporal Logic-Guided Large Language […]

UniVCD: A New Method for Unsupervised Change Detection in the Open-Vocabulary Era

arXiv:2512.13089v2 Announce Type: replace-cross Abstract: Change detection (CD) identifies scene changes from multi-temporal observations and is widely used in urban development and environmental monitoring. Most existing CD methods rely on supervised learning, making performance strongly dataset-dependent and incurring high annotation costs; they typically focus on a few predefined categories and generalize poorly to diverse scenes. […]

TextEditBench: Evaluating Reasoning-aware Text Editing Beyond Rendering

arXiv:2512.16270v1 Announce Type: cross Abstract: Text rendering has recently emerged as one of the most challenging frontiers in visual generation, drawing significant attention from large-scale diffusion and multimodal models. However, text editing within images remains largely unexplored, as it requires generating legible characters while preserving semantic, geometric, and contextual coherence. To fill this gap, we […]

Non-Resolution Reasoning (NRR): A Computational Framework for Contextual Identity and Ambiguity Preservation

arXiv:2512.13478v3 Announce Type: replace-cross Abstract: Current artificial intelligence systems, despite remarkable capabilities in text generation and pattern recognition, exhibit a fundamental architectural limitation: they resolve ambiguity prematurely. This premature semantic collapse — the tendency to collapse multiple valid interpretations into a single output — stems from classical identity assumptions embedded in standard neural architectures. We […]

Domain-Agnostic Causal-Aware Audio Transformer for Infant Cry Classification

arXiv:2512.16271v1 Announce Type: cross Abstract: Accurate and interpretable classification of infant cry paralinguistics is essential for early detection of neonatal distress and clinical decision support. However, many existing deep learning methods rely on correlation-driven acoustic representations, which makes them vulnerable to noise, spurious cues, and domain shifts across recording environments. We propose DACH-TIC, a Domain-Agnostic […]

Interpretable Deep Learning for Stock Returns: A Consensus-Bottleneck Asset Pricing Model

arXiv:2512.16251v1 Announce Type: cross Abstract: We introduce the Consensus-Bottleneck Asset Pricing Model (CB-APM), a partially interpretable neural network that replicates the reasoning processes of sell-side analysts by capturing how dispersed investor beliefs are compressed into asset prices through a consensus formation process. By modeling this “bottleneck” to summarize firm- and macro-level information, CB-APM not only […]

Coarse-to-Fine Open-Set Graph Node Classification with Large Language Models

arXiv:2512.16244v1 Announce Type: cross Abstract: Developing open-set classification methods capable of classifying in-distribution (ID) data while detecting out-of-distribution (OOD) samples is essential for deploying graph neural networks (GNNs) in open-world scenarios. Existing methods typically treat all OOD samples as a single class, despite real-world applications, especially high-stake settings such as fraud detection and medical diagnosis, […]

First, do NOHARM: towards clinically safe large language models

arXiv:2512.01241v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are routinely used by physicians and patients for medical advice, yet their clinical safety profiles remain poorly characterized. We present NOHARM (Numerous Options Harm Assessment for Risk in Medicine), a benchmark using 100 real primary care-to-specialist consultation cases to measure frequency and severity of harm from […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.