Grid2Matrix: Revealing Digital Agnosia in Vision-Language Models

April 14, 2026

arXiv:2604.09687v1 Announce Type: cross
Abstract: Vision-Language Models (VLMs) excel on many multimodal reasoning benchmarks, but these evaluations often do not require an exhaustive readout of the image and can therefore obscure failures in faithfully capturing all visual details. We introduce Grid2Matrix (G2M), a controlled benchmark in which a model is shown a color grid and a color-to-number mapping, and must output the corresponding matrix. By varying grid size and the number of colors, G2M provides a simple way to increase visual complexity while minimizing semantic confounds. We find that VLMs exhibit a sharp early collapse in zero-shot end-to-end evaluation, failing on surprisingly small grids rather than degrading gradually as the task becomes denser. We probe the visual encoders of VLMs from two representative families and find that they preserve substantially more of the grid information than the corresponding end-to-end outputs. This suggests that the failure is not explained by visual encoding alone, but also reflects a gap between what remains recoverable from visual features and what is ultimately expressed in language. We term this gap Digital Agnosia. Further analyses show that these errors are highly structured and depend strongly on how grid cells overlap with visual patch boundaries. We also find that common strategies such as model scaling and multimodal alignment do not fully eliminate this failure mode. We expect G2M to serve as a useful testbed for understanding where and how VLMs lose fine visual details, and for evaluating tasks where missing even small visual details can matter, such as tables, charts, forms, and GUIs.
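
To make the task concrete, here is a minimal Python sketch of a G2M-style instance: a random color grid, the target matrix obtained by applying the color-to-number mapping, a per-cell accuracy score, and a check for whether a cell straddles a vision-encoder patch boundary. The abstract does not specify how grids are rendered, how outputs are scored, or which patch sizes apply, so every name here (`make_grid`, `to_matrix`, `cell_accuracy`, `straddles_patch_boundary`) and every number (the 10-pixel cell, the 14-pixel patch) is a hypothetical chosen for illustration, not the authors' implementation.

```python
import random


def make_grid(rows, cols, colors, seed=0):
    """Return a rows x cols grid of color names sampled uniformly (assumed setup)."""
    rng = random.Random(seed)
    return [[rng.choice(colors) for _ in range(cols)] for _ in range(rows)]


def to_matrix(grid, mapping):
    """Apply the color-to-number mapping to get the target matrix."""
    return [[mapping[color] for color in row] for row in grid]


def cell_accuracy(pred, target):
    """Fraction of target cells reproduced at the same position in pred.

    Cells missing from a malformed prediction count as wrong. This metric
    is an assumption; the abstract does not say how outputs are scored.
    """
    total = sum(len(row) for row in target)
    correct = 0
    for i, row in enumerate(target):
        for j, value in enumerate(row):
            if i < len(pred) and j < len(pred[i]) and pred[i][j] == value:
                correct += 1
    return correct / total


def straddles_patch_boundary(row, col, cell_px, patch_px):
    """True if a square cell's pixel extent crosses a patch edge on either axis.

    Assumes cells are cell_px wide, tiled from pixel 0 with no padding, and
    that the encoder cuts the image into patch_px x patch_px patches.
    """
    def crosses(index):
        start = index * cell_px
        end = start + cell_px - 1
        return start // patch_px != end // patch_px

    return crosses(row) or crosses(col)


if __name__ == "__main__":
    mapping = {"red": 0, "green": 1, "blue": 2}  # hypothetical mapping
    grid = make_grid(3, 4, list(mapping))
    target = to_matrix(grid, mapping)
    print(target)
    print(cell_accuracy(target, target))
    print(straddles_patch_boundary(0, 1, cell_px=10, patch_px=14))
```

Scoring per cell rather than by exact match alone makes it possible to relate individual errors to cell position, which is the kind of analysis the patch-boundary finding implies; an exact-match score would be `pred == target`.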

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844