Human-like Object Grouping in Self-supervised Vision Transformers

arXiv:2603.13994v1 Announce Type: cross Abstract: Vision foundation models trained with self-supervised objectives achieve strong performance across diverse tasks and exhibit emergent object segmentation properties. However, their alignment with human object perception remains poorly understood. Here, we introduce a behavioral benchmark in which participants make same/different object judgments for dot pairs on naturalistic scenes, scaling up […]

Bringing Model Editing to Generative Recommendation in Cold-Start Scenarios

arXiv:2603.14259v1 Announce Type: cross Abstract: Generative recommendation (GR) has shown strong potential for sequential recommendation in an end-to-end generation paradigm. However, existing GR models suffer from severe cold-start collapse: their recommendation accuracy on cold-start items can drop to near zero. Current solutions typically rely on retraining with cold-start interactions, which is hindered by sparse feedback, […]

The Big Send-off: Scalable and Performant Collectives for Deep Learning

arXiv:2504.18658v2 Announce Type: replace-cross Abstract: Collective communication is becoming increasingly important in data center and supercomputer workloads with an increase in distributed AI-related jobs. However, existing libraries that provide collective support, such as NCCL, RCCL, and Cray-MPICH, exhibit several performance and scalability limitations on modern GPU supercomputers. To address these challenges, we introduce the […]

Multi-Axis Trust Modeling for Interpretable Account Hijacking Detection

arXiv:2603.13246v1 Announce Type: new Abstract: This paper proposes a Hadith-inspired multi-axis trust modeling framework, motivated by a structurally analogous problem in classical Hadith scholarship: assessing the trustworthiness of information sources using interpretable, multidimensional criteria rather than a single anomaly score. We translate five trust axes – long-term integrity (adalah), behavioral precision (dabt), contextual continuity (isnad), […]

A Gauge Theory of Superposition: Toward a Sheaf-Theoretic Atlas of Neural Representations

arXiv:2603.00824v2 Announce Type: replace-cross Abstract: We develop a discrete gauge-theoretic framework for superposition in large language models (LLMs) that replaces the single-global-dictionary premise with a sheaf-theoretic atlas of local semantic charts. Contexts are clustered into a stratified context complex; each chart carries a local feature space and a local information-geometric metric (Fisher/Gauss-Newton) identifying predictively consequential […]

ManiBench: A Benchmark for Testing Visual-Logic Drift and Syntactic Hallucinations in Manim Code Generation

arXiv:2603.13251v1 Announce Type: new Abstract: Traditional benchmarks like HumanEval and MBPP test logic and syntax effectively, but fail when code must produce dynamic, pedagogical visuals. We introduce ManiBench, a specialized benchmark evaluating LLM performance in generating Manim CE code, where temporal fidelity and version-aware API correctness are critical. ManiBench targets two key failure modes: Syntactic […]

VLAD-Grasp: Zero-shot Grasp Detection via Vision-Language Models

arXiv:2511.05791v2 Announce Type: replace-cross Abstract: Robotic grasping is a fundamental capability for enabling autonomous manipulation, and a problem that usually admits infinitely many solutions. State-of-the-art approaches for grasping rely on learning from large-scale datasets comprising expert annotations of feasible grasps. Curating such datasets is challenging, and hence learning-based methods are limited by the solution coverage of the dataset, and require […]

Learning Question-Aware Keyframe Selection with Synthetic Supervision for Video Question Answering

arXiv:2603.14953v1 Announce Type: cross Abstract: Large multimodal models (LMMs) have recently demonstrated remarkable performance in video question answering (VideoQA), yet reasoning over video remains challenging due to high inference cost and diluted information. Keyframe selection offers efficiency and sharper reasoning but suffers from sparse supervision and redundant frame choices when relying only on image-text similarity. […]

Beyond Local Code Optimization: Multi-Agent Reasoning for Software System Optimization

arXiv:2603.14703v1 Announce Type: cross Abstract: Large language models and AI agents have recently shown promise in automating software performance optimization, but existing approaches predominantly rely on local, syntax-driven code transformations. This limits their ability to reason about program behavior and capture whole-system performance interactions. As modern software increasingly comprises interacting components – such as […]

Retrieval-Feedback-Driven Distillation and Preference Alignment for Efficient LLM-based Query Expansion

arXiv:2603.13776v1 Announce Type: cross Abstract: Large language models have recently enabled a generative paradigm for query expansion, but their high inference cost makes direct deployment difficult in practical retrieval systems. To address this issue, a retrieval-feedback-driven distillation and preference-alignment framework is proposed to transfer retrieval-friendly expansion behavior from a strong teacher model to a compact […]

Step-CoT: Stepwise Visual Chain-of-Thought for Medical Visual Question Answering

arXiv:2603.13878v1 Announce Type: cross Abstract: Chain-of-thought (CoT) reasoning has advanced medical visual question answering (VQA), yet most existing CoT rationales are free-form and fail to capture the structured reasoning process clinicians actually follow. This work asks: Can traceable, multi-step reasoning supervision improve reasoning accuracy and the interpretability of Medical VQA? To this end, we introduce […]

Citation-Enforced RAG for Fiscal Document Intelligence: Cited, Explainable Knowledge Retrieval in Tax Compliance

arXiv:2603.14170v1 Announce Type: cross Abstract: Tax authorities and public-sector financial agencies rely on large volumes of unstructured and semi-structured fiscal documents – including tax forms, instructions, publications, and jurisdiction-specific guidance – to support compliance analysis and audit workflows. While recent advances in generative AI and retrieval-augmented generation (RAG) have shown promise for document-centric question answering, […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.