Ordering Matters: Rank-Aware Selective Fusion for Blended Emotion Recognition

arXiv:2605.21417v2 Announce Type: replace-cross Abstract: Blended emotion recognition is challenging because emotions are often expressed as mixtures of subtle and overlapping multimodal cues rather than a single dominant signal. We propose a rank-aware multi-encoder framework that selectively combines complementary representations from diverse pre-extracted video and audio encoders. Our method projects heterogeneous encoder features into a […]

Accelerated Simulation Algorithms for Extreme First-Passage Problems with General Emission Profiles

arXiv:2605.25295v1 Announce Type: cross Abstract: Fastest arrival events, where the first among many diffusing particles reaches a target, are central in triggering signal initiation in molecular stochastic systems. Classical approaches to simulate such events rely on full trajectory generation of all particles, leading to prohibitive computational costs in the large particle number regime. In this […]

Mimir: Large-scale Multilingual Concept Modeling

arXiv:2605.25263v1 Announce Type: cross Abstract: Current language modeling approaches are built around tokens. Text corpora are split into tokens, and models are trained by performing computations on these tokens, such as predicting the next token given the preceding ones as context. This paradigm has become the standard in modern language modeling, especially given the outstanding […]

UWM-JEPA: Predictive World Models That Imagine in Belief Space

arXiv:2605.25313v1 Announce Type: cross Abstract: World models for partially observed environments must imagine multiple compatible hidden futures and steer between them under counterfactual actions. Joint Embedding Predictive Architectures (JEPAs) do this in latent space, but a vector-valued latent has no internal structure for carrying the belief over hidden continuations through blind rollout. We introduce the […]

ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison

arXiv:2605.20278v2 Announce Type: replace-cross Abstract: Long-form image captioning exposes a reward granularity problem in RL: captions are judged as whole sequences, while the important errors occur at the level of individual visual claims. A good dense caption should be both faithful and informative, avoiding hallucination without omitting salient details. Yet pairwise preferences, reference-based metrics, and […]

‘Si’multaneous ‘S’patial-‘T’emporal Message Passing for Dynamic Graph Representation Learning

arXiv:2605.25548v1 Announce Type: cross Abstract: Dynamic graph neural networks (DGNNs) that operate on snapshot sequences typically fall into one of two categories. Temporal-first approaches build per-node temporal embeddings and only afterwards perform spatial aggregation, whereas Spatial-first approaches invert this order, feeding the output of a graph convolution into a downstream temporal module. In either case, […]

First, do no harm: Breaking suicidogenic echo chambers in media recommendation

arXiv:2605.25258v1 Announce Type: cross Abstract: Recommender systems generally optimise user engagement, but this approach is dangerous in mental health contexts. When vulnerable users show signs of suicidal ideation, standard algorithms often trap them in echo chambers of harmful content, worsening their psychological state. In response, we introduce RankAid, a re-ranking method that prioritises clinical safety […]

When Search Becomes Memory: Turning Robot Design Trials into Transferable Skills

arXiv:2605.25832v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used as proposal generators for evolutionary robot design, yet most loops remain memoryless: simulator results shape the next population but are not preserved as reusable design knowledge. We present Auto-Robotist, a self-evolving LLM agent that distills morphology-search traces into an explicit natural-language skill library. […]

FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

arXiv:2605.19846v3 Announce Type: replace-cross Abstract: Vision-Language Models (VLMs) have demonstrated remarkable capabilities in general video understanding, yet they often struggle with the fine-grained comprehension crucial for real-world applications requiring nuanced interpretation of human actions and interactions. While some recent human-centric benchmarks evaluate aspects of model behaviour such as fairness/ethics, emotion perception, and broader human-centric metrics, […]

StakeBench: Evaluating Language Understanding Grounded in Market Commitment

arXiv:2605.26074v1 Announce Type: cross Abstract: Existing financial NLP benchmarks often rely on labels supplied by outside observers, measuring how language is perceived rather than what speakers have committed to in the market. We introduce StakeBench, an evaluation framework for language understanding grounded in market commitment. StakeBench links 560,876 comments from 2,261 resolved markets to verified […]

Guess the Unified Model: How Much Can We Recover from Generated Images?

arXiv:2605.25254v1 Announce Type: cross Abstract: With unified model-generated images now widespread online, attributing their model of origin offers a path toward transparency and deeper insight into the characteristic behaviors of individual models. Prior work has explored provenance in LLM-generated text, diffusion model images, and datasets, but the separability of unified model-generated images remains an underexplored […]

Leveraging Spreading Activation for Improved Document Retrieval in Knowledge-Graph-Based RAG Systems

arXiv:2512.15922v3 Announce Type: replace Abstract: Despite initial successes and a variety of architectures, retrieval-augmented generation systems still struggle to reliably retrieve and connect the multi-step evidence required for complicated reasoning tasks. Most of the standard RAG frameworks regard all retrieved information as equally reliable, overlooking the varying credibility and interconnected nature of large textual corpora. […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.