Do LLMs Understand Collaborative Signals? Diagnosis and Repair

arXiv:2505.20730v4 Announce Type: replace-cross Abstract: Collaborative information from user-item interactions is a fundamental source of signal in successful recommender systems. Recently, researchers have attempted to incorporate this knowledge into large language model-based recommender approaches (LLMRec) to enhance their performance. However, there has been little fundamental analysis of whether LLMs can effectively reason over collaborative information. […]

Your VAR Model is Secretly an Efficient and Explainable Generative Classifier

arXiv:2510.12060v2 Announce Type: replace-cross Abstract: Generative classifiers, which leverage conditional generative models for classification, have recently demonstrated desirable properties such as robustness to distribution shifts. However, recent progress in this area has been largely driven by diffusion-based models, whose substantial computational cost severely limits scalability. This exclusive focus on diffusion-based methods has also constrained our […]

The Impact of Corporate AI Washing on Farmers’ Digital Financial Behavior Response — An Analysis from the Perspective of Digital Financial Exclusion

arXiv:2603.18421v2 Announce Type: replace-cross Abstract: In the context of the rapid development of digital finance, some financial technology companies exhibit the phenomenon of “AI washing,” where they overstate their AI capabilities while underinvesting in actual AI resources. This paper constructs a corporate-level AI washing index based on CHFS2019 data and AI investment data from 15-20 […]

Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

arXiv:2603.21697v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) extend text-only LLMs with visual reasoning, but also introduce new safety failure modes under visually grounded instructions. We study comic-template jailbreaks that embed harmful goals inside simple three-panel visual narratives and prompt the model to role-play and “complete the comic.” Building on JailbreakBench and JailbreakV, […]

LRC-WeatherNet: LiDAR, RADAR, and Camera Fusion Network for Real-time Weather-type Classification in Autonomous Driving

arXiv:2603.21987v1 Announce Type: cross Abstract: Autonomous vehicles face major perception and navigation challenges in adverse weather such as rain, fog, and snow, which degrade the performance of LiDAR, RADAR, and RGB camera sensors. While each sensor type offers unique strengths, such as RADAR robustness in poor visibility and LiDAR precision in clear conditions, they also […]

Exploring Multi-Objective Trade-offs in Reference Compound Selection for Validation Studies of Toxicity Assays

arXiv:2505.07140v3 Announce Type: replace Abstract: In chemical safety assessment, validation studies rely on reference compound lists to evaluate the applicability of alternative methods prior to regulatory acceptance. These lists are expected to cover multiple aspects, including chemical structure, physicochemical properties, and toxicity profiles. In practice, however, trade-offs among these aspects are typically addressed implicitly through […]

S5-SHB Agent: Society 5.0 enabled Multi-model Agentic Blockchain Framework for Smart Home

arXiv:2603.05027v2 Announce Type: replace Abstract: The smart home is a key application domain within the Society 5.0 vision for a human-centered society. As smart home ecosystems expand with heterogeneous IoT protocols, diverse devices, and evolving threats, autonomous systems must manage comfort, security, energy, and safety for residents. Such autonomous decision-making requires a trust anchor, making […]

UASTrack: A Unified Adaptive Selection Framework with Modality-Customization in Single Object Tracking

arXiv:2502.18220v2 Announce Type: replace-cross Abstract: Multi-modal tracking is essential in single-object tracking (SOT), as different sensor types contribute unique capabilities to overcome challenges caused by variations in object appearance. However, existing unified RGB-X trackers (X represents depth, event, or thermal modality) either rely on the task-specific training strategy for individual RGB-X image pairs or fail […]

Long Chain-of-Thought Reasoning Across Languages

arXiv:2508.14828v3 Announce Type: replace-cross Abstract: While large reasoning models have shown remarkable ability to generate long chains-of-thought (CoTs) in English, we still lack understanding of how these long-form reasoning abilities transfer to the vast majority of the world’s languages. In this work, we systematically investigate four key stages of model development (scaling, pretraining, post-training, and inference) to […]

Real-Time Long Horizon Air Quality Forecasting via Group-Relative Policy Optimization

arXiv:2511.22169v2 Announce Type: replace-cross Abstract: Accurate long horizon forecasting of particulate matter (PM) concentration fields is essential for operational public health decisions. However, achieving reliable forecasts remains challenging in regions with complex terrain and strong atmospheric dynamics such as East Asia. While foundation models such as Aurora offer global generality, they often miss region-specific dynamics […]

Fast-WAM: Do World Action Models Need Test-time Future Imagination?

arXiv:2603.16666v2 Announce Type: replace-cross Abstract: World Action Models (WAMs) have emerged as a promising alternative to Vision-Language-Action (VLA) models for embodied control because they explicitly model how visual observations may evolve under action. Most existing WAMs follow an imagine-then-execute paradigm, incurring substantial test-time latency from iterative video denoising, yet it remains unclear whether explicit future […]

Nuanced Emotion Recognition Based on a Segment-based MLLM Framework Leveraging Qwen3-Omni for AH Detection

arXiv:2603.13406v2 Announce Type: replace-cross Abstract: Emotion recognition in videos is a pivotal task in affective computing, where identifying subtle psychological states such as Ambivalence and Hesitancy holds significant value for behavioral intervention and digital health. Ambivalence and Hesitancy states often manifest through cross-modal inconsistencies such as discrepancies between facial expressions, vocal tones, and textual semantics, […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK; registration number 16808844.