Textual Supervision Enhances Geospatial Representations in Vision-Language Models

arXiv:2606.07172v1 Announce Type: cross Abstract: Geospatial understanding is a critical yet underexplored dimension in the development of machine learning systems for tasks such as image geolocation and spatial reasoning. In this work, we analyze the geospatial representations acquired by three model families: vision-only architectures (e.g., ViT), vision-language models (e.g., CLIP), and large-scale multimodal foundation models […]

Proxy Reconstruction Pre-training for Ramp Flow Prediction at Highway Interchanges

arXiv:2510.03381v3 Announce Type: replace-cross Abstract: Interchanges are crucial nodes for vehicle transfers between highways, yet the lack of real-time ramp detectors creates blind spots in traffic prediction. To address this, we propose a Spatio-Temporal Decoupled Autoencoder (STDAE), a two-stage framework that leverages cross-modal reconstruction pretraining. In the first stage, STDAE reconstructs historical ramp flows from […]

Enhancing Video Representations with Spatiotemporal-Semantic Residual to Mitigate Hallucinations in Video Large Multimodal Models

arXiv:2601.22574v2 Announce Type: replace-cross Abstract: Although Video Large Multimodal Models have achieved strong performance in video understanding, they still suffer from hallucination. Existing inference-time intervention methods usually modify videos under the contrastive decoding framework, but their heuristic designs bring limited improvements and increase inference latency. To address these issues, we propose ViSSRes, an inference-time intervention […]
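For readers unfamiliar with the contrastive decoding framework that the abstract says existing interventions build on, here is a minimal sketch of one commonly used form. It is not ViSSRes. The function names and the toy logits are hypothetical. The model's next-token logits on the original video are contrasted with its logits on a perturbed copy, so tokens that stay likely even without faithful visual evidence are down-weighted.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the vocabulary axis.
    e = np.exp(z - z.max())
    return e / e.sum()

def contrastive_next_token_probs(logits_orig, logits_perturbed, alpha=1.0):
    """Contrastive decoding: boost tokens supported by the original video
    relative to a perturbed copy (hypothetically a noised or frame-shuffled video)."""
    logits_orig = np.asarray(logits_orig, dtype=float)
    logits_perturbed = np.asarray(logits_perturbed, dtype=float)
    return softmax((1.0 + alpha) * logits_orig - alpha * logits_perturbed)

# Toy 3-token vocabulary: token 0 scores high even on the perturbed video,
# so it is treated as a prior-driven (hallucination-prone) choice.
orig = [2.0, 1.9, 0.0]
perturbed = [2.0, 0.5, 0.0]
probs = contrastive_next_token_probs(orig, perturbed, alpha=1.0)
print(int(np.argmax(softmax(np.array(orig)))), int(np.argmax(probs)))  # 0 1
```

Each decoding step needs a second forward pass on the perturbed video. This is one source of the added inference latency that the abstract attributes to such methods.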

LLM-Augmented Digital Twin for Policy Evaluation in Short-Video Platforms

arXiv:2603.11333v2 Announce Type: replace Abstract: Short-video platforms are closed-loop, human-in-the-loop ecosystems where platform policy, creator incentives, and user behavior co-evolve. This feedback structure makes counterfactual policy evaluation difficult in production, especially for long-horizon and distributional outcomes. The challenge is amplified as platforms deploy AI tools that change what content enters the system, how agents adapt, […]

Beyond the Black Box: Interpretability of Agentic AI Tool Use

arXiv:2605.06890v3 Announce Type: replace Abstract: AI agents are promising for high-stakes enterprise workflows, but dependable deployment remains limited because tool-use failures are difficult to diagnose and control. Agents may skip required tool calls, invoke tools unnecessarily, or take actions whose consequence becomes visible only after execution. Existing observability methods are external: prompts reveal correlations, evaluations […]

APLSuite: An Integrated Suite for CD4+ T Cell Epitope Prediction via Antigen Processing Likelihood

arXiv:2606.02462v2 Announce Type: replace Abstract: Computational epitope prediction is a critical tool for exploring and understanding CD4+ T cell-mediated immune responses, a key aspect of adaptive immunity. While existing computational methods primarily focus on supervised learning approaches, they often overlook the essential role of antigen processing in determining binding specificity. To address this limitation, our […]

Bounded-Abstention Pairwise Learning to Rank

arXiv:2505.23437v2 Announce Type: replace-cross Abstract: Ranking systems influence decision-making in high-stakes domains like health, education, and employment, where they can have substantial economic and social impacts. This makes the integration of safety mechanisms essential. One such mechanism is abstention, which enables algorithmic decision-making systems to defer uncertain or low-confidence decisions to human experts. While abstention […]
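The abstract describes abstention as deferring uncertain or low-confidence decisions to human experts. The sketch below is a hypothetical illustration of that idea in a pairwise setting, not the paper's method. It assumes that "bounded" means a cap (here `budget`) on the fraction of pairs that may be deferred, and that a logistic score-difference model supplies pairwise confidence. Both of these are assumptions.

```python
import math

def pairwise_prob(score_i: float, score_j: float) -> float:
    # Logistic (Bradley-Terry style) probability that item i should rank above item j.
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))

def decide_with_bounded_abstention(pairs, budget):
    """Return +1 (i above j), -1 (j above i) or 0 (defer to a human expert) per pair.

    Hypothetical rule: at most floor(budget * len(pairs)) pairs are deferred,
    namely the least confident ones.
    """
    probs = [pairwise_prob(si, sj) for si, sj in pairs]
    confidence = [max(p, 1.0 - p) for p in probs]
    max_abstain = int(budget * len(pairs))
    # Indices sorted from least to most confident; the first max_abstain are deferred.
    least_confident = sorted(range(len(pairs)), key=lambda k: confidence[k])
    deferred = set(least_confident[:max_abstain])
    return [0 if k in deferred else (1 if p >= 0.5 else -1) for k, p in enumerate(probs)]

pairs = [(2.0, 0.0), (0.1, 0.0), (0.0, 3.0), (0.5, 0.4)]
print(decide_with_bounded_abstention(pairs, budget=0.5))  # [1, 0, -1, 0]
```

Capping the abstention rate keeps the system from deferring every hard case, which would otherwise shift an unbounded workload onto the human experts.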

Where Rectified Flows Leak: Characterising Membership Signals Along the Interpolation Path

arXiv:2606.07271v1 Announce Type: cross Abstract: Understanding what generative models retain from training data remains challenging, with implications for copyright and privacy. Beyond verbatim reproduction, models can encode subtler traces of their training data that never surface in their outputs yet remain exploitable. We study this regime for Rectified Flows, which are increasingly used in deployed […]
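The "interpolation path" in the title refers to the straight line that a rectified flow follows between noise and data. The sketch below assumes the convention x_t = (1 - t) x0 + t x1 with noise x0 at t = 0 and data x1 at t = 1, and a velocity target of x1 - x0. It measures a simple loss-based membership signal at several values of t. This is a generic heuristic and not necessarily the paper's attack. The toy "memorising" model and all function names are hypothetical.

```python
import numpy as np

def interpolate(x0, x1, t):
    # Straight-line rectified-flow path from noise x0 (t=0) to data x1 (t=1).
    return (1.0 - t) * x0 + t * x1

def velocity_loss(model, x1, ts, rng, n_noise=8):
    """Mean squared error between predicted velocity and target x1 - x0, for each t in ts."""
    losses = []
    for t in ts:
        errs = []
        for _ in range(n_noise):
            x0 = rng.standard_normal(x1.shape)
            xt = interpolate(x0, x1, t)
            target = x1 - x0
            errs.append(np.mean((model(xt, t) - target) ** 2))
        losses.append(float(np.mean(errs)))
    return np.array(losses)

# Hypothetical toy model that has memorised one training point and always flows towards it.
train_point = np.array([1.0, -2.0, 0.5])

def memorised_model(xt, t):
    return (train_point - xt) / (1.0 - t)

rng = np.random.default_rng(0)
ts = [0.1, 0.5, 0.9]
member = velocity_loss(memorised_model, train_point, ts, rng)             # ~0 at every t
nonmember = velocity_loss(memorised_model, train_point + 1.0, ts, rng)   # 1 / (1 - t)^2
```

In this toy example, the member's loss is essentially zero along the whole path. The non-member's loss grows as 1 / (1 - t)^2, so the gap between the two depends on where along the path the loss is measured.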

OGA-AID: Clinician-in-the-loop AI Report Drafting Assistant for Multimodal Observational Gait Analysis in Post-Stroke Rehabilitation

arXiv:2604.05360v2 Announce Type: replace-cross Abstract: Gait analysis is essential in post-stroke rehabilitation but remains time-intensive and cognitively demanding, especially when clinicians must integrate gait videos and motion-capture data into structured reports. We present OGA-AID, a clinician-in-the-loop multi-agent large language model system for multimodal report drafting. The system coordinates 3 specialized agents to synthesize patient movement […]

Endogenous Resistance to Activation Steering in Language Models

arXiv:2602.06941v2 Announce Type: replace-cross Abstract: Large language models can recover mid-generation from task-misaligned activation steering, producing explicit verbal restarts (e.g., “wait, that’s not right”) and continuing on-topic even while the steering perturbation remains active. We term this Endogenous Steering Resistance (ESR). Using sparse autoencoder (SAE) latents to steer model activations, we find that Llama-3.3-70B exhibits […]
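As a rough illustration of steering with sparse autoencoder latents, the sketch below adds a scaled, unit-normalised SAE decoder direction to residual-stream activations at every token position. This mirrors the abstract's setting, in which the perturbation remains active throughout generation. The shapes, the `steer` function, and the use of random matrices in place of a trained SAE are assumptions made for illustration. This is not the authors' code.

```python
import numpy as np

def steer(h: np.ndarray, W_dec: np.ndarray, latent: int, alpha: float) -> np.ndarray:
    """Add alpha times the unit-norm decoder direction of one SAE latent to every position.

    h: (seq_len, d_model) residual-stream activations at one layer.
    W_dec: (n_latents, d_model) SAE decoder matrix (hypothetical shapes).
    """
    direction = W_dec[latent] / np.linalg.norm(W_dec[latent])
    # Broadcasting applies the same perturbation at every token position,
    # so the steering stays active for the whole sequence.
    return h + alpha * direction

rng = np.random.default_rng(0)
h = rng.standard_normal((4, 8))         # stand-in activations
W_dec = rng.standard_normal((16, 8))    # stand-in for a trained SAE decoder
steered = steer(h, W_dec, latent=3, alpha=5.0)
```

Because the same offset is added at every step, any recovery of this kind, such as an explicit "wait, that's not right" restart, must come from the model itself rather than from the perturbation being removed.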

InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees

arXiv:2605.00369v4 Announce Type: replace-cross Abstract: We study how large language models can be used to generate inventory policies in online settings with non-stationary demand. Our work is motivated by recent advances in LLM-based evolutionary search, such as AlphaEvolve, which demonstrates strong performance on static and highly structured problems such as mathematical discovery, but is not […]

WAV: Multi-Resolution Block Residual Routing for Deep Decoder-Only Transformers

arXiv:2606.06564v1 Announce Type: cross Abstract: Residual connections are central to training deep Transformers, but standard PreNorm residual streams aggregate sublayer updates with fixed unit weights. Recent Attention Residuals replace this fixed accumulation with content-dependent depth-wise routing, and Block Attention Residuals make the mechanism efficient by routing over block-level residual summaries. However, a single block summary […]
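The contrast the abstract draws can be sketched as follows. In a PreNorm stream, each sublayer update is added with a fixed weight of 1. A content-dependent depth-wise router instead forms a layer's input as a softmax-weighted mix of earlier states or block-level summaries. The routing function below is an assumed form chosen for illustration, not the exact Attention Residuals or WAV formulation. The names `depthwise_routed_input` and `query` are hypothetical.

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Parameter-free LayerNorm over the feature axis.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def prenorm_stream(h0, sublayers):
    """Standard PreNorm residual stream: every sublayer update is added with weight 1."""
    h = h0
    for f in sublayers:
        h = h + f(layer_norm(h))
    return h

def depthwise_routed_input(history, query):
    """Content-dependent depth routing (assumed form): softmax-weighted mix of earlier states.

    history: list of (d_model,) vectors, e.g. the embedding and earlier sublayer outputs
    or block-level summaries. query: (d_model,) vector, hypothetically a learned parameter.
    """
    H = np.stack(history)                       # (depth, d_model)
    scores = H @ query / np.sqrt(H.shape[-1])   # one score per earlier state
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ H, w

d = 4
h0 = np.zeros(d)
updates = [lambda x: np.ones(d), lambda x: 2.0 * np.ones(d)]  # toy sublayers that ignore their input
fixed = prenorm_stream(h0, updates)            # h0 + 1 + 2 = 3 in every coordinate
history = [h0, np.ones(d), 2.0 * np.ones(d)]
routed, weights = depthwise_routed_input(history, query=np.zeros(d))
```

With a zero query, the router assigns uniform weights and returns the mean of the history. A learned, content-dependent query can instead emphasise particular depths, which the fixed unit-weight sum cannot do.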


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY. UK registration number 16808844.