Generating Query-Focused Summarization Datasets from Query-Free Summarization Datasets

arXiv:2605.05392v1 Announce Type: cross Abstract: Large-scale datasets are widely used to perform summarization tasks, but they may not include queries alongside documents and summaries. In the search for suitable datasets for Query-Focused Summarization (QFS), we identify two research questions: Is it possible to automatically generate evidence-based query keywords from query-free datasets? Does evidence-based query generation […]

Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration

arXiv:2605.05566v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards, particularly Group Relative Policy Optimization (GRPO), has significantly advanced the reasoning capabilities of Large Language Models (LLMs). However, in complex tasks, GRPO frequently suffers from the “zero-advantage problem”: when all sampled rollouts for a query fail, the relative advantage collapses to zero. Consequently, the model […]
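
The zero-advantage problem described above can be illustrated with a minimal sketch. It assumes the commonly used group-relative normalization, in which each rollout's reward is centered on the group mean and divided by the group standard deviation. The function name `group_relative_advantages` and the `eps` stabilizer are illustrative assumptions; the paper's exact formulation is not shown in the truncated abstract.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Center each reward on the group mean and scale by the group std.

    Hypothetical helper for illustration; eps avoids division by zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# A group with mixed outcomes yields positive and negative advantages.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every rollout fails: each reward equals the mean,
# so every advantage is zero and the group gives no learning signal.
all_failed = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

print(mixed)
print(all_failed)
```

Under this assumed normalization, a group in which every rollout succeeds collapses in the same way, since each reward again equals the group mean.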

On Semantic Loss Fine-Tuning Approach for Preventing Model Collapse in Causal Reasoning

arXiv:2605.05438v1 Announce Type: cross Abstract: Standard fine-tuning of transformer models on causal reasoning tasks leads to catastrophic model collapse, where models learn trivial solutions such as always predicting “Yes” or “No” regardless of input structure. We demonstrate that fine-tuning Gemma 270M on transitivity and d-separation tasks without semantic loss results in 100% collapse rate, with […]

Claw-Eval: Towards Trustworthy Evaluation of Autonomous Agents

arXiv:2604.06132v3 Announce Type: replace Abstract: Large language models are increasingly deployed as autonomous agents for multi-step workflows in real-world software environments. However, existing agent benchmarks are limited by trajectory-opaque grading, underspecified safety and robustness evaluation, and narrow coverage of modalities and interaction paradigms. We introduce Claw-Eval, an end-to-end evaluation suite addressing these gaps with 300 […]

A Unified Benchmark for Evaluating Knowledge Graph Construction Methods and Graph Neural Networks

arXiv:2605.05476v1 Announce Type: cross Abstract: Knowledge graphs automatically constructed from text are increasingly used in real-world applications. However, their inherent noise, fragmentation, and semantic inconsistencies significantly affect the performance of Graph Neural Networks (GNNs) on downstream tasks. Assessing their performance and robustness remains difficult, as it is often unclear whether observed results stem from the […]

Risk Horizons: Structured Hypothesis Spaces for Longitudinal Clinical Prediction

arXiv:2602.12828v2 Announce Type: replace-cross Abstract: Predicting future clinical events from longitudinal electronic health records (EHRs) requires selecting plausible outcomes from a large and structured event space under sparse observations. While clinical coding systems provide hierarchical organization of events, cross-modal and temporal relationships are not explicitly specified and must instead be inferred from data, making prediction […]

Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters

arXiv:2605.06032v1 Announce Type: cross Abstract: Synthetic data has transformed language model training, yet its role in time series forecasting remains poorly understood. We present a large-scale empirical study: nine experiment groups, 4,218 runs systematically evaluating synthetic time series augmentation across five architectures, four synthetic signals and seven datasets. The effect is sharply architecture-conditional: channel-mixing models […]

Screening Is Enough

arXiv:2604.01178v3 Announce Type: replace-cross Abstract: A core limitation of standard softmax attention is that it does not provide an independently interpretable measure of query–key relevance: attention scores are unbounded, while attention weights are defined only relative to competing keys. Consequently, irrelevant keys cannot be explicitly rejected, and some attention mass is assigned even when no […]
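
The limitation stated above, that softmax weights are defined only relative to competing keys, can be seen in a small sketch. It uses a plain softmax over raw scores; the helper name `softmax` and the example score values are assumptions made for illustration and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Four keys that all score very low against the query (all "irrelevant").
weights = softmax([-50.0, -50.0, -50.0, -50.0])

# Shifting every score by the same constant leaves the weights unchanged,
# so the weights carry no absolute notion of relevance.
low = softmax([-50.0, -49.0])
high = softmax([0.0, 1.0])

print(weights, sum(weights))
print(low, high)
```

Because the weights always sum to one, the uniformly low-scoring keys still share the full attention mass, and no key can be rejected outright.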

T2I-VeRW: Part-level Fine-grained Perception for Text-to-Image Vehicle Retrieval

arXiv:2605.06012v1 Announce Type: cross Abstract: Vehicle Re-identification (Re-ID) aims to retrieve the most similar image to a given query from images captured by non-overlapping cameras. Extending vehicle Re-ID from image-only queries to text-based queries enables retrieval in real-world scenarios where only a witness description of the target vehicle is available. In this paper, we propose […]

Robust Filter Attention: Self-Attention as Precision-Weighted State Estimation

arXiv:2509.04154v5 Announce Type: replace-cross Abstract: We introduce Robust Filter Attention (RFA), a formulation of self-attention as a robust state estimator. Each token is treated as a noisy observation of a latent trajectory governed by a linear stochastic differential equation (SDE), and attention weights are determined by consistency under this model rather than static feature similarity. […]

Accelerating Discrete Facility Layout Optimization: A Hybrid CDCL and CP-SAT Architecture

arXiv:2512.18034v3 Announce Type: replace Abstract: Discrete facility layout design involves placing physical entities to minimize handling costs while adhering to strict safety and spatial constraints. This combinatorial problem is typically addressed using Mixed Integer Linear Programming (MILP) or Constraint Programming (CP), though these methods often face scalability challenges as constraint density increases. This study systematically […]

Segment-Aligned Policy Optimization for Multi-Modal Reasoning

arXiv:2605.01327v2 Announce Type: replace Abstract: Existing reinforcement learning approaches for Large Language Models typically perform policy optimization at the granularity of individual tokens or entire response sequences. However, such formulations often misalign with the natural step-wise structure of reasoning processes, leading to suboptimal credit assignment and unstable training in multi-modal reasoning tasks. To bridge this […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK; registration number 16808844.