Finite-Time Analysis of Gradient Descent for Shallow Transformers

arXiv:2601.16514v2 Announce Type: replace-cross Abstract: Understanding why Transformers perform so well remains challenging due to their non-convex optimization landscape. In this work, we analyze a shallow Transformer with $m$ independent heads trained by projected gradient descent in the kernel regime. Our analysis reveals two main findings: (i) the width required for nonasymptotic guarantees scales only […]

A Comprehensive Graph Pooling Benchmark: Effectiveness, Robustness and Generalizability

arXiv:2406.09031v5 Announce Type: replace-cross Abstract: Graph pooling has gained attention for its ability to obtain effective node and graph representations for various downstream tasks. Despite the recent surge in graph pooling approaches, there is a lack of standardized experimental settings and fair benchmarks to evaluate their performance. To address this issue, we have constructed a […]

Meta-Learning at Scale for Large Language Models via Low-Rank Amortized Bayesian Meta-Learning

arXiv:2508.14285v3 Announce Type: replace-cross Abstract: Fine-tuning large language models (LLMs) with low-rank adaptation (LoRA) is a cost-effective way to incorporate information from a specific dataset. However, when a problem requires incorporating information from multiple datasets – as in few shot learning – generalization across datasets can be limited, driving up training costs. As a consequence, […]

Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach

arXiv:2601.11016v2 Announce Type: replace-cross Abstract: In this paper, we introduce a framework for contextual distributionally robust optimization (DRO) that considers the causal and continuous structure of the underlying distribution by developing interpretable and tractable decision rules that prescribe decisions using covariates. We first introduce the causal Sinkhorn discrepancy (CSD), an entropy-regularized causal Wasserstein distance that […]

Do We Need Bigger Models for Science? Task-Aware Retrieval with Small Language Models

arXiv:2604.01965v1 Announce Type: cross Abstract: Scientific knowledge discovery increasingly relies on large language models, yet many existing scholarly assistants depend on proprietary systems with tens or hundreds of billions of parameters. Such reliance limits reproducibility and accessibility for the research community. In this work, we ask a simple question: do we need bigger models for […]

ActionParty: Multi-Subject Action Binding in Generative Video Games

arXiv:2604.02330v1 Announce Type: cross Abstract: Recent advances in video diffusion have enabled the development of “world models” capable of simulating interactive environments. However, these models are largely restricted to single-agent settings, failing to control multiple agents simultaneously in a scene. In this work, we tackle a fundamental issue of action binding in existing video diffusion […]

Fast dynamical similarity analysis

arXiv:2511.22828v2 Announce Type: replace Abstract: Understanding how nonlinear dynamical systems (e.g., artificial neural networks and neural circuits) process information requires comparing their underlying dynamics at scale, across diverse architectures and large neural recordings. While many similarity metrics exist, current approaches fall short for large-scale comparisons. Geometric methods are computationally efficient but fail to capture governing […]

Cognitive Friction: A Decision-Theoretic Framework for Bounded Deliberation in Tool-Using Agents

arXiv:2603.30031v3 Announce Type: replace Abstract: Autonomous tool-using agents in networked environments must decide which information source to query and when to stop querying and act. Without principled bounds on information-acquisition costs, unconstrained agents exhibit systematic failure modes: excessive tool use under congestion, prolonged deliberation under time decay, and brittle behavior under ambiguous evidence. We propose […]

SAKE: Structured Agentic Knowledge Extrapolation for Complex LLM Reasoning via Reinforcement Learning

arXiv:2505.15062v4 Announce Type: replace-cross Abstract: Knowledge extrapolation is the process of inferring novel information by combining and extending existing knowledge that is explicitly available. It is essential for solving complex questions in specialized domains where retrieving comprehensive external knowledge is impractical. We propose SAKE (Structured Agentic Knowledge Extrapolation), a RL powered agentic framework that trains […]

AutoPK: Leveraging LLMs and a Hybrid Similarity Metric for Advanced Retrieval of Pharmacokinetic Data from Complex Tables and Documents

arXiv:2510.00039v2 Announce Type: replace-cross Abstract: Pharmacokinetics (PK) plays a critical role in drug development and regulatory decision-making for human and veterinary medicine, directly affecting public health through drug safety and efficacy assessments. However, PK data are often embedded in complex, heterogeneous tables with variable structures and inconsistent terminologies, posing significant challenges for automated PK data […]

Improvise, Adapt, Overcome — Telescopic Adapters for Efficient Fine-tuning of Vision Language Models in Medical Imaging

arXiv:2512.13855v2 Announce Type: replace-cross Abstract: Adapting Vision Language Segmentation Models (VLSMs) to medical imaging domains requires significant computational overhead when using conventional fine-tuning approaches. Existing Parameter-Efficient Fine-Tuning (PEFT) methods apply uniform adapter dimensions across all transformer layers, leading to suboptimal parameter allocation and reduced adaptation efficiency. We introduce Telescopic Adapters, a novel PEFT framework that […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844