MrBERT: Modern Multilingual Encoders via Vocabulary, Domain, and Dimensional Adaptation

arXiv:2602.21379v2 Announce Type: replace-cross Abstract: We introduce MrBERT, a family of 150M-300M parameter encoders built on the ModernBERT architecture and pre-trained on 35 languages and code. Through targeted adaptation, this model family achieves state-of-the-art results on Catalan- and Spanish-specific tasks, while establishing robust performance across specialized biomedical and legal domains. To bridge the gap between […]

BioAgent Bench: An AI Agent Evaluation Suite for Bioinformatics

arXiv:2601.21800v2 Announce Type: replace Abstract: This paper introduces BioAgent Bench, a benchmark dataset and an evaluation suite designed for measuring the performance and robustness of AI agents in common bioinformatics tasks. The benchmark contains curated end-to-end tasks (e.g., RNA-seq, variant calling, metagenomics) with prompts that specify concrete output artifacts to support automated assessment, including stress […]

FedMomentum: Preserving LoRA Training Momentum in Federated Fine-Tuning

arXiv:2603.08014v1 Announce Type: cross Abstract: Federated fine-tuning of large language models (LLMs) with low-rank adaptation (LoRA) offers a communication-efficient and privacy-preserving solution for task-specific adaptation. Naive aggregation of LoRA modules introduces noise due to mathematical incorrectness when averaging the downsampling and upsampling matrices independently. However, existing noise-free aggregation strategies inevitably compromise the structural expressiveness of […]

Vibe Researching as Wolf Coming: Can AI Agents with Skills Replace or Augment Social Scientists?

arXiv:2602.22401v3 Announce Type: replace Abstract: AI agents — systems that execute multi-step reasoning workflows with persistent state, tool access, and specialist skills — represent a qualitative shift from prior automation technologies in social science. Unlike chatbots that respond to isolated queries, AI agents can now read files, run code, query databases, search the web, and […]

RoboLayout: Differentiable 3D Scene Generation for Embodied Agents

arXiv:2603.05522v2 Announce Type: replace Abstract: Recent advances in vision language models (VLMs) have shown strong potential for spatial reasoning and 3D scene layout generation from open-ended language instructions. However, generating layouts that are not only semantically coherent but also feasible for interaction by embodied agents remains challenging, particularly in physically constrained indoor environments. In this […]

ViSA-Enhanced Aerial VLN: A Visual-Spatial Reasoning Enhanced Framework for Aerial Vision-Language Navigation

arXiv:2603.08007v1 Announce Type: cross Abstract: Existing aerial Vision-Language Navigation (VLN) methods predominantly adopt a detection-and-planning pipeline, which converts open-vocabulary detections into discrete textual scene graphs. These approaches are plagued by inadequate spatial reasoning capabilities and inherent linguistic ambiguities. To address these bottlenecks, we propose a Visual-Spatial Reasoning (ViSA) enhanced framework for aerial VLN. Specifically, a […]

BNEM: A Boltzmann Sampler Based on Bootstrapped Noised Energy Matching

arXiv:2409.09787v5 Announce Type: replace-cross Abstract: Developing an efficient sampler capable of generating independent and identically distributed (IID) samples from a Boltzmann distribution is a crucial challenge in scientific research, e.g. molecular dynamics. In this work, we intend to learn neural samplers given energy functions instead of data sampled from the Boltzmann distribution. By learning the […]

Symmetry-Driven Generation of Crystal Structures from Composition

arXiv:2602.17176v3 Announce Type: replace-cross Abstract: Crystal structure prediction (CSP), which aims to predict the three-dimensional atomic arrangement of a crystal from its composition, is central to materials discovery and mechanistic understanding. However, given the composition in a unit cell, existing methods struggle with the NP-hard combinatorial challenge of rigorous symmetry enforcement or rely on retrieving […]

Prompt-SID: Learning Structural Representation Prompt via Latent Diffusion for Single-Image Denoising

arXiv:2502.06432v3 Announce Type: replace-cross Abstract: Many studies have concentrated on constructing supervised models utilizing paired datasets for image denoising, which proves to be expensive and time-consuming. Current self-supervised and unsupervised approaches typically rely on blind-spot networks or sub-image pairs sampling, resulting in pixel information loss and destruction of detailed structural information, thereby significantly constraining the […]

Aero-Promptness: Drag-Aware Aerodynamic Manipulability for Propeller-driven Vehicles

arXiv:2603.07998v1 Announce Type: cross Abstract: This work introduces the Drag-Aware Aerodynamic Manipulability (DAAM), a geometric framework for control allocation in redundant multirotors. By equipping the propeller spin-rate space with a Riemannian metric based on the remaining symmetric acceleration capacity of each motor, the formulation explicitly accounts for motor torque limits and aerodynamic drag. Mapping this […]

The Cell Must Go On: Agar.io for Continual Reinforcement Learning

arXiv:2505.18347v2 Announce Type: replace-cross Abstract: Continual reinforcement learning (RL) concerns agents that are expected to learn continually, rather than converge to a policy that is then fixed for evaluation. This setting is well-suited to environments that the agent perceives as changing over time, rendering any static policy ineffective. In continual RL, researchers often simulate such […]

LongAudio-RAG: Event-Grounded Question Answering over Multi-Hour Long Audio

arXiv:2602.14612v3 Announce Type: replace-cross Abstract: Long-duration audio is increasingly common in industrial and consumer settings, yet reviewing multi-hour recordings is impractical, motivating systems that answer natural-language queries with precise temporal grounding and minimal hallucination. Existing audio-language models show promise, but long-audio question answering remains difficult due to context-length limits. We introduce LongAudio-RAG (LA-RAG), a hybrid […]
