Adaptation to free-living drives loss of beneficial endosymbiosis through metabolic trade-offs

Symbioses are widespread (1) and underpin the function of diverse ecosystems (2-6), but their evolutionary stability is challenging to explain (7,8). Fitness trade-offs between contrasting

Gradient-specified optimization based on muscle surface mesh and moment arm as an effect-oriented approach of automated musculotendon path modeling

There is more to musculotendon path modeling than aligning a cable to reflect the geometric features of a muscle-tendon unit. From the perspective of simulation

Dissecting polycomb complexes for enhanced fetal hemoglobin production

Polycomb repressive complexes PRC1 and PRC2 regulate diverse developmental processes, including the fetal-to-adult switch in hemoglobin production, a process whose reversal is a goal for

Therapy-associated mutagenesis at CTCF binding sites is shaped by chromatin context and DNA repair capacity

Genotoxic cancer therapies introduce DNA damage that can be fixed as somatic mutations in surviving tumor cells. However, the impact of therapy-associated mutagenesis on regulatory

Improved deconvolution of circulating tumor DNA from ultra-low-pass whole-genome methylation sequencing using CelFiE-ISH

Liquid biopsy using ultra-low-pass whole-genome sequencing (ULP-WGS, ~0.25x coverage) is a promising tool to detect circulating tumor DNA (ctDNA) for cancer management, and the use

IndicDB — Benchmarking Multilingual Text-to-SQL Capabilities in Indian Languages

April 16, 2026

arXiv:2604.13686v1 Announce Type: cross
Abstract: While Large Language Models (LLMs) have significantly advanced Text-to-SQL performance, existing benchmarks predominantly focus on Western contexts and simplified schemas, leaving a gap in real-world, non-Western applications. We present IndicDB, a multilingual Text-to-SQL benchmark for evaluating cross-lingual semantic parsing across diverse Indic languages. The relational schemas are sourced from open-data platforms, including the National Data and Analytics Platform (NDAP) and the India Data Portal (IDP), ensuring realistic administrative data complexity. IndicDB comprises 20 databases containing 237 tables in total. To convert denormalized government data into rich relational structures, we employ an iterative three-agent framework (Architect, Auditor, Refiner) to ensure structural rigor and high relational density (11.85 tables per database; join depths up to six). Our pipeline is value-aware, difficulty-calibrated, and join-enforced, generating 15,617 tasks across English, Hindi, and five other Indic languages. We evaluate the cross-lingual semantic parsing performance of state-of-the-art models (DeepSeek v3.2, MiniMax 2.7, LLaMA 3.3, Qwen3) across these seven linguistic variants. Results show a 9.00% performance drop from English to Indic languages, revealing an “Indic Gap” driven by harder schema linking, increased structural ambiguity, and limited external knowledge. IndicDB serves as a rigorous benchmark for multilingual Text-to-SQL. Code and data: https://anonymous.4open.science/r/multilingualText2Sql-Indic–DDCC/
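
The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below recomputes the relational density from the stated database and table counts. It also shows two common ways a gap between English and Indic scores could be reported (absolute points versus relative percent), since the abstract does not say which convention its 9.00% uses. The function names and the example scores are hypothetical illustrations, not values or code from the paper.

```python
# Figures quoted in the abstract.
NUM_DATABASES = 20
NUM_TABLES = 237


def tables_per_database(num_tables: int, num_databases: int) -> float:
    """Average number of tables per database (the abstract's relational density)."""
    return num_tables / num_databases


def performance_drop(english_score: float, indic_score: float) -> tuple[float, float]:
    """Return (absolute drop in points, relative drop in percent of the English score)."""
    absolute = english_score - indic_score
    relative = 100.0 * absolute / english_score
    return absolute, relative


density = tables_per_database(NUM_TABLES, NUM_DATABASES)
print(f"{density:.2f} tables per database")  # 11.85, matching the abstract

# Hypothetical scores for illustration only; these are NOT results from the paper.
abs_drop, rel_drop = performance_drop(english_score=60.0, indic_score=51.0)
print(f"absolute drop: {abs_drop:.2f} points, relative drop: {rel_drop:.2f}%")
```

With these made-up inputs the same gap reads as 9 points in absolute terms but 15% in relative terms, which is why the reporting convention matters when comparing benchmarks.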

