arXiv:2512.04097v2 Announce Type: replace-cross Abstract: In this paper, we introduce MultiGA, an optimization framework that applies genetic algorithm principles to address complex natural language tasks and reasoning problems by sampling from a diverse population of LLMs to initialize the population of candidate solutions. MultiGA generates a range of outputs from various parent LLMs and uses […]
TaCarla: A comprehensive benchmarking dataset for end-to-end autonomous driving
arXiv:2602.23499v3 Announce Type: replace-cross Abstract: Collecting a high-quality dataset is a critical task that demands meticulous attention to detail, as overlooking certain aspects can render the entire dataset unusable. Autonomous driving challenges remain a prominent area of research, requiring further exploration to enhance the perception and planning performance of vehicles. However, existing datasets are often […]
Nullstrap-DE: A General Framework for Calibrating FDR and Preserving Power in DE Methods, with Applications to DESeq2 and edgeR
arXiv:2507.20598v2 Announce Type: replace-cross Abstract: Differential expression (DE) analysis is a key task in RNA-seq studies, aiming to identify genes with expression differences across conditions. A central challenge is balancing false discovery rate (FDR) control with statistical power. Parametric methods such as DESeq2 and edgeR achieve high power by modeling gene-level counts using negative binomial […]
Support-Contra Asymmetry in LLM Explanations
arXiv:2510.21884v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) increasingly produce natural language explanations alongside their predictions, yet it remains unclear whether these explanations reference predictive cues present in the input text. In this work, we present an empirical study of how LLM-generated explanations align with predictive lexical evidence from an external model in text […]
MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models
arXiv:2603.28590v2 Announce Type: replace Abstract: Large language models (LLMs) can generate chains of thought (CoTs) that are not always causally responsible for their final outputs. When such a mismatch occurs, the CoT no longer faithfully reflects the actual reasons (i.e., decision-critical factors) driving the model’s behavior, leading to the reduced CoT monitorability problem. However, a […]
TEDDY: A Family Of Foundation Models For Understanding Single Cell Biology
arXiv:2503.03485v2 Announce Type: replace-cross Abstract: Understanding the biological mechanisms of disease is crucial for medicine, and in particular, for drug discovery. AI-powered analysis of genome-scale biological data holds great potential in this regard. The increasing availability of single-cell RNA sequencing data has enabled the development of large foundation models for disease biology. However, existing foundation […]
VISTA: Visualization of Token Attribution via Efficient Analysis
arXiv:2604.02217v1 Announce Type: new Abstract: Understanding how Large Language Models (LLMs) process information from prompts remains a significant challenge. To shed light on this “black box,” attention visualization techniques have been developed to capture neuron-level perceptions and interpret how models focus on different parts of input data. However, many existing techniques are tailored to specific […]
When to ASK: Uncertainty-Gated Language Assistance for Reinforcement Learning
arXiv:2604.02226v1 Announce Type: new Abstract: Reinforcement learning (RL) agents often struggle with out-of-distribution (OOD) scenarios, leading to high uncertainty and random behavior. While language models (LMs) contain valuable world knowledge, larger ones incur high computational costs, hindering real-time use, and exhibit limitations in autonomous planning. We introduce Adaptive Safety through Knowledge (ASK), which combines smaller […]
De Jure: Iterative LLM Self-Refinement for Structured Extraction of Regulatory Rules
arXiv:2604.02276v1 Announce Type: new Abstract: Regulatory documents encode legally binding obligations that LLM-based systems must respect. Yet converting dense, hierarchically structured legal text into machine-readable rules remains a costly, expert-intensive process. We present De Jure, a fully automated, domain-agnostic pipeline for extracting structured regulatory rules from raw documents, requiring no human annotation, domain-specific prompting, or […]
An engineered biosensor for the fast and accurate detection of terephthalate
Accelerating the development of enzymatic degradation of polyesters such as poly(ethylene terephthalate) (PET) and poly(butylene terephthalate) (PBT) requires a rapid and parallelizable detection method. We developed a protein-based biosensor, named TPAsense, for the fast and accurate quantification of terephthalate (TPA), the degradation product of PET and PBT. Engineering TPAsense required overcoming low thermal stability […]
Single-cell, clonal and spatial atlases of cranial placodes illuminate their specification and evolution
The vertebrate head is defined by complex sensory structures derived from cranial placodes. Placodes arise alongside the neural crest at the neural plate border, yet the mechanisms governing their identity, diversification, and evolutionary origins are unclear. We present an integrated single-cell, spatial, and clonal atlas of placode development to resolve the dynamics of their lineage […]
Corpus for Benchmarking Clinical Speech De-identification
Objectives: Publicly available datasets dedicated to clinical speech de-identification tasks remain scarce due to privacy constraints and the complexity of speech-level annotation. To address this gap, we compiled the SREDH-AICup sensitive health information (SHI) speech corpus, a time-aligned clinical speech dataset annotated across 38 SHI categories. Methods: Two publicly available English medical-domain datasets were adapted […]