Epidarex Capital raised $145 million in the first close of its fourth fund as the biotech investor searches for the next generation of standout science in overlooked research corridors of the UK and US. …
How Worst-Case Are Adversarial Attacks? Linking Adversarial and Perturbation Robustness
arXiv:2601.14519v2 Announce Type: replace-cross Abstract: Adversarial attacks are widely used to identify model vulnerabilities; however, their validity as proxies for robustness to random perturbations remains debated. We ask whether an adversarial example provides a representative estimate of misprediction risk under stochastic perturbations of the same magnitude, or instead reflects an atypical worst-case event. To address […]
Stop Taking Tokenizers for Granted: They Are Core Design Decisions in Large Language Models
arXiv:2601.13260v2 Announce Type: replace-cross Abstract: Tokenization underlies every large language model, yet it remains an under-theorized and inconsistently designed component. Common subword approaches such as Byte Pair Encoding (BPE) offer scalability but often misalign with linguistic structure, amplify bias, and waste capacity across languages and domains. This paper reframes tokenization as a core modeling decision […]
Collective intelligence in science: direct elicitation of diverse information from experts with unknown information structure
arXiv:2601.14047v2 Announce Type: replace-cross Abstract: Suppose we need a deep collective analysis of an open scientific problem: there is a complex scientific hypothesis and a large online group of mutually unrelated experts with relevant private information of a diverse and unpredictable nature. This information may be results of experts’ individual experiments, original reasoning of some […]
BoRP: Bootstrapped Regression Probing for Scalable and Human-Aligned LLM Evaluation
arXiv:2601.18253v1 Announce Type: cross Abstract: Accurate evaluation of user satisfaction is critical for iterative development of conversational AI. However, for open-ended assistants, traditional A/B testing lacks reliable metrics: explicit feedback is sparse, while implicit metrics are ambiguous. To bridge this gap, we introduce BoRP (Bootstrapped Regression Probing), a scalable framework for high-fidelity satisfaction evaluation. Unlike […]
Generative AI in Saudi Arabia: A National Survey of Adoption, Risks, and Public Perceptions
arXiv:2601.18234v1 Announce Type: cross Abstract: Generative Artificial Intelligence (GenAI) is rapidly becoming embedded in Saudi Arabia’s digital transformation under Vision 2030, yet public awareness, adoption, and concerns surrounding these tools remain underexplored. This study provides an early snapshot of GenAI engagement among Saudi nationals. Using a nationwide survey of 330 participants across regions, age groups, […]
Beyond Retention: Orchestrating Structural Safety and Plasticity in Continual Learning for LLMs
arXiv:2601.18255v1 Announce Type: cross Abstract: Continual learning in Large Language Models (LLMs) faces the critical challenge of balancing stability (retaining old knowledge) and plasticity (learning new tasks). While Experience Replay (ER) is a standard countermeasure against catastrophic forgetting, its impact across diverse capabilities remains underexplored. In this work, we uncover a critical dichotomy in ER’s […]
A multimodal vision foundation model for generalizable knee pathology
arXiv:2601.18250v1 Announce Type: cross Abstract: Musculoskeletal disorders represent a leading cause of global disability, creating an urgent demand for precise interpretation of medical imaging. Current artificial intelligence (AI) approaches in orthopedics predominantly rely on task-specific, supervised learning paradigms. These methods are inherently fragmented, require extensive annotated datasets, and often lack generalizability across different modalities and […]
TAM-Eval: Evaluating LLMs for Automated Unit Test Maintenance
arXiv:2601.18241v1 Announce Type: cross Abstract: While Large Language Models (LLMs) have shown promise in software engineering, their application to unit testing remains largely confined to isolated test generation or oracle prediction, neglecting the broader challenge of test suite maintenance. We introduce TAM-Eval (Test Automated Maintenance Evaluation), a framework and benchmark designed to evaluate model performance […]
Machine learning-enhanced non-amnestic Alzheimer’s disease diagnosis from MRI and clinical features
arXiv:2601.15530v2 Announce Type: replace-cross Abstract: Alzheimer’s disease (AD), defined as an abnormal buildup of amyloid plaques and tau tangles in the brain, can be diagnosed with high accuracy based on protein biomarkers via PET or CSF analysis. However, due to the invasive nature of biomarker collection, most AD diagnoses are made in memory clinics using […]
What Do Learned Models Measure?
arXiv:2601.18278v1 Announce Type: cross Abstract: In many scientific and data-driven applications, machine learning models are increasingly used as measurement instruments, rather than merely as predictors of predefined labels. When the measurement function is learned from data, the mapping from observations to quantities is determined implicitly by the training distribution and inductive biases, allowing multiple inequivalent […]
Ordering-based Causal Discovery via Generalized Score Matching
arXiv:2601.16249v2 Announce Type: replace-cross Abstract: Learning DAG structures from purely observational data remains a long-standing challenge across scientific domains. An emerging line of research leverages the score of the data distribution to initially identify a topological order of the underlying DAG via leaf node detection and subsequently performs edge pruning for graph recovery. This paper […]