arXiv:2510.26115v1 Announce Type: cross Abstract: We introduce a general diploid population model with self-fertilization and possibly overlapping generations, and study the genealogy of a sample of $n$ genes as the population size $N$ tends to infinity. Unlike the traditional approach in coalescent theory, which considers the unconditional (annealed) law of the gene genealogies averaged over the […]
Questionnaire meets LLM: A Benchmark and Empirical Study of Structural Skills for Understanding Questions and Responses
arXiv:2510.26238v1 Announce Type: new Abstract: Millions of people take surveys every day, from market polls and academic studies to medical questionnaires and customer feedback forms. These datasets capture valuable insights, but their scale and structure present a unique challenge for large language models (LLMs), which otherwise excel at few-shot reasoning over open-ended text. Yet, their […]
Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation
arXiv:2510.26130v1 Announce Type: cross Abstract: Large language models (LLMs) have advanced code generation at the function level, yet their ability to produce correct class-level implementations in authentic software projects remains poorly understood. This work introduces a novel benchmark derived from open-source repositories, comprising real-world classes divided into seen and unseen partitions to evaluate generalization under […]
A Continuous and Interpretable Morphometric for Robust Quantification of Dynamic Biological Shapes
arXiv:2410.21004v2 Announce Type: replace-cross Abstract: We introduce the Push-Forward Signed Distance Morphometric (PF-SDM) for shape quantification in biomedical imaging. The PF-SDM compactly encodes geometric and topological properties of closed shapes, including their skeleton and symmetries. This provides robust and interpretable features for shape comparison and machine learning. The PF-SDM is mathematically smooth, providing access to […]
Bridging the Gap Between Molecule and Textual Descriptions via Substructure-aware Alignment
arXiv:2510.26157v1 Announce Type: cross Abstract: Molecule and text representation learning has gained increasing interest due to its potential for enhancing the understanding of chemical information. However, existing models often struggle to capture subtle differences between molecules and their descriptions, as they lack the ability to learn fine-grained alignments between molecular substructures and chemical phrases. To […]
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It
arXiv:2507.13328v2 Announce Type: replace-cross Abstract: Does vision-and-language (VL) training change the linguistic representations of language models in meaningful ways? Most results in the literature have shown inconsistent or marginal differences, both behaviorally and representationally. In this work, we start from the hypothesis that the domain in which VL training could have a significant effect is […]
StyleGuard: Preventing Text-to-Image-Model-based Style Mimicry Attacks by Style Perturbations
arXiv:2505.18766v2 Announce Type: replace-cross Abstract: Recently, text-to-image diffusion models have been widely used for style mimicry and personalized customization through methods such as DreamBooth and Textual Inversion. This has raised concerns about intellectual property protection and the generation of deceptive content. Recent studies, such as Glaze and Anti-DreamBooth, have proposed using adversarial noise to protect […]
More of the Same: Persistent Representational Harms Under Increased Representation
arXiv:2503.00333v2 Announce Type: replace-cross Abstract: To recognize and mitigate the harms of generative AI systems, it is crucial to consider who is represented in the outputs of generative AI systems and how people are represented. A critical gap emerges when naively improving who is represented, as this does not imply bias mitigation efforts have been […]
Neural Networks for Learnable and Scalable Influence Estimation of Instruction Fine-Tuning Data
arXiv:2502.09969v4 Announce Type: replace-cross Abstract: Influence functions provide crucial insights into model training, but existing methods suffer from high computational costs and limited generalization. In particular, recent works have proposed various metrics and algorithms to calculate the influence of data using language models, which do not scale well with large models and datasets. This is because […]
Clone Deterministic 3D Worlds with Geometrically-Regularized World Models
arXiv:2510.26782v1 Announce Type: cross Abstract: A world model is an internal model that simulates how the world evolves. Given past observations and actions, it predicts the future of both the embodied agent and its environment. Accurate world models are essential for enabling agents to think, plan, and reason effectively in complex, dynamic settings. Despite rapid […]