Health System Scale Semantic Search Across Unstructured Clinical Notes

arXiv:2604.25605v1 Announce Type: cross Abstract: Introduction: Semantic search, which retrieves documents based on conceptual similarity rather than keyword matching, offers substantial advantages for retrieval of clinical information. However, deploying semantic search across entire health systems, comprising hundreds of millions of clinical notes, presents formidable engineering, cost, and governance challenges that have prevented adoption. Methods: We […]

An Investigation of Linguistic Biases in LLM-Based Recommendations

arXiv:2604.25456v1 Announce Type: cross Abstract: We investigate linguistic biases in LLM-based restaurant and product recommendations given prompts varying across Southern American English (AE), Indian English (IE), and Code-Switched Hindi-English dialects, using the Yelp Open dataset (Yelp Inc., 2023) and Walmart product reviews dataset (PromptCloud, 2020). We add lists of restaurant and product names balanced by cuisine […]

Can We Change the Stroke Size for Easier Diffusion?

arXiv:2603.26783v2 Announce Type: replace-cross Abstract: Diffusion models can be challenged in the low signal-to-noise regime, where they have to make pixel-level predictions despite the presence of high noise. The geometric intuition is akin to using the finest stroke for oil painting throughout, which may be ineffective. We therefore study stroke-size control as a controlled intervention […]

Toward a Functional Geometric Algebra for Natural Language Semantics

arXiv:2604.25902v1 Announce Type: cross Abstract: Distributional and neural approaches to natural language semantics have been built almost exclusively on conventional linear algebra: vectors, matrices, tensors, and the operations that accompany them. These methods have achieved remarkable empirical success, yet they face persistent structural limitations in compositional semantics, type sensitivity, and interpretability. I argue in this […]

Sustained Gradient Alignment Mediates Subliminal Learning in a Multi-Step Setting: Evidence from MNIST Auxiliary Logit Distillation Experiment

arXiv:2604.25779v1 Announce Type: cross Abstract: In the MNIST auxiliary logit distillation experiment, a student can acquire an unintended teacher trait despite distilling only on no-class logits through a phenomenon called subliminal learning. Under a single-step gradient descent assumption, subliminal learning theory attributes this effect to alignment between the trait and distillation gradients, but does not […]

FARM: Enhancing Molecular Representations with Functional Group Awareness

arXiv:2410.02082v4 Announce Type: replace-cross Abstract: We introduce Functional Group-Aware Representations for Small Molecules (FARM), a novel foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs. The key idea behind FARM is the incorporation of functional group (FG) annotations at the atomic level, enabling both FG-enhanced SMILES and FG graphs. In […]

Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation

arXiv:2604.25318v1 Announce Type: cross Abstract: Cutscenes are carefully choreographed cinematic sequences embedded in video games and interactive media, serving as the primary vehicle for narrative delivery, character development, and emotional engagement. Producing cutscenes is inherently complex: it demands seamless coordination across screenwriting, cinematography, character animation, voice acting, and technical direction, often requiring days to weeks […]

Language corpora for the Dutch medical domain

arXiv:2604.25374v1 Announce Type: cross Abstract: Background: Dutch medical corpora are scarce, limiting NLP development. Methods: We translated English datasets, identified medical text in generic corpora, and extracted open Dutch medical resources. Results: The resulting corpus comprises ± 35 billion tokens across the medical domain in about 100 million documents, freely available on Hugging […]

Medoid Prototype Alignment for Cross-Plant Unknown Attack Detection in Industrial Control Systems

arXiv:2604.25544v1 Announce Type: cross Abstract: Deploying an intrusion detector trained in one industrial plant to another remains difficult because Industrial Control System (ICS) traffic is highly site-dependent, labels are scarce, and unseen attacks often appear after deployment. To address this challenge, this paper introduces a medoid prototype alignment framework for cross-plant unknown attack detection. Instead […]

Compute Aligned Training: Optimizing for Test Time Inference

arXiv:2604.24957v1 Announce Type: cross Abstract: Scaling test-time compute has emerged as a powerful mechanism for enhancing Large Language Model (LLM) performance. However, standard post-training paradigms, Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), optimize the likelihood of individual samples under a base policy, creating a misalignment with test time procedures that rely on aggregated or filtered […]

Frontier Coding Agents Can Now Implement an AlphaZero Self-Play Machine Learning Pipeline For Connect Four That Performs Comparably to an External Solver

arXiv:2604.25067v1 Announce Type: cross Abstract: Forecasting when AI systems will become capable of meaningfully accelerating AI research is a central challenge for AI safety. Existing benchmarks measure broad capability growth, but may not provide ample early warning signals for recursive self-improvement. We propose measuring AI’s capability to autonomously implement end-to-end machine learning pipelines from past […]

UnIte: Uncertainty-based Iterative Document Sampling for Domain Adaptation in Information Retrieval

arXiv:2604.25142v1 Announce Type: cross Abstract: Unsupervised domain adaptation generalizes neural retrievers to an unseen domain by generating pseudo queries on target domain documents. The quality and efficiency of this adaptation critically depend on which documents are selected for pseudo query generation. The existing document sampling method focuses on diversity but fails to capture model uncertainty. […]

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844