arXiv:2511.03102v1 Announce Type: cross Abstract: Mental health disorders affect millions worldwide, yet early detection remains a major challenge, particularly for Arabic-speaking populations where resources are limited and mental health discourse is often discouraged due to cultural stigma. While substantial research has focused on English-language mental health detection, Arabic remains significantly underexplored, partly due to the […]
No-Human in the Loop: Agentic Evaluation at Scale for Recommendation
arXiv:2511.03051v1 Announce Type: new Abstract: Evaluating large language models (LLMs) as judges is increasingly critical for building scalable and trustworthy evaluation pipelines. We present ScalingEval, a large-scale benchmarking study that systematically compares 36 LLMs, including GPT, Gemini, Claude, and Llama, across multiple product categories using a consensus-driven evaluation protocol. Our multi-agent framework aggregates pattern audits […]
Deploying Rapid Damage Assessments from sUAS Imagery for Disaster Response
arXiv:2511.03132v1 Announce Type: cross Abstract: This paper presents the first AI/ML system for automating building damage assessment in small uncrewed aerial systems (sUAS) imagery to be deployed operationally during federally declared disasters (Hurricanes Debby and Helene). In response to major disasters, sUAS teams are dispatched to collect imagery of the affected areas to assess damage; however, […]
Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge
arXiv:2511.03070v1 Announce Type: new Abstract: Artificial intelligence (AI) systems hold great promise for advancing various scientific disciplines, and are increasingly used in real-world applications. Despite their remarkable progress, further capabilities are expected in order to achieve more general types of intelligence. A critical distinction in this context is between factual knowledge, which can be evaluated […]
A Quantized VAE-MLP Botnet Detection Model: A Systematic Evaluation of Quantization-Aware Training and Post-Training Quantization Strategies
arXiv:2511.03201v1 Announce Type: cross Abstract: In an effort to counter the increasing IoT botnet-based attacks, state-of-the-art deep learning methods have been proposed and have achieved impressive detection accuracy. However, their computational intensity restricts deployment on resource-constrained IoT devices, creating a critical need for lightweight detection models. A common solution to this challenge is model compression […]
A Roadmap for Predictive Human Immunology
arXiv:2511.03041v1 Announce Type: new Abstract: For over a century, immunology has masterfully discovered and dissected the components of our immune system, yet its collective behavior remains fundamentally unpredictable. In this perspective, we argue that building on the learnings of reductionist biology and systems immunology, the field is poised for a third revolution. This new era […]
Comparing the Performance of LLMs in RAG-based Question-Answering: A Case Study in Computer Science Literature
arXiv:2511.03261v1 Announce Type: cross Abstract: Retrieval Augmented Generation (RAG) is emerging as a powerful technique to enhance the capabilities of Generative AI models by reducing hallucination. Thus, the increasing prominence of RAG alongside Large Language Models (LLMs) has sparked interest in comparing the performance of different LLMs in question-answering (QA) in diverse domains. This study […]
PublicAgent: Multi-Agent Design Principles From an LLM-Based Open Data Analysis Framework
arXiv:2511.03023v1 Announce Type: new Abstract: Open data repositories hold potential for evidence-based decision-making, yet are inaccessible to non-experts lacking expertise in dataset discovery, schema mapping, and statistical analysis. Large language models show promise for individual tasks, but end-to-end analytical workflows expose fundamental limitations: attention dilutes across growing contexts, specialized reasoning patterns interfere, and errors propagate […]
Morpho-Genomic Deep Learning for Ovarian Cancer Subtype and Gene Mutation Prediction from Histopathology
arXiv:2511.03365v1 Announce Type: cross Abstract: Ovarian cancer remains one of the most lethal gynecological malignancies, largely due to late diagnosis and extensive heterogeneity across subtypes. Current diagnostic methods are limited in their ability to reveal underlying genomic variations essential for precision oncology. This study introduces a novel hybrid deep learning pipeline that integrates quantitative nuclear […]
Evaluating Control Protocols for Untrusted AI Agents
arXiv:2511.02997v1 Announce Type: new Abstract: As AI systems become more capable and widely deployed as agents, ensuring their safe operation becomes critical. AI control offers one approach to mitigating the risk from untrusted AI agents by monitoring their actions and intervening or auditing when necessary. Evaluating the safety of these protocols requires understanding both their […]