GDS Agent for Graph Algorithmic Reasoning

arXiv:2508.20637v2 Announce Type: replace-cross Abstract: Large language models (LLMs) have shown remarkable multimodal information processing and reasoning ability. When equipped with tools through function calling and enhanced with retrieval-augmented techniques, compound LLM-based systems can access closed data sources and answer questions about them. However, they still struggle to process and reason over large-scale graph-structure data. […]

NABench: Large-Scale Benchmarks of Nucleotide Foundation Models for Fitness Prediction

arXiv:2511.02888v1 Announce Type: new Abstract: Nucleotide sequence variation can induce significant shifts in functional fitness. Recent nucleotide foundation models promise to predict such fitness effects directly from sequence, yet heterogeneous datasets and inconsistent preprocessing make it difficult to compare methods fairly across DNA and RNA families. Here we introduce NABench, a large-scale, systematic benchmark for […]

SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators

arXiv:2511.03092v1 Announce Type: new Abstract: The proliferation of 100B+ parameter Large Language Models (LLMs) with 100k+ context length support has resulted in increasing demands for on-chip memory to support large KV caches. Techniques such as StreamingLLM and SnapKV demonstrate how to control KV cache size while maintaining model accuracy. Yet, these techniques are not commonly […]

CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic

arXiv:2511.03102v1 Announce Type: cross Abstract: Mental health disorders affect millions worldwide, yet early detection remains a major challenge, particularly for Arabic-speaking populations where resources are limited and mental health discourse is often discouraged due to cultural stigma. While substantial research has focused on English-language mental health detection, Arabic remains significantly underexplored, partly due to the […]

No-Human in the Loop: Agentic Evaluation at Scale for Recommendation

arXiv:2511.03051v1 Announce Type: new Abstract: Evaluating large language models (LLMs) as judges is increasingly critical for building scalable and trustworthy evaluation pipelines. We present ScalingEval, a large-scale benchmarking study that systematically compares 36 LLMs, including GPT, Gemini, Claude, and Llama, across multiple product categories using a consensus-driven evaluation protocol. Our multi-agent framework aggregates pattern audits […]

Deploying Rapid Damage Assessments from sUAS Imagery for Disaster Response

arXiv:2511.03132v1 Announce Type: cross Abstract: This paper presents the first AI/ML system for automating building damage assessment in small uncrewed aerial systems (sUAS) imagery to be deployed operationally during federally declared disasters (Hurricanes Debby and Helene). In response to major disasters, sUAS teams are dispatched to collect imagery of the affected areas to assess damage; however, […]

Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge

arXiv:2511.03070v1 Announce Type: new Abstract: Artificial intelligence (AI) systems hold great promise for advancing various scientific disciplines, and are increasingly used in real-world applications. Despite their remarkable progress, further capabilities are expected in order to achieve more general types of intelligence. A critical distinction in this context is between factual knowledge, which can be evaluated […]

A Quantized VAE-MLP Botnet Detection Model: A Systematic Evaluation of Quantization-Aware Training and Post-Training Quantization Strategies

arXiv:2511.03201v1 Announce Type: cross Abstract: In an effort to counter the increasing IoT botnet-based attacks, state-of-the-art deep learning methods have been proposed and have achieved impressive detection accuracy. However, their computational intensity restricts deployment on resource-constrained IoT devices, creating a critical need for lightweight detection models. A common solution to this challenge is model compression […]

A Roadmap for Predictive Human Immunology

arXiv:2511.03041v1 Announce Type: new Abstract: For over a century, immunology has masterfully discovered and dissected the components of our immune system, yet its collective behavior remains fundamentally unpredictable. In this perspective, we argue that building on the learnings of reductionist biology and systems immunology, the field is poised for a third revolution. This new era […]

Comparing the Performance of LLMs in RAG-based Question-Answering: A Case Study in Computer Science Literature

arXiv:2511.03261v1 Announce Type: cross Abstract: Retrieval Augmented Generation (RAG) is emerging as a powerful technique to enhance the capabilities of Generative AI models by reducing hallucination. Thus, the increasing prominence of RAG alongside Large Language Models (LLMs) has sparked interest in comparing the performance of different LLMs in question-answering (QA) in diverse domains. This study […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.