arXiv:2512.16244v2 Announce Type: replace-cross Abstract: Developing open-set classification methods capable of classifying in-distribution (ID) data while detecting out-of-distribution (OOD) samples is essential for deploying graph neural networks (GNNs) in open-world scenarios. Existing methods typically treat all OOD samples as a single class, despite real-world applications, especially high-stakes settings such as fraud detection and medical diagnosis, […]
PENDULUM: A Benchmark for Assessing Sycophancy in Multimodal Large Language Models
arXiv:2512.19350v1 Announce Type: new Abstract: Sycophancy, an excessive tendency of AI models to agree with user input at the expense of factual accuracy or in contradiction of visual evidence, poses a critical and underexplored challenge for multimodal large language models (MLLMs). While prior studies have examined this behavior in text-only settings of large language models, […]
On the Koopman-Based Generalization Bounds for Multi-Task Deep Learning
arXiv:2512.19199v1 Announce Type: cross Abstract: The paper establishes generalization bounds for multitask deep neural networks using operator-theoretic techniques. The authors propose a tighter bound than those derived from conventional norm-based methods by leveraging small condition numbers in the weight matrices and introducing a tailored Sobolev space as an expanded hypothesis space. This enhanced bound […]
Machine Learning of Temperature-dependent Chemical Kinetics Using Parallel Droplet Microreactors
arXiv:2512.19416v1 Announce Type: new Abstract: Temperature is a fundamental regulator of chemical and biochemical kinetics, yet capturing nonlinear thermal effects directly from experimental data remains a major challenge due to limited throughput and model flexibility. Recent advances in machine learning have enabled flexible modeling beyond conventional physical laws, but most existing strategies remain confined to […]
An Exploration of Default Images in Text-to-Image Generation
arXiv:2505.09166v5 Announce Type: replace-cross Abstract: In the creative practice of text-to-image (TTI) generation, images are synthesized from textual prompts. By design, TTI models always yield an output, even if the prompt contains unknown terms. In this case, the model may generate default images: images that closely resemble each other across many unrelated prompts. Studying default […]
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models
arXiv:2512.19526v1 Announce Type: new Abstract: Understanding the physical world is essential for generalist AI agents. However, it remains unclear whether state-of-the-art vision perception models (e.g., large VLMs) can reason about physical properties quantitatively. Existing evaluations are predominantly VQA-based and qualitative, offering limited insight into whether these models can infer the kinematic quantities of moving objects from […]
Far from the Shallow: Brain-Predictive Reasoning Embedding through Residual Disentanglement
arXiv:2510.22860v2 Announce Type: replace-cross Abstract: Understanding how the human brain progresses from processing simple linguistic inputs to performing high-level reasoning is a fundamental challenge in neuroscience. While modern large language models (LLMs) are increasingly used to model neural responses to language, their internal representations are highly “entangled,” mixing information about lexicon, syntax, meaning, and reasoning. […]
Automated Pollen Recognition in Optical and Holographic Microscopy Images
arXiv:2512.08589v1 Announce Type: cross Abstract: This study explores the application of deep learning to improve and automate pollen grain detection and classification in both optical and holographic microscopy images, with a particular focus on veterinary cytology use cases. We used YOLOv8s for object detection and MobileNetV3L for the classification task, evaluating their performance across imaging […]
AI reasoning effort predicts human decision time in content moderation
arXiv:2508.20262v2 Announce Type: replace Abstract: Large language models can now generate intermediate reasoning steps before producing answers, improving performance on difficult problems by interactively developing solutions. This study uses a content moderation task to examine parallels between human decision times and model reasoning effort, measured using the length of the chain-of-thought (CoT). Across three frontier […]
Byzantine Fault-Tolerant Multi-Agent System for Healthcare: A Gossip Protocol Approach to Secure Medical Message Propagation
arXiv:2512.17913v1 Announce Type: cross Abstract: Recent advances in generative AI have enabled sophisticated multi-agent architectures for healthcare, where large language models power collaborative clinical decision-making. However, these distributed systems face critical challenges in ensuring message integrity and fault tolerance when operating in adversarial or untrusted environments. This paper presents a novel Byzantine fault-tolerant multi-agent system specifically […]