Evaluating large language models for automated TNM staging from PET-CT reports: a multi-cancer comparative study

Purpose: To evaluate three large language models (LLMs), including ChatGPT 5, ChatGPT 4o, and ChatGPT 3.5, in automating TNM staging from PET-CT reports across six cancer types, and to assess their clinical utility compared with junior radiologists. Materials and methods: PET-CT reports from 552 treatment-naive patients in two institutions with confirmed primary malignancies (lung, breast, liver, pancreatic, renal, […]

Short-form video platforms as a source of ankylosing spondylitis information: a cross-sectional content analysis

Background: Short-video platforms have become major channels for health information dissemination, yet the quality and reliability of content on ankylosing spondylitis (AS) remain underexplored. Objective: This study aimed to systematically evaluate the quality, reliability, and characteristics of AS-related short videos on three major Chinese platforms: TikTok, Bilibili, and rednote. Methods: A cross-sectional content analysis was conducted on 300 videos (100 […]

A Lightweight Multi-Cancer Tumor Localization Framework for Deployable Digital Pathology

arXiv:2603.08844v1. Abstract: Accurate localization of tumor regions from hematoxylin and eosin-stained whole-slide images is fundamental for translational research including spatial analysis, molecular profiling, and tissue architecture investigation. However, deep learning-based tumor detection trained within specific cancers may exhibit reduced robustness when applied across different tumor types. We investigated whether balanced training across […]

Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction

arXiv:2603.08713v1. Abstract: Large Language Models (LLMs) have intensified the need for low-precision formats that enable efficient, large-scale inference. The Open Compute Project (OCP) Microscaling (MX) standard is attractive due to its favorable hardware efficiency, but its 4-bit variant (MXFP4) lags behind NVIDIA’s NVFP4 in accuracy, limiting adoption. We introduce two software-only techniques, […]

Automating Forecasting Question Generation and Resolution for AI Evaluation

arXiv:2601.22444v2. Abstract: Forecasting future events is highly valuable in decision-making and is a robust measure of general intelligence. As forecasting is probabilistic, developing and evaluating AI forecasters requires generating large numbers of diverse and difficult questions, and accurately resolving them. Previous efforts to automate this laborious work relied on recurring data sources […]

Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities

arXiv:2505.17862v2. Abstract: Recent Multimodal Large Language Models (MLLMs) achieve promising performance on visual and audio benchmarks independently. However, the ability of these models to process cross-modal information synchronously remains largely unexplored. We introduce Daily-Omni, a multiple-choice Audio-Visual QA benchmark featuring 684 real-world videos and 1,197 questions spanning 6 task families that explicitly […]

GNNs for Time Series Anomaly Detection: An Open-Source Framework and a Critical Evaluation

arXiv:2603.09675v1. Abstract: There is growing interest in applying graph-based methods to Time Series Anomaly Detection (TSAD), particularly Graph Neural Networks (GNNs), as they naturally model dependencies among multivariate signals. GNNs are typically used as backbones in score-based TSAD pipelines, where anomalies are identified through reconstruction or prediction errors followed by thresholding. However, […]

SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases

arXiv:2603.09853v1. Abstract: Advances in large language models (LLMs) have enabled significant capabilities in audio processing, resulting in state-of-the-art models now known as Large Audio Language Models (LALMs). However, minimal work has been done to measure audio understanding beyond automatic speech recognition (ASR). This paper closes that gap by proposing a benchmark suite, […]

Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions

arXiv:2602.09987v3. Abstract: Influence functions are commonly used to attribute model behavior to training documents. We explore the reverse: crafting training data that induces model behavior. Our framework, Infusion, uses scalable influence-function approximations to compute small perturbations to training documents that induce targeted changes in model behavior through parameter shifts. We evaluate Infusion […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844.