Purpose: To evaluate three large language models (LLMs), including ChatGPT 5, ChatGPT 4o, and ChatGPT 3.5, in automating TNM staging from PET-CT reports across six cancer types, and to assess their clinical utility compared with junior radiologists. Materials and methods: PET-CT reports from 552 treatment-naive patients in two institutions with confirmed primary malignancies (lung, breast, liver, pancreatic, renal, […]
Short-form video platforms as a source of ankylosing spondylitis information: a cross-sectional content analysis
Background: Short-video platforms have become major channels for health information dissemination, yet the quality and reliability of content on ankylosing spondylitis (AS) remain underexplored. Objective: This study aimed to systematically evaluate the quality, reliability, and characteristics of AS-related short videos on three major Chinese platforms: TikTok, Bilibili, and rednote. Methods: A cross-sectional content analysis was conducted on 300 videos (100 […]
Skyhawk taps Teva alum to steer commercial path, while Santhera names new CCO to grow DMD sales
Skyhawk Therapeutics has named Aaron Deves as chief commercial officer, securing the expertise of an executive who oversaw products including Austedo while at Teva. The company shared the news the same day that Santhera Pharmaceuticals revealed it has hired a new chief commercial officer, too.
Middle East Conflict Highlights Cloud Resilience Gaps
Data centers — used by both governments and militaries for operations — are now fair game, not just for cyberattacks, but for kinetic attacks as well.
EDMFormer: Genre-Specific Self-Supervised Learning for Music Structure Segmentation
arXiv:2603.08759v1 Announce Type: cross Abstract: Music structure segmentation is a key task in audio analysis, but existing models perform poorly on Electronic Dance Music (EDM). This problem exists because most approaches rely on lyrical or harmonic similarity, which works well for pop music but not for EDM. EDM structure is instead defined by changes in […]
A Lightweight Multi-Cancer Tumor Localization Framework for Deployable Digital Pathology
arXiv:2603.08844v1 Announce Type: cross Abstract: Accurate localization of tumor regions from hematoxylin and eosin-stained whole-slide images is fundamental for translational research including spatial analysis, molecular profiling, and tissue architecture investigation. However, deep learning-based tumor detection trained within specific cancers may exhibit reduced robustness when applied across different tumor types. We investigated whether balanced training across […]
Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction
arXiv:2603.08713v1 Announce Type: cross Abstract: Large Language Models (LLMs) have intensified the need for low-precision formats that enable efficient, large-scale inference. The Open Compute Project (OCP) Microscaling (MX) standard is attractive due to its favorable hardware efficiency, but its 4-bit variant (MXFP4) lags behind NVIDIA’s NVFP4 in accuracy, limiting adoption. We introduce two software-only techniques, […]
Automating Forecasting Question Generation and Resolution for AI Evaluation
arXiv:2601.22444v2 Announce Type: replace-cross Abstract: Forecasting future events is highly valuable in decision-making and is a robust measure of general intelligence. As forecasting is probabilistic, developing and evaluating AI forecasters requires generating large numbers of diverse and difficult questions, and accurately resolving them. Previous efforts to automate this laborious work relied on recurring data sources […]
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
arXiv:2505.17862v2 Announce Type: replace Abstract: Recent Multimodal Large Language Models (MLLMs) achieve promising performance on visual and audio benchmarks independently. However, the ability of these models to process cross-modal information synchronously remains largely unexplored. We introduce Daily-Omni, a multiple-choice Audio-Visual QA benchmark featuring 684 real-world videos and 1,197 questions spanning 6 task families that explicitly […]
GNNs for Time Series Anomaly Detection: An Open-Source Framework and a Critical Evaluation
arXiv:2603.09675v1 Announce Type: cross Abstract: There is growing interest in applying graph-based methods to Time Series Anomaly Detection (TSAD), particularly Graph Neural Networks (GNNs), as they naturally model dependencies among multivariate signals. GNNs are typically used as backbones in score-based TSAD pipelines, where anomalies are identified through reconstruction or prediction errors followed by thresholding. However, […]
SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases
arXiv:2603.09853v1 Announce Type: cross Abstract: Advances in large language models (LLMs) have enabled significant capabilities in audio processing, resulting in state-of-the-art models now known as Large Audio Language Models (LALMs). However, minimal work has been done to measure audio understanding beyond automatic speech recognition (ASR). This paper closes that gap by proposing a benchmark suite, […]
Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions
arXiv:2602.09987v3 Announce Type: replace-cross Abstract: Influence functions are commonly used to attribute model behavior to training documents. We explore the reverse: crafting training data that induces model behavior. Our framework, Infusion, uses scalable influence-function approximations to compute small perturbations to training documents that induce targeted changes in model behavior through parameter shifts. We evaluate Infusion […]