arXiv:2605.02080v1 Announce Type: cross Abstract: Drawing on crip theory, this paper proposes cripping AI as a guiding framework to center lived disability experiences in AI research and development. Moving beyond calls to make AI “accessible” to people with disabilities, cripping AI seeks to: (1) reveal and dismantle ableist assumptions embedded in how AI is imagined, […]
MultiSense-Pneumo: A Multimodal Learning Framework for Pneumonia Screening in Resource-Constrained Settings
arXiv:2605.02207v1 Announce Type: cross Abstract: Pneumonia remains a leading global cause of morbidity and mortality, particularly in low-resource settings where access to imaging, laboratory testing, and specialist care is limited. Clinical assessment relies on heterogeneous evidence, including symptoms, respiratory patterns, and chest imaging, making screening inherently multimodal. However, many existing computational approaches remain unimodal […]
Is It Novel and Why? Fine-Grained Patent Novelty Prediction Based on Passage Retrieval
arXiv:2605.02392v1 Announce Type: cross Abstract: Novelty assessment is a critical yet complex task in the examination process for patent acceptance, requiring examiners to determine whether an invention is disclosed in a prior art document. The process involves intricate matching between specific features of a patent claim and passages in the prior art. While prior work […]
Orchestrating Spatial Semantics via a Zone-Graph Paradigm for Intricate Indoor Scene Generation
arXiv:2605.02537v1 Announce Type: cross Abstract: Autonomous 3D indoor scene synthesis breaks down in non-convex rooms with tightly coupled spatial constraints. Data-driven generators lack topological priors for long-horizon planning, while iterative agents fragment semantics and become geometrically brittle. We present ZoneMaestro, a unified framework that shifts the paradigm from object-centric synthesis to Zone-Graph Orchestration. By internalizing […]
mdok-style at SemEval-2026 Task 10: Finetuning LLMs for Conspiracy Detection
arXiv:2605.02712v1 Announce Type: cross Abstract: SemEval-2026 Task 10 is focused on conspiracy detection. Specifically, the goal is to detect whether a Reddit comment expresses a conspiracy belief. Our submitted mdok-style system utilizes data augmentation and self-training (to cope with a rather small amount of training data) to finetune the Qwen3-32B model for a binary text-classification […]
Temporal and probabilistic comparisons of epidemic interventions
arXiv:2302.03210v3 Announce Type: replace Abstract: Forecasting disease spread is a critical tool to help public health officials design and plan public health interventions. However, the expected future state of an epidemic is not necessarily well defined, as disease spread is inherently stochastic, contact patterns within a population are heterogeneous, and behaviors change. In this work, […]
GOAT: A Training Framework for Goal-Oriented Agent with Tools
arXiv:2510.12218v2 Announce Type: replace Abstract: Current approaches rely on zero-shot evaluation due to the absence of training data; while proprietary models such as GPT-4 exhibit strong reasoning capabilities, smaller open-source models remain ineffective at complex tool use. To address this limitation, we propose a novel training framework, GOAT, that enables fine-tuning LLM agents without human […]
AgentXRay: White-Boxing Agentic Systems via Workflow Reconstruction
arXiv:2602.05353v3 Announce Type: replace Abstract: Large Language Models have shown strong capabilities in complex problem solving, yet many agentic systems remain difficult to interpret and control due to opaque internal workflows. While some frameworks offer explicit architectures for collaboration, many deployed agentic systems operate as black boxes to users. We address this by introducing Agentic […]
AI-Gram: When Visual Agents Interact in a Social Network
arXiv:2604.21446v2 Announce Type: replace Abstract: We present AI-Gram, a fully deployed, continuously operating social platform where every participant is an autonomous LLM-driven agent generating and responding to visual content. Unlike prior multi-agent simulations, AI-Gram operates as a live, AI-native social network with genuine visual perception: agents observe each other’s images, generate new images in response, […]
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident, Especially When They are Wrong
arXiv:2501.09775v3 Announce Type: replace-cross Abstract: Multiple Choice Question (MCQ) tests are among the most used methods for evaluating large language models (LLMs). Besides checking the correctness of the selected answer, evaluations often consider the model’s confidence through the probability assigned to its response. In this work, we investigate how LLM confidence is influenced by the […]
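The abstract above mentions measuring a model's confidence "through the probability assigned to its response." One common way to do this in MCQ evaluation is to take the logits the model produces for each option letter and normalize them with a softmax restricted to those options. The sketch below illustrates that general idea only. It assumes the per-option logits are already available, and the function name `option_confidence` is hypothetical. It is not presented as the paper's own procedure.

```python
import math


def option_confidence(logits: dict[str, float]) -> tuple[str, float]:
    """Hypothetical helper: softmax over MCQ option logits only.

    Assumes `logits` maps each option letter (e.g. "A".."D") to the raw
    logit the model assigned to that letter's token. Returns the selected
    (argmax) option and its normalized probability, used as "confidence".
    """
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits.values())
    exp = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exp.values())
    probs = {k: e / total for k, e in exp.items()}
    # The chosen answer is the most probable option; its probability is the confidence.
    best = max(probs, key=probs.get)
    return best, probs[best]


best, conf = option_confidence({"A": 2.0, "B": 0.5, "C": 0.1, "D": -1.0})
print(best, round(conf, 3))
```

With these example logits, option "A" is selected with a probability of roughly 0.70. Restricting the softmax to the option tokens, rather than the full vocabulary, makes the confidences over A to D sum to 1. This is a design choice, and evaluations may differ on it.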
SurGE: A Benchmark and Evaluation Framework for Scientific Survey Generation
arXiv:2508.15658v5 Announce Type: replace-cross Abstract: The rapid growth of academic literature makes the manual creation of scientific surveys increasingly infeasible. While large language models show promise for automating this process, progress in this area is hindered by the absence of standardized benchmarks and evaluation protocols. To bridge this critical gap, we introduce SurGE (Survey Generation […]
BaldWhisper: Faster Whisper with Head Shearing and Layer Merging
arXiv:2510.08599v2 Announce Type: replace-cross Abstract: Pruning large pre-trained transformers in a data-scarce scenario is challenging, as it often requires massive retraining data to recover performance. For instance, Distill-Whisper prunes Whisper by 40% and retrains on 21,000 hours of speech, far beyond what is available for most languages. Can Whisper be made lighter and faster for […]