arXiv:2511.03328v1 Announce Type: cross
Abstract: A recent advancement in Multimodal Large Language Models (MLLMs) research is the emergence of “reasoning MLLMs” that offer explicit control over their internal thinking processes (normally referred as the “thinking mode”) alongside the standard “non-thinking mode”. This capability allows these models to engage in a step-by-step process of internal deliberation before generating a final response. With the rapid transition to and adoption of these “dual-state” MLLMs, this work rigorously evaluated how the enhanced reasoning processes of these MLLMs impact model performance and reliability in clinical tasks. This paper evaluates the active “thinking mode” capabilities of two leading MLLMs, Seed1.5-VL and Gemini-2.5-Flash, for medical applications. We assessed their performance on four visual medical tasks using VQA-RAD and ROCOv2 datasets. Our findings reveal that the improvement from activating the thinking mode remains marginal compared to the standard non-thinking mode for the majority of the tasks. Their performance on complex medical tasks such as open-ended VQA and medical image interpretation remains suboptimal, highlighting the need for domain-specific medical data and more advanced methods for medical knowledge integration.
Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding
arXiv:2511.03549v1 Announce Type: cross Abstract: Understanding the purpose of source code is a critical task in software maintenance, onboarding, and modernization. While large language models



