arXiv:2604.03057v1 Announce Type: cross Abstract: This paper presents an open-source methodology for allowing users to query structured, non-textual datasets through natural language. Unlike Retrieval-Augmented Generation (RAG), which struggles with numerical and highly structured information, our approach trains an LLM to generate executable queries. To support this capability, we introduce a principled pipeline […]
A Data-Centric Vision Transformer Baseline for SAR Sea Ice Classification
arXiv:2604.03094v1 Announce Type: cross Abstract: Accurate and automated sea ice classification is important for climate monitoring and maritime safety in the Arctic. While Synthetic Aperture Radar (SAR) is the operational standard because of its all-weather capability, it remains challenging to distinguish morphologically similar ice classes under severe class imbalance. Rather than claiming a fully validated […]
An Independent Safety Evaluation of Kimi K2.5
arXiv:2604.03121v1 Announce Type: cross Abstract: Kimi K2.5 is an open-weight LLM that rivals closed models across coding, multimodal, and agentic benchmarks, but was released without an accompanying safety evaluation. In this work, we conduct a preliminary safety assessment of Kimi K2.5 focusing on risks likely to be exacerbated by powerful open-weight models. Specifically, we evaluate […]
InCoder-32B-Thinking: Industrial Code World Model for Thinking
arXiv:2604.03144v1 Announce Type: cross Abstract: Industrial software development across chip design, GPU optimization, and embedded systems lacks expert reasoning traces showing how engineers reason about hardware constraints and timing semantics. In this work, we propose InCoder-32B-Thinking, trained on the data from the Error-driven Chain-of-Thought (ECoT) synthesis framework with an industrial code world model (ICWM) to […]
Reflective Context Learning: Studying the Optimization Primitives of Context Space
arXiv:2604.03189v1 Announce Type: cross Abstract: Generally capable agents must learn from experience in ways that generalize across tasks and environments. The fundamental problems of learning, including credit assignment, overfitting, forgetting, local optima, and high-variance learning signals, persist whether the learned object lies in parameter space or context space. While these challenges are well understood in […]
Human Psychometric Questionnaires Mischaracterize LLM Psychology: Evidence from Generation Behavior
arXiv:2509.10078v3 Announce Type: replace-cross Abstract: Psychological profiling of large language models (LLMs) using psychometric questionnaires designed for humans has become widespread. However, it remains unclear whether the resulting profiles mirror the models’ psychological characteristics expressed during their real-world interactions with users. To examine the risk of human questionnaires mischaracterizing LLM psychology, we compare two types […]
Integrated representational signatures strengthen specificity in brains and models
arXiv:2510.20847v2 Announce Type: replace Abstract: The extent to which different neural or artificial neural networks (models) rely on equivalent representations to support similar tasks remains a central question in neuroscience and machine learning. Prior work has typically compared systems using a single representational similarity metric, yet each captures only one facet of representational structure. To […]
ClinicalReTrial: Clinical Trial Redesign with Self-Evolving Agents
arXiv:2601.00290v2 Announce Type: replace Abstract: Clinical trials constitute a critical yet exceptionally challenging and costly stage of drug development ($2.6B per drug), where protocols are encoded as complex natural language documents, motivating the use of AI systems beyond manual analysis. Existing AI methods accurately predict trial failure, but do not provide actionable remedies. To fill […]
From Virtual Environments to Real-World Trials: Emerging Trends in Autonomous Driving
arXiv:2603.17714v2 Announce Type: replace Abstract: Autonomous driving technologies have achieved significant advances in recent years, yet their real-world deployment remains constrained by data scarcity, safety requirements, and the need for generalization across diverse environments. In response, synthetic data and virtual environments have emerged as powerful enablers, offering scalable, controllable, and richly annotated scenarios for training […]
Experience as a Compass: Multi-agent RAG with Evolving Orchestration and Agent Prompts
arXiv:2604.00901v2 Announce Type: replace Abstract: Multi-agent Retrieval-Augmented Generation (RAG), wherein each agent takes on a specific role, supports hard queries that require multiple steps and sources, or complex reasoning. Existing approaches, however, rely on static agent behaviors and fixed orchestration strategies, leading to brittle performance on diverse, multi-hop tasks. We identify two key limitations: the […]
Solving the Two-dimensional single stock size Cutting Stock Problem with SAT and MaxSAT
arXiv:2604.01732v2 Announce Type: replace Abstract: Cutting rectangular items from stock sheets to satisfy demands while minimizing waste is a central manufacturing task. The Two-Dimensional Single Stock Size Cutting Stock Problem (2D-CSSP) generalizes bin packing by requiring multiple copies of each item type, which causes a strong combinatorial blow-up. We present a SAT-based framework where item […]
ForgeryGPT: A Multimodal LLM for Interpretable Image Forgery Detection and Localization
arXiv:2410.10238v3 Announce Type: replace-cross Abstract: Multimodal Large Language Models (MLLMs), such as GPT4o, have shown strong capabilities in visual reasoning and explanation generation. However, despite these strengths, they face significant challenges in the increasingly critical task of Image Forgery Detection and Localization (IFDL). Moreover, existing IFDL methods are typically limited to the learning of low-level […]