arXiv:2512.15736v1 Announce Type: new Abstract: We present Anubuddhi, a multi-agent AI system that designs and simulates quantum optics experiments from natural language prompts without requiring specialized programming knowledge. The system composes optical layouts by arranging components from a three-tier toolbox via semantic retrieval, then validates designs through physics simulation with convergent refinement. The architecture combines […]
Spoken DialogSum: An Emotion-Rich Conversational Dataset for Spoken Dialogue Summarization
arXiv:2512.14687v2 Announce Type: replace-cross Abstract: Recent audio language models can follow long conversations. However, research on emotion-aware or spoken dialogue summarization is constrained by the lack of data that links speech, summaries, and paralinguistic cues. We introduce Spoken DialogSum, the first corpus aligning raw conversational audio with factual summaries, emotion-rich summaries, and utterance-level labels for […]
FinAudio: A Benchmark for Audio Large Language Models in Financial Applications
arXiv:2503.20990v3 Announce Type: replace-cross Abstract: Audio Large Language Models (AudioLLMs) have received widespread attention and have significantly improved performance on audio tasks such as conversation, audio understanding, and automatic speech recognition (ASR). Despite these advancements, no benchmark exists for assessing AudioLLMs in financial scenarios, where audio data, such as earnings conference […]
MindShift: Analyzing Language Models’ Reactions to Psychological Prompts
arXiv:2512.09149v2 Announce Type: replace-cross Abstract: Large language models (LLMs) hold the potential to absorb and reflect personality traits and attitudes specified by users. In our study, we investigated this potential using robust psychometric measures. We adapted the most studied test in the psychological literature, the Minnesota Multiphasic Personality Inventory (MMPI), and examined LLMs’ behavior to identify […]
Echoes of AI: Investigating the Downstream Effects of AI Assistants on Software Maintainability
arXiv:2507.00788v2 Announce Type: replace-cross Abstract: [Context] AI assistants, like GitHub Copilot and Cursor, are transforming software engineering. While several studies highlight productivity improvements, their impact on maintainability requires further investigation. [Objective] This study investigates whether co-development with AI assistants affects software maintainability, specifically how easily other developers can evolve the resulting source code. [Method] We […]
PixelArena: A benchmark for Pixel-Precision Visual Intelligence
arXiv:2512.16303v1 Announce Type: cross Abstract: Multimodal large language models with image output are emerging. Many image generation benchmarks focus on aesthetics instead of fine-grained generation capabilities. In PixelArena, we propose using semantic segmentation tasks to objectively examine their fine-grained generative intelligence with pixel precision. We find the latest Gemini 3 Pro Image has emergent […]
Robust Finetuning of Vision-Language-Action Robot Policies via Parameter Merging
arXiv:2512.08333v2 Announce Type: replace-cross Abstract: Generalist robot policies, trained on large and diverse datasets, have demonstrated the ability to generalize across a wide spectrum of behaviors, enabling a single policy to act in varied real-world environments. However, they still fall short on new tasks not covered in the training data. When finetuned on limited demonstrations […]
TOGGLE: Temporal Logic-Guided Large Language Model Compression for Edge
arXiv:2512.16855v1 Announce Type: new Abstract: Large Language Models (LLMs) deliver exceptional performance across natural language tasks but demand substantial computational resources, limiting their deployment on resource-constrained edge devices. Existing compression techniques, such as quantization and pruning, often degrade critical linguistic properties and lack formal guarantees for preserving model behavior. We propose Temporal Logic-Guided Large Language […]
UniVCD: A New Method for Unsupervised Change Detection in the Open-Vocabulary Era
arXiv:2512.13089v2 Announce Type: replace-cross Abstract: Change detection (CD) identifies scene changes from multi-temporal observations and is widely used in urban development and environmental monitoring. Most existing CD methods rely on supervised learning, making performance strongly dataset-dependent and incurring high annotation costs; they typically focus on a few predefined categories and generalize poorly to diverse scenes. […]
TextEditBench: Evaluating Reasoning-aware Text Editing Beyond Rendering
arXiv:2512.16270v1 Announce Type: cross Abstract: Text rendering has recently emerged as one of the most challenging frontiers in visual generation, drawing significant attention from large-scale diffusion and multimodal models. However, text editing within images remains largely unexplored, as it requires generating legible characters while preserving semantic, geometric, and contextual coherence. To fill this gap, we […]