arXiv:2603.28696v1 Announce Type: cross Abstract: Long video understanding remains challenging for Multi-modal Large Language Models (MLLMs) due to high memory costs and context-length limits. Prior approaches mitigate this by scoring and selecting frames/tokens within short clips, but they lack a principled mechanism to (i) compare relevance across distant video clips and (ii) stop processing once […]
AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems
arXiv:2602.11510v2 Announce Type: replace Abstract: Multi-agent Large Language Model (LLM) systems create privacy risks that current benchmarks cannot measure. When agents coordinate on tasks, sensitive data passes through inter-agent messages, shared memory, and tool arguments, all pathways that output-only audits never inspect. We introduce AgentLeak, to the best of our knowledge the first full-stack benchmark […]
Class-Imbalanced-Aware Adaptive Dataset Distillation for Scalable Pretrained Model on Credit Scoring
arXiv:2501.10677v3 Announce Type: replace-cross Abstract: The advent of artificial intelligence has significantly enhanced credit scoring technologies. Despite the remarkable efficacy of advanced deep learning models, mainstream adoption continues to favor tree-structured models due to their robust predictive performance on tabular data. Although pretrained models have seen considerable development, their application within the financial realm predominantly […]
Generating Findings for Jaw Cysts in Dental Panoramic Radiographs Using a GPT-Based VLM: A Preliminary Study on Building a Two-Stage Self-Correction Loop with Structured Output (SLSO) Framework
arXiv:2510.02001v5 Announce Type: replace-cross Abstract: Vision-language models (VLMs) such as GPT (Generative Pre-Trained Transformer) have shown potential for medical image interpretation; however, challenges remain in generating reliable radiological findings in clinical practice, as exemplified by dental pathologies. This study proposes a Self-correction Loop with Structured Output (SLSO) framework as an integrated processing methodology to enhance […]
Single-Round Scalable Analytic Federated Learning
arXiv:2512.03336v2 Announce Type: replace-cross Abstract: Federated Learning (FL) is plagued by two key challenges: high communication overhead and performance collapse on heterogeneous (non-IID) data. Analytic FL (AFL) provides a single-round, data distribution invariant solution, but is limited to linear models. Subsequent non-linear approaches, like DeepAFL, regain accuracy but sacrifice the single-round benefit. In this work, […]
Symphonym: Universal Phonetic Embeddings for Cross-Script Name Matching
arXiv:2601.06932v4 Announce Type: replace-cross Abstract: Matching place names across writing systems is a persistent obstacle to the integration of multilingual geographic sources, whether modern gazetteers, medieval itineraries, or colonial-era surveys. Existing approaches depend on language-specific phonetic algorithms or romanisation steps that discard phonetic information, and none generalises across script boundaries. This paper presents Symphonym, a […]
Declarative Scenario-based Testing with RoadLogic
arXiv:2603.09455v2 Announce Type: replace-cross Abstract: Scenario-based testing is a key method for cost-effective and safe validation of autonomous vehicles (AVs). Existing approaches rely on imperative scenario definitions, requiring developers to manually enumerate numerous variants to achieve coverage. Declarative languages, such as ASAM OpenSCENARIO DSL (OS2), raise the abstraction level but lack systematic methods for instantiating […]
Vega: Learning to Drive with Natural Language Instructions
arXiv:2603.25741v2 Announce Type: replace-cross Abstract: Vision-language-action models have reshaped autonomous driving to incorporate languages into the decision-making process. However, most existing pipelines only utilize the language modality for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personalized driving. To address this, we first construct a large-scale driving dataset (InstructScene) […]
Designing AI for Real Users — Accessibility Gaps in Retail AI Front-End
arXiv:2603.28196v1 Announce Type: cross Abstract: As AI becomes embedded in customer-facing systems, ethical scrutiny has largely focused on models, data, and governance. Far less attention has been paid to how AI is experienced through user-facing design. This commentary argues that many AI front-ends implicitly assume an ‘ideal user body and mind’, and that this becomes […]
Integrating Multimodal Large Language Model Knowledge into Amodal Completion
arXiv:2603.28333v1 Announce Type: cross Abstract: With the widespread adoption of autonomous vehicles and robotics, amodal completion, which reconstructs the occluded parts of people and objects in an image, has become increasingly crucial. Just as humans infer hidden regions based on prior experience and common sense, this task inherently requires physical knowledge about real-world entities. However, […]
Will a time-varying complex system be stable?
arXiv:2603.28464v1 Announce Type: cross Abstract: Randomly-assembled dynamical systems are theoretically predicted to be unstable upon crossing a critical threshold of complexity, as first shown by May. Yet, empirical complex systems exhibit remarkable stability, indicating the presence of additional mechanisms playing a stabilizing role. The relation between complexity and stability is typically assessed by assuming fixed […]
Moving Beyond Review: Applying Language Models to Planning and Translation in Reflection
arXiv:2603.28596v1 Announce Type: cross Abstract: Reflective writing is known to support the development of students’ metacognitive skills, yet learners often struggle to engage in deep reflection, limiting learning gains. Although large language models (LLMs) have been shown to improve writing skills, their use as conversational agents for reflective writing has produced mixed results and has […]