arXiv:2603.20020v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) excel at high-level reasoning yet fail on OCR tasks where fine-grained visual details are compromised or misaligned. We identify an overlooked optimization issue in multi-layer feature fusion. Skip pathways introduce direct back-propagation paths from high-level semantic objectives to early visual layers. This mechanism overwrites low-level […]
Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives
arXiv:2505.15693v3 Announce Type: replace Abstract: Recent advances in reinforcement learning (RL) have renewed interest in reward design for shaping agent behavior, but manually crafting reward functions is tedious and error-prone. A principled alternative is to specify behavioral requirements in a formal, unambiguous language and automatically compile them into learning objectives. $omega$-regular languages are a natural […]
Evaluating Game Difficulty in Tetris Block Puzzle
arXiv:2603.18994v2 Announce Type: replace Abstract: Tetris Block Puzzle is a single player stochastic puzzle in which a player places blocks on an 8 x 8 grid to complete lines; its popular variants have amassed tens of millions of downloads. Despite this reach, there is little principled assessment of which rule sets are more difficult. Inspired […]
Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning
arXiv:2506.16931v3 Announce Type: replace Abstract: Effective and efficient task planning is essential for mobile robots, especially in applications like warehouse retrieval and environmental monitoring. These tasks often involve selecting one location from each of several target clusters, forming a Generalized Traveling Salesman Problem (GTSP) that remains challenging to solve both accurately and efficiently. To address […]
VSSFlow: Unifying Video-conditioned Sound and Speech Generation via Joint Learning
arXiv:2509.24773v4 Announce Type: replace-cross Abstract: Video-conditioned audio generation, including Video-to-Sound (V2S) and Visual Text-to-Speech (VisualTTS), has traditionally been treated as distinct tasks, leaving the potential for a unified generative framework largely underexplored. In this paper, we bridge this gap with VSSFlow, a unified flow-matching framework that seamlessly solve both problems. To effectively handle multiple input […]
A Sheaf-Theoretic and Topological Perspective on Complex Network Modeling and Attention Mechanisms in Graph Neural Models
arXiv:2601.21207v2 Announce Type: replace-cross Abstract: Combinatorial and topological structures, such as graphs, simplicial complexes, and cell complexes, form the foundation of geometric and topological deep learning (GDL and TDL) architectures. These models aggregate signals over such domains, integrate local features, and generate representations for diverse real-world applications. However, the distribution and diffusion behavior of GDL […]
Beyond bouba/kiki: Multidimensional semantic signals are deeply woven into the fabric of natural language
arXiv:2603.17306v2 Announce Type: replace-cross Abstract: A foundational assumption in linguistics holds that the relationship between a word’s sound and its meaning is arbitrary. Accumulating evidence from sound symbolism challenges this view, yet no study has systematically mapped the multidimensional semantic profile of every phonological unit within a language. Here we show that individual letter-phonemes in […]
Offshore oil and gas platform dynamics in the North Sea, Gulf of Mexico, and Persian Gulf: Exploiting the Sentinel-1 archive
arXiv:2603.19801v1 Announce Type: cross Abstract: The increasing use of marine spaces by offshore infrastructure, including oil and gas platforms, underscores the need for consistent, scalable monitoring. Offshore development has economic, environmental, and regulatory implications, yet maritime areas remain difficult to monitor systematically due to their inaccessibility and spatial extent. This study presents an automated approach […]
Integrating Meta-Features with Knowledge Graph Embeddings for Meta-Learning
arXiv:2603.19888v1 Announce Type: cross Abstract: The vast collection of machine learning records available on the web presents a significant opportunity for meta-learning, where past experiments are leveraged to improve performance. Two crucial meta-learning tasks are pipeline performance estimation (PPE), which predicts pipeline performance on target datasets, and dataset performance-based similarity estimation (DPSE), which identifies datasets […]
Promoting Critical Thinking With Domain-Specific Generative AI Provocations
arXiv:2603.19975v1 Announce Type: cross Abstract: The evidence on the effects of generative AI (GenAI) on critical thinking is mixed, with studies suggesting both potential harms and benefits depending on its implementation. Some argue that AI-driven provocations, such as questions asking for human clarification and justification, are beneficial for eliciting critical thinking. Drawing on our experience […]
The End of Rented Discovery: How AI Search Redistributes Power Between Hotels and Intermediaries
arXiv:2603.20062v1 Announce Type: cross Abstract: When a traveler asks an AI search engine to recommend a hotel, which sources get cited — and does query framing matter? We audit 1,357 grounding citations from Google Gemini across 156 hotel queries in Tokyo and document a systematic pattern we call the Intent-Source Divide. Experiential queries draw 55.9% […]
Demonstration of Adapt4Me: An Uncertainty-Aware Authoring Environment for Personalizing Automatic Speech Recognition to Non-normative Speech
arXiv:2603.20112v1 Announce Type: cross Abstract: Personalizing Automatic Speech Recognition (ASR) for non-normative speech remains challenging because data collection is labor-intensive and model training is technically complex. To address these limitations, we propose Adapt4Me, a web-based decentralized environment that operationalizes Bayesian active learning to enable end-to-end personalization without expert supervision. The app exposes data selection, adaptation, […]