arXiv:2506.09082v4 Announce Type: replace-cross Abstract: The rise of vision foundation models (VFMs) calls for systematic evaluation. A common approach pairs VFMs with large language models (LLMs) as general-purpose heads, followed by evaluation on broad Visual Question Answering (VQA) benchmarks. However, this protocol has two key blind spots: (i) the instruction tuning data may not align […]
Generative Data Transformation: From Mixed to Unified Data
arXiv:2602.22743v2 Announce Type: replace Abstract: The performance of recommendation models is intrinsically tied to the quality, volume, and relevance of their training data. To address common challenges like data sparsity and cold start, recent research has leveraged data from multiple auxiliary domains to enrich information within the target domain. However, inherent domain gaps can degrade the quality […]
When AI Bends Metal: AI-Assisted Optimization of Design Parameters in Sheet Metal Forming
arXiv:2511.22302v2 Announce Type: replace Abstract: Numerical simulations have revolutionized the industrial design process by reducing prototyping costs and design iterations, and by enabling product engineers to explore the design space more efficiently. However, the growing scale of simulations demands substantial expert knowledge, computational resources, and time. A key challenge is identifying input parameters that yield optimal results, […]
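The truncated abstract gestures at exactly the loop that surrogate-assisted optimization automates: run a few expensive simulations, fit a cheap model, and let the model propose the next candidate. Below is a generic sketch of that idea, not the paper's method; `forming_sim` is a hypothetical stand-in for the finite-element forming simulation, and the 1-nearest-neighbor surrogate is the simplest possible choice.

```python
# Generic surrogate-assisted parameter search (illustrative, not the
# paper's method). `forming_sim` is a hypothetical placeholder for an
# expensive FEM run, e.g. springback error as a function of one
# normalized process parameter.
import numpy as np

def forming_sim(x):
    return (x - 0.3) ** 2 + 0.05 * np.sin(20 * x)  # placeholder objective

rng = np.random.default_rng(0)
X = list(rng.uniform(0, 1, size=5))                # initial simulator runs
y = [forming_sim(x) for x in X]

for _ in range(10):
    cand = rng.uniform(0, 1, size=256)             # cheap candidate pool
    # 1-NN surrogate: predict each candidate with its nearest evaluated run
    pred = [y[int(np.argmin([abs(c - xi) for xi in X]))] for c in cand]
    x_next = cand[int(np.argmin(pred))]            # query surrogate minimum
    X.append(x_next)
    y.append(forming_sim(x_next))                  # one more expensive run

print(f"best x = {X[int(np.argmin(y))]:.3f}, f = {min(y):.4f}")
```

In practice a Gaussian-process or neural surrogate with an acquisition function would replace the 1-NN exploit step, but the run-fit-propose structure stays the same.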
Perfecting Human-AI Interaction at Clinical Scale: Turning Production Signals into Safer, More Human Conversations
arXiv:2603.29893v1 Announce Type: cross Abstract: Healthcare conversational AI agents shouldn’t be optimized only for clean benchmark accuracy in a production-first regime; they must be optimized for the lived reality of patient conversations, where audio is imperfect, intent is indirect, language shifts mid-call, and compliance hinges on how guidance is delivered. We present a production-validated framework grounded […]
Tucker Attention: A generalization of approximate attention mechanisms
arXiv:2603.30033v1 Announce Type: cross Abstract: The pursuit of reducing the memory footprint of multi-head self-attention (MHA) has spawned a rich portfolio of methods, e.g., group-query attention (GQA) and multi-head latent attention (MLA). These methods leverage specialized low-rank factorizations across embedding dimensions or attention heads. From the point of view of classical […]
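Since the abstract is cut off before the Tucker construction itself, the sketch below only illustrates the kind of factorization it generalizes: grouped-query attention, where several query heads share one key/value head, a rank reduction across the head axis. Shapes and names are illustrative.

```python
# Minimal grouped-query attention (GQA) sketch: n_q_heads query heads
# share n_kv_heads key/value heads, shrinking the KV cache. Illustrative
# only; this is one of the schemes the abstract says Tucker attention
# generalizes, not the Tucker method itself.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gqa(q, k, v):
    # q: (seq, n_q_heads, d); k, v: (seq, n_kv_heads, d)
    seq, n_q_heads, d = q.shape
    group = n_q_heads // k.shape[1]        # query heads per KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                    # index of the shared KV head
        scores = q[:, h] @ k[:, kv].T / np.sqrt(d)
        out[:, h] = softmax(scores) @ v[:, kv]
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8, 16))            # 8 query heads
k = rng.normal(size=(4, 2, 16))            # but only 2 KV heads
v = rng.normal(size=(4, 2, 16))
print(gqa(q, k, v).shape)                  # (4, 8, 16)
```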
Expressive Power of Implicit Models: Rich Equilibria and Test-Time Scaling
arXiv:2510.03638v4 Announce Type: replace-cross Abstract: Implicit models, an emerging model class, compute outputs by iterating a single parameter block to a fixed point. This architecture realizes an infinite-depth, weight-tied network that trains with constant memory, significantly reducing memory needs for the same level of performance compared to explicit models. While it is empirically known that […]
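A minimal sketch of the computation pattern the abstract describes, under assumed toy choices (a tanh block with small weights so the map is a contraction): one parameter block f is iterated until z stops changing, and the iteration budget is a natural test-time scaling knob.

```python
# Weight-tied fixed-point layer (illustrative, not the paper's model):
# iterate z <- f(z, x) with a single shared parameter block until
# convergence, i.e. an infinite-depth network with constant memory.
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(scale=0.3 / np.sqrt(d), size=(d, d))  # small => contraction
U = rng.normal(scale=1.0 / np.sqrt(d), size=(d, d))

def f(z, x):
    return np.tanh(z @ W + x @ U)          # the one weight-tied block

def fixed_point(x, tol=1e-8, max_iter=500):
    z = np.zeros_like(x)
    for i in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next, i + 1           # converged equilibrium
        z = z_next
    return z, max_iter

x = rng.normal(size=(1, d))
z_star, iters = fixed_point(x)
print(iters, np.linalg.norm(f(z_star, x) - z_star))  # residual ~ 0
```

Allowing more (or fewer) iterations at inference trades compute for equilibrium accuracy without touching the parameters, which is the test-time scaling angle in the title.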
EventChat: Implementation and user-centric evaluation of a large language model-driven conversational recommender system for exploring leisure events in an SME context
arXiv:2407.04472v4 Announce Type: replace-cross Abstract: Large language models (LLMs) represent an enormous evolution in the strategic potential of conversational recommender systems (CRS). Yet to date, research has predominantly focused on technical frameworks for implementing LLM-driven CRS, rather than end-user evaluations or strategic implications for firms, particularly from the perspective of small to medium enterprises […]
ShishuLM: Achieving Optimal and Efficient Parameterization with Low Attention Transformer Models
arXiv:2510.13860v2 Announce Type: replace-cross Abstract: While the transformer architecture has achieved state-of-the-art performance on natural language processing tasks, these models impose substantial memory and computational overhead. Recent research has identified significant architectural redundancies within these models, particularly in the attention sub-layers in the top layers, presenting opportunities for optimization without compromising performance. Taking insights from […]
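As a rough illustration of the redundancy being exploited (assumed, not the paper's code): a residual decoder block whose attention sub-layer can simply be skipped, so the top layers reduce to MLP-only blocks.

```python
# Toy residual block with a switchable attention sub-layer (illustrative).
# Dropping attention in the top layers mimics the redundancy the abstract
# describes; dimensions and weights are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
d = 16
W1 = rng.normal(scale=d ** -0.5, size=(d, 4 * d))
W2 = rng.normal(scale=(4 * d) ** -0.5, size=(4 * d, d))

def self_attn(x):
    scores = x @ x.T / np.sqrt(d)          # toy single-head attention
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ x

def block(x, use_attention=True):
    if use_attention:
        x = x + self_attn(x)               # attention sub-layer (residual)
    return x + np.maximum(x @ W1, 0) @ W2  # MLP sub-layer (residual)

x = rng.normal(size=(8, d))
for layer in range(12):
    x = block(x, use_attention=layer < 9)  # e.g. attention-free top 3 layers
print(x.shape)                             # (8, 16)
```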
Evidential Neural Radiance Fields
arXiv:2602.23574v2 Announce Type: replace-cross Abstract: Understanding sources of uncertainty is fundamental to trustworthy three-dimensional scene modeling. While recent advances in neural radiance fields (NeRFs) achieve impressive accuracy in scene reconstruction and novel view synthesis, the lack of uncertainty estimation significantly limits their deployment in safety-critical settings. Existing uncertainty quantification methods for NeRFs fail to separately […]
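The title suggests an evidential (Normal-Inverse-Gamma) treatment of uncertainty. The truncated abstract does not show how this attaches to a NeRF, so the snippet below only records the standard deep evidential regression identities that yield separate aleatoric and epistemic variances from the head's four parameters (gamma, nu, alpha, beta); it is background, not the paper's pipeline.

```python
# Standard deep evidential regression readout (Amini et al., 2020 style):
# a Normal-Inverse-Gamma head with parameters (gamma, nu, alpha, beta)
# gives a mean plus *separate* aleatoric and epistemic variances.
def evidential_readout(gamma, nu, alpha, beta):
    mean = gamma
    aleatoric = beta / (alpha - 1.0)         # E[sigma^2]: data noise
    epistemic = beta / (nu * (alpha - 1.0))  # Var[mu]: model uncertainty
    return mean, aleatoric, epistemic

# Example: low nu (little virtual evidence) inflates epistemic variance.
print(evidential_readout(gamma=0.5, nu=0.5, alpha=3.0, beta=1.0))
print(evidential_readout(gamma=0.5, nu=50.0, alpha=3.0, beta=1.0))
```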
Building evidence-based knowledge graphs from full-text literature for disease-specific biomedical reasoning
arXiv:2603.28325v2 Announce Type: replace-cross Abstract: Biomedical knowledge resources often either preserve evidence as unstructured text or compress it into flat triples that omit study design, provenance, and quantitative support. Here we present EvidenceNet, a framework and dataset for building disease-specific knowledge graphs from full-text biomedical literature. EvidenceNet uses a large language model (LLM)-assisted pipeline to […]
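To make the contrast with flat triples concrete, here is an illustrative record for the kind of evidence-bearing edge the abstract describes; all field names and values are assumptions, not EvidenceNet's actual schema.

```python
# Hypothetical evidence-annotated edge (field names assumed, not
# EvidenceNet's schema): unlike a flat (s, r, o) triple, the relation
# retains study design, provenance, and quantitative support.
from dataclasses import dataclass

@dataclass
class EvidenceEdge:
    subject: str        # entity, e.g. a drug
    relation: str       # typed biomedical relation
    object: str         # entity, e.g. a disease outcome
    study_design: str   # e.g. "randomized controlled trial"
    provenance: str     # source article identifier
    effect_size: float  # quantitative support, e.g. a hazard ratio
    sentence: str       # supporting span from the full text

edge = EvidenceEdge(
    subject="drug_X", relation="reduces_risk_of", object="outcome_Y",
    study_design="randomized controlled trial", provenance="PMC0000000",
    effect_size=0.69, sentence="Drug X reduced the risk of outcome Y ...",
)
print(edge.relation, edge.study_design, edge.effect_size)
```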
VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing
arXiv:2603.29852v1 Announce Type: cross Abstract: We introduce VectorGym, a comprehensive benchmark suite for Scalable Vector Graphics (SVG) that spans generation from text and sketches, complex editing, and visual understanding. VectorGym addresses the lack of realistic, challenging benchmarks aligned with professional design workflows. Our benchmark comprises four tasks with expert human-authored annotations: the novel Sketch2SVG task […]
MindCube: Spatial Mental Modeling from Limited Views
arXiv:2506.21458v2 Announce Type: replace Abstract: Can Vision-Language Models (VLMs) imagine the full scene from just a few views, like humans do? Humans naturally form spatial mental models, internal representations of unseen space, to reason about layout, perspective, and motion. Our MindCube benchmark, with 21,154 questions across 3,268 images, exposes this critical gap, where existing VLMs […]