arXiv:2511.01213v1 Announce Type: cross Abstract: The immense cultural and culinary diversity of Indian cuisines highlights a major shortcoming of existing Visual Question Answering (VQA) systems, which are biased toward Western foods. A recent attempt to build a VQA dataset for Indian food is a step toward addressing this challenge. […]
AmpliconHunter2: a SIMD-Accelerated In-Silico PCR Engine
arXiv:2511.00170v1 Announce Type: new Abstract: Summary: We present AmpliconHunter2 (AHv2), a highly scalable in silico PCR engine written in C that can handle degenerate primers and uses a highly accurate melting temperature model. AHv2 implements a bit-mask IUPAC matcher with AVX2 SIMD acceleration, supports user-specified mismatches and 3′ clamp constraints, calls amplicons in all four […]
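To illustrate the bit-mask IUPAC idea mentioned in this abstract, here is a minimal scalar sketch. It is not AHv2's code. It assumes a one-bit-per-nucleotide encoding (A=1, C=2, G=4, T=8), and the names `MASK`, `mismatches`, and `find_sites` are hypothetical. The sketch shows degenerate-primer matching, a mismatch budget, and a 3′ clamp on the forward strand only. It does not reproduce AHv2's AVX2 kernel, its strand handling, or its melting temperature model.

```python
from typing import List, Optional, Tuple

# Assumed encoding (not necessarily AHv2's): one bit per nucleotide.
BASE_BITS = {"A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000}

# IUPAC degenerate codes and the nucleotides each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Each code becomes the OR of its member bits, e.g. R -> A|G = 0b0101.
MASK = {code: sum(BASE_BITS[b] for b in bases) for code, bases in IUPAC.items()}


def mismatches(template: str, pos: int, primer: str,
               max_mm: int, clamp: int) -> Optional[int]:
    """Return the mismatch count of `primer` aligned at template[pos:], or
    None if a mismatch falls in the last `clamp` primer bases (the 3' end)
    or the count exceeds `max_mm`. Caller keeps pos + len(primer) in range."""
    n = len(primer)
    mm = 0
    for i, p in enumerate(primer):
        # Two codes are compatible exactly when their bit sets intersect;
        # characters outside the table map to 0 and never match.
        if (MASK.get(template[pos + i], 0) & MASK.get(p, 0)) == 0:
            if i >= n - clamp:
                return None  # 3' clamp violated
            mm += 1
            if mm > max_mm:
                return None  # mismatch budget exceeded
    return mm


def find_sites(template: str, primer: str, max_mm: int = 0,
               clamp: int = 0) -> List[Tuple[int, int]]:
    """Scan the forward strand; return (position, mismatches) per binding site."""
    template, primer = template.upper(), primer.upper()
    hits = []
    for pos in range(len(template) - len(primer) + 1):
        mm = mismatches(template, pos, primer, max_mm, clamp)
        if mm is not None:
            hits.append((pos, mm))
    return hits
```

For example, `find_sites("TTGACGTAGCCATGACGTTAG", "GACRTW")` returns `[(2, 0), (13, 0)]`: R accepts the G at both sites, and W accepts the A at the first and the T at the second. Because a match is a single AND followed by a zero test, the same check could, in principle, be applied to many template positions per instruction in a SIMD engine; the sketch stays scalar for clarity.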
Embodied Cognition Augmented End2End Autonomous Driving
arXiv:2511.01334v1 Announce Type: cross Abstract: In recent years, vision-based end-to-end autonomous driving has emerged as a new paradigm. However, popular end-to-end approaches typically rely on visual feature extraction networks trained under label supervision. This limited supervision framework restricts the generality and applicability of driving models. In this paper, we propose a novel paradigm termed $E^3AD$, […]
Dynamic Routing Between Experts: A Data-Efficient Approach to Continual Learning in Vision-Language Models
arXiv:2511.01831v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) suffer from catastrophic forgetting when sequentially fine-tuned on new tasks, degrading performance on previously learned foundational and task-specific capabilities. While multi-task learning can mitigate forgetting, it requires simultaneous access to all datasets and imposes computational overhead that scales linearly with the number of tasks. In this work, […]
LongCat-Flash-Omni Technical Report
arXiv:2511.00279v1 Announce Type: cross Abstract: We introduce LongCat-Flash-Omni, a state-of-the-art open-source omni-modal model with 560 billion parameters, excelling at real-time audio-visual interaction. By adopting a curriculum-inspired progressive training strategy that transitions from simpler to increasingly complex modality sequence modeling tasks, LongCat-Flash-Omni attains comprehensive multimodal capabilities while maintaining strong unimodal capability. Building upon LongCat-Flash, which adopts […]
Incremental Selection of Most-Filtering Conjectures and Proofs of the Selected Conjectures
arXiv:2511.00194v1 Announce Type: new Abstract: We present an improved incremental version of the selection algorithm presented in [1] and prove all the selected conjectures.
Exploiting Latent Space Discontinuities for Building Universal LLM Jailbreaks and Data Extraction Attacks
arXiv:2511.00346v1 Announce Type: cross Abstract: The rapid proliferation of Large Language Models (LLMs) has raised significant concerns about their security against adversarial attacks. In this work, we propose a novel approach to crafting universal jailbreaks and data extraction attacks by exploiting latent space discontinuities, an architectural vulnerability related to the sparsity of training data. Unlike […]
Scientists’ First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
arXiv:2506.10521v5 Announce Type: replace Abstract: Scientific discoveries increasingly rely on complex multimodal reasoning based on information-intensive scientific data and domain-specific expertise. Empowered by expert-level scientific benchmarks, scientific Multimodal Large Language Models (MLLMs) hold the potential to significantly enhance this discovery process in realistic workflows. However, current scientific benchmarks mostly focus on evaluating the knowledge understanding […]
SonarSweep: Fusing Sonar and Vision for Robust 3D Reconstruction via Plane Sweeping
arXiv:2511.00392v1 Announce Type: cross Abstract: Accurate 3D reconstruction in visually degraded underwater environments remains a formidable challenge. Single-modality approaches are insufficient: vision-based methods fail due to poor visibility and geometric constraints, while sonar is crippled by inherent elevation ambiguity and low resolution. Consequently, prior fusion techniques rely on heuristics and flawed geometric assumptions, leading to significant […]
Advancing Cognitive Science with LLMs
arXiv:2511.00206v1 Announce Type: new Abstract: Cognitive science faces ongoing challenges in knowledge synthesis and conceptual clarity, in part due to its multifaceted and interdisciplinary nature. Recent advances in artificial intelligence, particularly the development of large language models (LLMs), offer tools that may help to address these issues. This review examines how LLMs can support areas […]