Sparse Mixture-of-Experts for Multi-Channel Imaging: Are All Channel Interactions Required?

arXiv:2511.17400v1 Announce Type: cross Abstract: Vision Transformers (ViTs) have become the backbone of vision foundation models, yet their optimization for multi-channel domains – such as cell painting or satellite imagery – remains underexplored. A key challenge in these domains is capturing interactions between channels, as each channel carries different information. While existing works have shown […]

VLA-Pruner: Temporal-Aware Dual-Level Visual Token Pruning for Efficient Vision-Language-Action Inference

arXiv:2511.16449v2 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) models have shown great promise for embodied AI, yet the heavy computational cost of processing continuous visual streams severely limits their real-time deployment. Token pruning (keeping salient visual tokens and dropping redundant ones) has emerged as an effective approach for accelerating Vision-Language Models (VLMs), offering a solution for […]

WER is Unaware: Assessing How ASR Errors Distort Clinical Understanding in Patient Facing Dialogue

arXiv:2511.16544v2 Announce Type: replace-cross Abstract: As Automatic Speech Recognition (ASR) is increasingly deployed in clinical dialogue, standard evaluations still rely heavily on Word Error Rate (WER). This paper challenges that standard, investigating whether WER or other common metrics correlate with the clinical impact of transcription errors. We establish a gold-standard benchmark by having expert clinicians […]

Labeled histories and maximally probable labeled topologies with multifurcation

arXiv:2511.16799v1 Announce Type: new Abstract: In mathematical phylogenetics, labeled histories describe the sequences by which sets of labeled lineages coalesce to a shared ancestral lineage. We study labeled histories for at-most-$r$-furcating trees. Consider a rooted leaf-labeled tree in which internal nodes each have $i$ offspring, and $i$ is permitted to range from 2 to $r$ […]

MF-GCN: A Multi-Frequency Graph Convolutional Network for Tri-Modal Depression Detection Using Eye-Tracking, Facial, and Acoustic Features

arXiv:2511.15675v2 Announce Type: replace-cross Abstract: Depression is a prevalent global mental health disorder, characterised by persistent low mood and anhedonia. However, it remains underdiagnosed because current diagnostic methods depend heavily on subjective clinical assessments. To enable objective detection, we introduce a gold standard dataset of 103 clinically assessed participants collected through a tripartite data approach […]

Designing and Generating Diverse, Equitable Face Image Datasets for Face Verification Tasks

arXiv:2511.17393v1 Announce Type: cross Abstract: Face verification is a significant component of identity authentication in various applications including online banking and secure access to personal devices. The majority of the existing face image datasets often suffer from notable biases related to race, gender, and other demographic characteristics, limiting the effectiveness and fairness of face verification […]

The promise and limits of LLMs in constructing proofs and hints for logic problems in intelligent tutoring systems

arXiv:2505.04736v2 Announce Type: replace Abstract: Intelligent tutoring systems have demonstrated effectiveness in teaching formal propositional logic proofs, but their reliance on template-based explanations limits their ability to provide personalized student feedback. While large language models (LLMs) offer promising capabilities for dynamic feedback generation, they risk producing hallucinations or pedagogically unsound explanations. We evaluated the stepwise […]

SCALEX: Scalable Concept and Latent Exploration for Diffusion Models

arXiv:2511.13750v2 Announce Type: replace-cross Abstract: Image generation models frequently encode social biases, including stereotypes tied to gender, race, and profession. Existing methods for analyzing these biases in diffusion models either focus narrowly on predefined categories or depend on manual interpretation of latent directions. These constraints limit scalability and hinder the discovery of subtle or unanticipated […]

You Only Forward Once: An Efficient Compositional Judging Paradigm

arXiv:2511.16600v2 Announce Type: replace Abstract: Multimodal large language models (MLLMs) show strong potential as judges. However, existing approaches face a fundamental trade-off: adapting MLLMs to output a single score misaligns with the generative nature of MLLMs and limits fine-grained requirement understanding, whereas autoregressively generating judging analyses is prohibitively slow in high-throughput settings. Observing that judgment […]

Quantum Masked Autoencoders for Vision Learning

arXiv:2511.17372v1 Announce Type: cross Abstract: Classical autoencoders are widely used to learn features of input data. To improve the feature learning, classical masked autoencoders extend classical autoencoders to learn the features of the original input sample in the presence of masked-out data. While quantum autoencoders exist, there is no design and implementation of quantum masked […]

Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge

arXiv:2503.09114v2 Announce Type: replace-cross Abstract: The rapid rise of Language Models (LMs) has expanded the capabilities of natural language processing, powering applications from text generation to complex decision-making. While state-of-the-art LMs often boast hundreds of billions of parameters and are primarily deployed in data centers, recent trends show a growing focus on compact models, typically under […]

Genomic Next-Token Predictors are In-Context Learners

arXiv:2511.12797v2 Announce Type: replace-cross Abstract: In-context learning (ICL) — the capacity of a model to infer and apply abstract patterns from examples provided within its input — has been extensively studied in large language models trained for next-token prediction on human text. In fact, prior work often attributes this emergent behavior to distinctive statistical properties […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.