LLM-DSE: Searching Accelerator Parameters with LLM Agents

arXiv:2505.12188v3 Announce Type: replace-cross Abstract: Even though high-level synthesis (HLS) tools mitigate the challenges of programming domain-specific accelerators (DSAs) by raising the abstraction level, optimizing hardware directive parameters remains a significant hurdle. Existing heuristic and learning-based methods struggle with adaptability and sample efficiency. We present LLM-DSE, a multi-agent framework designed specifically for optimizing HLS directives. […]

Enhancing Quranic Learning: A Multimodal Deep Learning Approach for Arabic Phoneme Recognition

arXiv:2511.17477v1 Announce Type: cross Abstract: Recent advances in multimodal deep learning have greatly enhanced the capability of systems for speech analysis and pronunciation assessment. Accurate pronunciation detection remains a key challenge in Arabic, particularly in the context of Quranic recitation, where subtle phonetic differences can alter meaning. Addressing this challenge, the present study proposes a […]

Can AI Perceive Physical Danger and Intervene?

arXiv:2509.21651v2 Announce Type: replace Abstract: When AI interacts with the physical world — as a robot or an assistive agent — new safety challenges emerge beyond those of purely “digital AI”. In such interactions, the potential for physical harm is direct and immediate. How well do state-of-the-art foundation models understand common-sense facts about physical safety, […]

Sparse Mixture-of-Experts for Multi-Channel Imaging: Are All Channel Interactions Required?

arXiv:2511.17400v1 Announce Type: cross Abstract: Vision Transformers (ViTs) have become the backbone of vision foundation models, yet their optimization for multi-channel domains, such as cell painting or satellite imagery, remains underexplored. A key challenge in these domains is capturing interactions between channels, as each channel carries different information. While existing works have shown […]

VLA-Pruner: Temporal-Aware Dual-Level Visual Token Pruning for Efficient Vision-Language-Action Inference

arXiv:2511.16449v2 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) models have shown great promise for embodied AI, yet the heavy computational cost of processing continuous visual streams severely limits their real-time deployment. Token pruning (keeping salient visual tokens and dropping redundant ones) has emerged as an effective approach for accelerating Vision-Language Models (VLMs), offering a solution for […]

WER is Unaware: Assessing How ASR Errors Distort Clinical Understanding in Patient Facing Dialogue

arXiv:2511.16544v2 Announce Type: replace-cross Abstract: As Automatic Speech Recognition (ASR) is increasingly deployed in clinical dialogue, standard evaluations still rely heavily on Word Error Rate (WER). This paper challenges that standard, investigating whether WER or other common metrics correlate with the clinical impact of transcription errors. We establish a gold-standard benchmark by having expert clinicians […]

Labeled histories and maximally probable labeled topologies with multifurcation

arXiv:2511.16799v1 Announce Type: new Abstract: In mathematical phylogenetics, labeled histories describe the sequences by which sets of labeled lineages coalesce to a shared ancestral lineage. We study labeled histories for at-most-$r$-furcating trees. Consider a rooted leaf-labeled tree in which internal nodes each have $i$ offspring, and $i$ is permitted to range from 2 to $r$ […]

MF-GCN: A Multi-Frequency Graph Convolutional Network for Tri-Modal Depression Detection Using Eye-Tracking, Facial, and Acoustic Features

arXiv:2511.15675v2 Announce Type: replace-cross Abstract: Depression is a prevalent global mental health disorder, characterised by persistent low mood and anhedonia. However, it remains underdiagnosed because current diagnostic methods depend heavily on subjective clinical assessments. To enable objective detection, we introduce a gold standard dataset of 103 clinically assessed participants collected through a tripartite data approach […]

Designing and Generating Diverse, Equitable Face Image Datasets for Face Verification Tasks

arXiv:2511.17393v1 Announce Type: cross Abstract: Face verification is a significant component of identity authentication in various applications including online banking and secure access to personal devices. Most existing face image datasets suffer from notable biases related to race, gender, and other demographic characteristics, limiting the effectiveness and fairness of face verification […]

The promise and limits of LLMs in constructing proofs and hints for logic problems in intelligent tutoring systems

arXiv:2505.04736v2 Announce Type: replace Abstract: Intelligent tutoring systems have demonstrated effectiveness in teaching formal propositional logic proofs, but their reliance on template-based explanations limits their ability to provide personalized student feedback. While large language models (LLMs) offer promising capabilities for dynamic feedback generation, they risk producing hallucinations or pedagogically unsound explanations. We evaluated the stepwise […]


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.