SMILE: A Composite Lexical-Semantic Metric for Question-Answering Evaluation

arXiv:2511.17432v1 Announce Type: cross Abstract: Traditional evaluation metrics for textual and visual question answering, like ROUGE, METEOR, and Exact Match (EM), focus heavily on n-gram based lexical similarity, often missing the deeper semantic understanding needed for accurate assessment. While measures like BERTScore and MoverScore leverage contextual embeddings to address this limitation, they lack flexibility in […]

Meta-World+: An Improved, Standardized, RL Benchmark

arXiv:2505.11289v2 Announce Type: replace Abstract: Meta-World is widely used for evaluating multi-task and meta-reinforcement learning agents, which are challenged to master diverse skills simultaneously. Since its introduction, however, there have been numerous undocumented changes which inhibit a fair comparison of algorithms. This work strives to disambiguate these results from the literature, while also leveraging the […]

FaCells. Teaching Machines the Language of Lines: Per Point Attribute Scores for Face-Sketch Classification

arXiv:2102.11361v3 Announce Type: replace-cross Abstract: FaCells is a method, and an exhibition, that turns model internals into line based artworks. Aligned face photographs (CelebA, 260k images, 40 attributes) are translated into vector sketches suitable for an XY plotter. We study how to ‘write’ these drawings for a sequence model, comparing absolute vs. relative point encodings […]

MiniLLM: Knowledge Distillation of Large Language Models

arXiv:2306.08543v5 Announce Type: replace-cross Abstract: Knowledge Distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs). However, previous KD methods are primarily applied to white-box classification models or training small models to imitate black-box model APIs like ChatGPT. How to effectively distill the knowledge of white-box LLMs into […]

Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning

arXiv:2503.17987v3 Announce Type: replace-cross Abstract: Text-to-Image (T2I) models typically deploy safety filters to prevent the generation of sensitive images. Unfortunately, recent jailbreaking attack methods manually design instructions for the LLM to generate adversarial prompts, which effectively bypass safety filters while producing sensitive images, exposing safety vulnerabilities of T2I models. However, due to the LLM’s limited understanding […]

VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding

arXiv:2508.06869v3 Announce Type: replace-cross Abstract: Multimodal large language models (MLLMs) demonstrate exceptional performance in vision-language tasks, yet their processing of long videos is constrained by input context length and high computational costs. Sparse frame sampling thus becomes a necessary preprocessing step, with sampled frame quality directly impacting downstream performance. Existing keyframe search algorithms achieve a […]

Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models

arXiv:2510.27629v4 Announce Type: replace-cross Abstract: Open-weight bio-foundation models present a dual-use dilemma. While holding great promise for accelerating scientific research and drug development, they could also enable bad actors to develop more deadly bioweapons. To mitigate the risk posed by these models, current approaches focus on filtering biohazardous data during pre-training. However, the effectiveness of […]

Fast LLM Post-training via Decoupled and Best-of-N Speculation

arXiv:2511.16193v2 Announce Type: replace-cross Abstract: Rollout dominates the training time in large language model (LLM) post-training, where the trained model is used to generate tokens given a batch of prompts. SpecActor achieves fast rollout with speculative decoding that deploys a fast path (e.g., a smaller model) to accelerate the unparallelizable generation, while the correctness is […]

Planning with Sketch-Guided Verification for Physics-Aware Video Generation

arXiv:2511.17450v1 Announce Type: cross Abstract: Recent video generation approaches increasingly rely on planning intermediate control signals such as object trajectories to improve temporal coherence and motion fidelity. However, these methods mostly employ single-shot plans that are typically limited to simple motions, or iterative refinement which requires multiple calls to the video generator, incurring high computational […]

Artificial Intelligence Index Report 2025

arXiv:2504.07139v3 Announce Type: replace Abstract: Welcome to the eighth edition of the AI Index report. The 2025 Index is our most comprehensive to date and arrives at an important moment, as AI’s influence across society, the economy, and global governance continues to intensify. New in this year’s report are in-depth analyses of the evolving landscape […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.