arXiv:2509.23629v2 Announce Type: replace Abstract: Training large language models with Reinforcement Learning with Verifiable Rewards (RLVR) exhibits a set of distinctive and puzzling behaviors that remain poorly understood, including a two-stage learning curve, a V-shaped response-length trajectory, and a pronounced vulnerability to catastrophic forgetting. In this work, we propose that these behaviors are emergent collective […]
A Reinforcement Learning-Based Telematic Routing Protocol for the Internet of Underwater Things
arXiv:2506.00133v2 Announce Type: replace-cross Abstract: The Internet of Underwater Things (IoUT) faces several challenges, including low bandwidth, high latency, mobility, and limited energy. Routing protocols designed for land-based networks, such as RPL, perform poorly in these underwater settings. This paper presents RL-RPL-UA, a new routing protocol that uses reinforcement […]
Model Inversion Attack Against Deep Hashing
arXiv:2511.12233v2 Announce Type: replace-cross Abstract: Deep hashing improves retrieval efficiency through compact binary codes, yet it introduces severe and often overlooked privacy risks. The ability to reconstruct original training data from hash codes could lead to serious threats such as biometric forgery and privacy breaches. However, model inversion attacks specifically targeting deep hashing models remain […]
MonoKAN: Certified Monotonic Kolmogorov-Arnold Network
arXiv:2409.11078v2 Announce Type: replace-cross Abstract: Artificial Neural Networks (ANNs) have significantly advanced various fields by effectively recognizing patterns and solving complex problems. Despite these advancements, their interpretability remains a critical challenge, especially in applications where transparency and accountability are essential. To address this, explainable AI (XAI) has made progress in demystifying ANNs, yet interpretability alone […]
SMILE: A Composite Lexical-Semantic Metric for Question-Answering Evaluation
arXiv:2511.17432v1 Announce Type: cross Abstract: Traditional evaluation metrics for textual and visual question answering, like ROUGE, METEOR, and Exact Match (EM), focus heavily on n-gram based lexical similarity, often missing the deeper semantic understanding needed for accurate assessment. While measures like BERTScore and MoverScore leverage contextual embeddings to address this limitation, they lack flexibility in […]
Meta-World+: An Improved, Standardized, RL Benchmark
arXiv:2505.11289v2 Announce Type: replace Abstract: Meta-World is widely used for evaluating multi-task and meta-reinforcement learning agents, which are challenged to master diverse skills simultaneously. Since its introduction, however, there have been numerous undocumented changes that inhibit a fair comparison of algorithms. This work strives to disambiguate these results from the literature, while also leveraging the […]
FaCells. Teaching Machines the Language of Lines: Per Point Attribute Scores for Face-Sketch Classification
arXiv:2102.11361v3 Announce Type: replace-cross Abstract: FaCells is a method, and an exhibition, that turns model internals into line-based artworks. Aligned face photographs (CelebA, 260k images, 40 attributes) are translated into vector sketches suitable for an XY plotter. We study how to ‘write’ these drawings for a sequence model, comparing absolute vs. relative point encodings […]
MiniLLM: Knowledge Distillation of Large Language Models
arXiv:2306.08543v5 Announce Type: replace-cross Abstract: Knowledge Distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs). However, previous KD methods are primarily applied to white-box classification models or training small models to imitate black-box model APIs like ChatGPT. How to effectively distill the knowledge of white-box LLMs into […]
Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning
arXiv:2503.17987v3 Announce Type: replace-cross Abstract: Text-to-Image (T2I) models typically deploy safety filters to prevent the generation of sensitive images. Unfortunately, recent jailbreaking attack methods manually design instructions for the LLM to generate adversarial prompts, which effectively bypass safety filters while producing sensitive images, exposing safety vulnerabilities of T2I models. However, due to the LLM’s limited understanding […]
VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
arXiv:2508.06869v3 Announce Type: replace-cross Abstract: Multimodal large language models (MLLMs) demonstrate exceptional performance in vision-language tasks, yet their processing of long videos is constrained by input context length and high computational costs. Sparse frame sampling thus becomes a necessary preprocessing step, with sampled frame quality directly impacting downstream performance. Existing keyframe search algorithms achieve a […]
Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models
arXiv:2510.27629v4 Announce Type: replace-cross Abstract: Open-weight bio-foundation models present a dual-use dilemma. While holding great promise for accelerating scientific research and drug development, they could also enable bad actors to develop more deadly bioweapons. To mitigate the risk posed by these models, current approaches focus on filtering biohazardous data during pre-training. However, the effectiveness of […]
Planning with Sketch-Guided Verification for Physics-Aware Video Generation
arXiv:2511.17450v1 Announce Type: cross Abstract: Recent video generation approaches increasingly rely on planning intermediate control signals such as object trajectories to improve temporal coherence and motion fidelity. However, these methods mostly employ single-shot plans that are typically limited to simple motions, or iterative refinement which requires multiple calls to the video generator, incurring high computational […]