arXiv:2604.19354v2 Announce Type: replace Abstract: Large Language Model (LLM) agents are increasingly proposed for autonomous cybersecurity tasks, but their capabilities in realistic offensive settings remain poorly understood. We present DeepRed, an open-source benchmark for evaluating LLM-based agents on realistic Capture The Flag (CTF) challenges in isolated virtualized environments. DeepRed places an agent in a Kali […]
Toward Structural Multimodal Representations: Specialization, Selection, and Sparsification via Mixture-of-Experts
arXiv:2605.03348v2 Announce Type: replace-cross Abstract: We propose S3 (Specialization, Selection, Sparsification), a framework that rethinks multimodal learning through a structural perspective. Instead of encoding all signals into a fixed embedding, S3 decomposes multimodal inputs into semantic experts and selectively routes them for each task. Specialization forms concept-level experts in a shared latent space, Selection adapts […]
LABBench2: An Improved Benchmark for AI Systems Performing Biology Research
arXiv:2604.09554v2 Announce Type: replace Abstract: Optimism for accelerating scientific discovery with AI continues to grow. Current applications of AI in scientific research range from training dedicated foundation models on scientific data to agentic autonomous hypothesis generation systems to AI-driven autonomous labs. The need to measure progress of AI systems in scientific domains correspondingly must not […]
EP-GRPO: Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance
arXiv:2605.04960v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR), particularly Group Relative Policy Optimization (GRPO), has advanced LLM reasoning. However, GRPO suffers from three credit assignment failures: uniform token-level granularity that ignores heterogeneous informational value, uniform polarity that penalizes correct steps and rewards incorrect ones, and zero-variance collapse that erases outcome-driven gradients. We […]
Reliable Modeling of Distribution Shifts via Displacement-Reshaped Optimal Transport
arXiv:2605.04965v1 Announce Type: cross Abstract: Optimal transport (OT) is a central framework for modeling distribution shifts. Because OT compares distributions directly in input space, a well-designed ground metric between observations is essential to ensure that the optimizer does not violate the true geometry of change. We propose Displacement-Reshaped Optimal Transport (ReshapeOT), a method that reshapes […]
Catastrophe-dispersion models in random and varying environments across generations
arXiv:2605.04048v2 Announce Type: replace-cross Abstract: We study a class of branching processes in which the offspring distribution is not specified directly but is induced by a cycle of internal colony growth, catastrophic reduction and structured dispersal. The parameters governing growth, survival and dispersal are allowed to vary deterministically or randomly from one generation to the […]
LCM: Lossless Context Management
arXiv:2605.04050v1 Announce Type: new Abstract: We introduce Lossless Context Management (LCM), a deterministic architecture for LLM memory that outperforms Claude Code on long-context tasks. When benchmarked using Opus 4.6, our LCM-augmented coding agent, Volt, achieves higher scores than Claude Code on the OOLONG long-context eval, including at every context length between 32K and 1M tokens. […]
PSK at SemEval-2026 Task 9: Multilingual Polarization Detection Using Ensemble Gemma Models with Synthetic Data Augmentation
arXiv:2605.05159v1 Announce Type: cross Abstract: We present our system for SemEval-2026 Task 9: Multilingual Polarization Detection, a binary classification task spanning 22 languages. Our approach fine-tunes separate Gemma~3 models (12B and 27B parameters) per language using Low-Rank Adaptation (LoRA), augmented with synthetic data generated by a large language model (LLM). We employ three synthetic data […]
DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring
arXiv:2605.04943v1 Announce Type: cross Abstract: The condition monitoring (CM) of synthetic fibre ropes (SFRs) used in offshore, maritime, and industrial settings demands more than a classifier: inspectors need continuous severity estimates, maintenance recommendations, anomaly flags, deterioration timelines, and automated reports, all from a single inspection image. We present DART (Damage Assessment via Rope Transformer), a […]
Human-computer interactions predict mental health
arXiv:2511.20179v5 Announce Type: replace Abstract: Scalable assessments of mental illness remain a critical roadblock toward accessible and equitable care. Here, we show that everyday human-computer interactions encode mental health with biomarker accuracy. We introduce MAILA, a MAchine-learning framework for Inferring Latent mental states from digital Activity. We trained MAILA on 18,200 cursor and touchscreen recordings […]
RLDX-1 Technical Report
arXiv:2605.03269v2 Announce Type: replace-cross Abstract: While Vision-Language-Action models (VLAs) have shown remarkable progress toward human-like generalist robotic policies through the versatile intelligence (i.e. broad scene understanding and language-conditioned generalization) inherited from pre-trained Vision-Language Models, they still struggle with complex real-world tasks requiring broader functional capabilities (e.g. motion awareness, long-term memory, and physical sensing). To address […]
ADAPTS: Agentic Decomposition for Automated Protocol-agnostic Tracking of Symptoms
arXiv:2605.03212v2 Announce Type: replace Abstract: Modeling latent clinical constructs from unconstrained clinical interactions is a unique challenge in affective computing. We present ADAPTS (Agentic Decomposition for Automated Protocol-agnostic Tracking of Symptoms), a framework for automated rating of depression and anxiety severity using a mixture-of-agents LLM architecture. This approach decomposes long-form clinical interviews into symptom-specific reasoning […]