arXiv:2606.01567v2 Announce Type: replace-cross Abstract: Large language model (LLM) agents increasingly rely on reusable skills, i.e., documents describing task-specific procedures. However, this introduces a new attack surface for agents to manage. We study two complementary directions for this threat. First, we evaluate guardian-based defenses: an intermediary LLM agent that acts as a mediator for skill […]
Optimizing Energy-based Neural Network Training with Coherent Ising Machine
arXiv:2606.09117v1 Announce Type: cross Abstract: While Ising machines serve as advanced physical solvers for the Ising model, enabling applications in combinatorial optimization and neural network training, their scalability for large-scale neural networks remains constrained by hardware connectivity limitations and suboptimal training methodologies. In this work, we leverage a Coherent Ising Machine (CIM) to train an energy-based neural network […]
AI Assurance in UK Defence: Challenges in Operationalising JSP 936
arXiv:2606.09414v1 Announce Type: cross Abstract: This report examines practical challenges in operationalising JSP 936 Part 1 for AI assurance in UK Defence. Using a structured interpretive review of the directive’s requirements, the analysis identifies eight thematic challenge areas: adequacy of evidence and argument, management of human interaction with AI, definition of the operational environment, integration […]
Learning to Evaluate: Cost-Effective Model Evaluation on Unlabeled Data with Meta-Learning
arXiv:2605.23595v3 Announce Type: replace-cross Abstract: The rapid advancement of machine learning has led to an unprecedented expansion of model ecosystems, making it increasingly difficult to assess the reliability of newly released models on unseen and unlabeled data. Existing evaluation pipelines typically rely on costly annotation, repeated fine-tuning, or assumptions that do not generalize well to […]
Intelligent Character Recognition of Handwritten Forms with Deep Neural Networks
arXiv:2606.08858v1 Announce Type: cross Abstract: The automatic processing of handwritten forms remains a challenging task, wherein detection and subsequent classification of handwritten characters are essential steps. We describe a novel approach, in which both steps — detection and classification — are executed in one task through a deep neural network. Therefore, training data is not […]
Video Understanding by Design: How Datasets Shape Video Models
arXiv:2509.09151v2 Announce Type: replace-cross Abstract: Research in video understanding has advanced rapidly, driven by increasingly diverse datasets and more powerful model architectures. While existing surveys typically organize progress by tasks, benchmarks, or model families, they provide limited insight into why particular architectures emerged and succeeded. In this survey, we argue that the evolution of video […]
Prescriptive Scaling Reveals the Evolution of Language Model Capabilities
arXiv:2602.15327v2 Announce Type: replace-cross Abstract: Machine learning model performance improvements tend to arise from competition and application. For deployment, we consider prescriptive scaling laws: given a pre-training compute budget, what downstream accuracy is attainable with contemporary post-training practice, and how stable is that mapping as the field evolves? Using large-scale observational evaluations with 5k existing […]
Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards
arXiv:2605.03862v4 Announce Type: replace Abstract: Reinforcement learning with verifiable rewards has become a common way to improve explicit reasoning in large language models, but final-answer correctness alone does not reveal whether the reasoning trace is faithful, reliable, or useful to the model that consumes it. This outcome-only signal can reinforce traces that are right for […]
Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO
arXiv:2606.09701v1 Announce Type: cross Abstract: AI red teaming must continually adapt to evolving attackers and defenders. Reinforcement learning offers a promising approach to discovering novel attacks, and co-training methods can produce more robust defenders in tandem. Recent works have demonstrated the efficacy of attacker-defender co-training by applying PPO and DPO, but report that GRPO is […]
Robust Parametric Estimation of Avian Cranial Morphology
arXiv:2511.06426v3 Announce Type: replace Abstract: Understanding the growth and form of complex morphological structures is one of the most fundamental problems in biology. While many prior works have analyzed the beak morphology of Darwin’s finches, other cranial features are relatively less explored. In this work, we develop geometric and statistical methods for analyzing the skull […]
An Infectious Disease Spread Simulation Based on Large Language Model Decision Making
arXiv:2606.06360v2 Announce Type: replace Abstract: Modelling individual decision-making during infectious disease outbreaks is crucial for understanding behavioural dynamics and informing effective public health interventions. Prior work has shown that large language models can simulate realistic human behaviour by generating agent decisions based on demographic prompts and situational context. We build on this foundation with a […]
How Context Shapes Truth: Geometric Transformations of Statement-level Truth Representations in LLMs
arXiv:2601.06599v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) often encode whether a statement is true as a vector in their residual stream activations. These vectors, also known as truth vectors, have been studied in prior work; however, how they change when context is introduced remains unexplored. We study this question by measuring (1) the […]