arXiv:2603.25326v3 Announce Type: replace Abstract: Interest in the concept of AI-driven harmful manipulation is growing, yet current approaches to evaluating it are limited. This paper introduces a framework for evaluating harmful AI manipulation via context-specific human-AI interaction studies. We illustrate the utility of this framework by assessing an AI model with 10,101 participants spanning interactions […]
FLEX: A Large-Scale Multimodal, Multiview Dataset for Learning Structured Representations for Fitness Action Quality Assessment
arXiv:2506.03198v4 Announce Type: replace-cross Abstract: Action Quality Assessment (AQA) — the task of quantifying how well an action is performed — has great potential for detecting errors in gym weight training, where accurate feedback is critical to prevent injuries and maximize gains. Existing AQA datasets, however, are limited to single-view competitive sports and RGB video, […]
f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness
arXiv:2510.10510v2 Announce Type: replace-cross Abstract: Influence estimation methods promise to explain and debug machine learning by estimating the impact of individual samples on the final model. Yet, existing methods collapse under training randomness: the same example may appear critical in one run and irrelevant in the next. Such instability undermines their use in data curation […]
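The abstract's core idea can be illustrated with a toy sketch (this is an illustration of hypothesis-testing-based influence estimation in general, not the paper's f-INE estimator): train a model many times under different random seeds, with and without a candidate sample, and test whether the loss gap exceeds run-to-run noise. The "model" here is a stubbed mean estimator with seed-dependent subsampling standing in for a real training run.

```python
# Hedged sketch: influence as a two-sample hypothesis test across seeds.
# `train_and_eval` is a stand-in for a real training run; subsampling
# injects the training randomness the abstract warns about.
import random
import statistics

def train_and_eval(data, seed, test_point=10.0):
    rng = random.Random(seed)
    batch = rng.sample(data, k=max(1, len(data) // 2))  # training randomness
    model = sum(batch) / len(batch)                     # "trained" mean model
    return (model - test_point) ** 2                    # squared test loss

data = [9.0, 10.0, 11.0, 10.5, 9.5, 30.0]  # 30.0 is a suspected outlier
without = [x for x in data if x != 30.0]

losses_with = [train_and_eval(data, s) for s in range(200)]
losses_without = [train_and_eval(without, s) for s in range(200)]

# Welch's t-statistic: is the loss gap large relative to seed noise?
m1, m2 = statistics.mean(losses_with), statistics.mean(losses_without)
v1, v2 = statistics.variance(losses_with), statistics.variance(losses_without)
t = (m1 - m2) / ((v1 / 200 + v2 / 200) ** 0.5)
print(f"influence estimate (loss gap): {m1 - m2:.3f}, t-statistic: {t:.2f}")
```

A large t-statistic says the sample's influence is stable across seeds; a small one says any single-run influence score is mostly noise, which is the failure mode the abstract highlights.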
Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking
arXiv:2602.16746v3 Announce Type: replace-cross Abstract: Grokking — the delayed transition from memorization to generalization in small algorithmic tasks — remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention weight trajectories reveals that training evolves predominantly within a low-dimensional execution subspace, with a single principal […]
Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers
arXiv:2603.28013v2 Announce Type: replace-cross Abstract: We present a stage-decomposed analysis of prompt injection attacks against five frontier LLM agents. Prior work measures task-level attack success rate (ASR); we localize the pipeline stage at which each model’s defense activates. We instrument every run with a cryptographic canary token (SECRET-[A-F0-9]{8}) tracked through four kill-chain stages — Exposed, […]
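The instrumentation idea can be sketched minimally (this is an illustration, not the paper's harness; the stage names after "Exposed" are truncated in the abstract, so the ones below are hypothetical): generate a canary matching the stated format, plant it in protected content, and scan each stage's output to localize where a defense failed.

```python
# Hedged sketch: canary generation and stage-level leak localization.
# Stage names other than "exposed" are illustrative placeholders.
import re
import secrets

def make_canary():
    return f"SECRET-{secrets.token_hex(4).upper()}"  # 8 uppercase hex chars

CANARY_RE = re.compile(r"SECRET-[A-F0-9]{8}")

def first_leak_stage(stage_outputs):
    """Return the earliest stage whose output contains a canary token."""
    for stage, text in stage_outputs:
        if CANARY_RE.search(text):
            return stage
    return None

canary = make_canary()
run = [
    ("exposed",   "summarizing the retrieved document..."),
    ("reasoned",  "the document asks me to reveal a token; refusing"),
    ("actioned",  f"POST /exfil body={canary}"),   # defense failed here
    ("completed", "request finished"),
]
print("canary:", canary)
print("first leak stage:", first_leak_stage(run))
```

Tracking the same regex through every stage, rather than only the final answer, is what turns a binary ASR into the stage-level localization the abstract describes.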
Corporations Constitute Intelligence
arXiv:2604.02912v1 Announce Type: cross Abstract: In January 2026, Anthropic published a 79-page “constitution” for its AI model Claude, the most comprehensive corporate AI governance document ever released. This Article offers the first legal and democratic-theoretic analysis of that document. Despite genuine philosophical sophistication, the constitution harbors two structural defects. First, it excludes the contexts where […]
Self-Optimizing Multi-Agent Systems for Deep Research
arXiv:2604.02988v1 Announce Type: cross Abstract: Given a user’s complex information need, a multi-agent Deep Research system iteratively plans, retrieves, and synthesizes evidence across hundreds of documents to produce a high-quality answer. In one possible architecture, an orchestrator agent coordinates the process, while parallel worker agents execute tasks. Current Deep Research systems, however, often rely on […]
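The orchestrator/worker architecture mentioned in the abstract can be sketched as a minimal control loop (a generic sketch of the pattern, not the paper's system; `plan`, `worker`, and `synthesize` are stubs standing in for LLM calls):

```python
# Hedged sketch: an orchestrator decomposes the query, parallel workers
# gather evidence, and the orchestrator synthesizes the final answer.
from concurrent.futures import ThreadPoolExecutor

def plan(query):
    # An orchestrator LLM would decompose the query; stubbed here.
    return [f"{query} :: subquestion {i}" for i in range(3)]

def worker(subtask):
    # A worker agent would retrieve and summarize evidence; stubbed here.
    return f"evidence for [{subtask}]"

def synthesize(query, findings):
    return f"answer to '{query}' using {len(findings)} findings"

def deep_research(query):
    subtasks = plan(query)                    # orchestrator plans
    with ThreadPoolExecutor() as pool:        # workers run in parallel
        findings = list(pool.map(worker, subtasks))
    return synthesize(query, findings)        # orchestrator synthesizes

print(deep_research("impact of MoE routing on inference cost"))
```

Real systems iterate this loop, feeding synthesized partial answers back into planning until the information need is met.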
JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency
arXiv:2604.03044v1 Announce Type: cross Abstract: We introduce JoyAI-LLM Flash, an efficient Mixture-of-Experts (MoE) language model designed to redefine the trade-off between strong performance and token efficiency in the sub-50B parameter regime. JoyAI-LLM Flash is pretrained on a massive corpus of 20 trillion tokens and further optimized through a rigorous post-training pipeline, including supervised fine-tuning (SFT), […]
AlertStar: Path-Aware Alert Prediction on Hyper-Relational Knowledge Graphs
arXiv:2604.03104v1 Announce Type: cross Abstract: Cyber-attacks continue to grow in scale and sophistication, yet existing network intrusion detection approaches lack the semantic depth required for path reasoning over attacker-victim interactions. We address this by first modelling network alerts as a knowledge graph, then formulating hyper-relational alert prediction as a hyper-relational knowledge graph completion (HR-KGC) problem, […]
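The hyper-relational formulation can be made concrete with a small schema sketch (an illustration of HR-KGC facts in general, not the paper's data model): each fact is a core alert triple enriched with qualifier key-value pairs, and prediction means scoring a missing element of such a fact.

```python
# Hedged sketch: network alerts as hyper-relational facts
# (core triple + qualifiers), with a link-prediction query.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class HyperRelFact:
    head: str
    relation: str
    tail: str
    qualifiers: tuple = field(default_factory=tuple)  # ((key, value), ...)

alert = HyperRelFact(
    head="10.0.0.5",
    relation="triggered_alert",
    tail="ET SCAN Nmap",
    qualifiers=(("dst_ip", "10.0.0.9"), ("dst_port", "22"), ("proto", "tcp")),
)

# Prediction query: which alert signature completes this fact?
query = HyperRelFact("10.0.0.5", "triggered_alert", "?", alert.qualifiers)
print(alert)
print("predict tail for:", query.tail)
```

Keeping attacker/victim attributes as qualifiers rather than flattening them into binary triples is what preserves the path-level semantics the abstract argues existing detectors lack.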
Beyond the Parameters: A Technical Survey of Contextual Enrichment in Large Language Models: From In-Context Prompting to Causal Retrieval-Augmented Generation
arXiv:2604.03174v1 Announce Type: cross Abstract: Large language models (LLMs) encode vast world knowledge in their parameters, yet they remain fundamentally limited by static knowledge, finite context windows, and weakly structured causal reasoning. This survey provides a unified account of augmentation strategies along a single axis: the degree of structured context supplied at inference time. We […]
Learn to Relax with Large Language Models: Solving Constraint Optimization Problems via Bidirectional Coevolution
arXiv:2509.12643v4 Announce Type: replace Abstract: Large Language Model (LLM)-based optimization has recently shown promise for autonomous problem solving, yet most approaches still cast LLMs as passive constraint checkers rather than proactive strategy designers, limiting their effectiveness on complex Constraint Optimization Problems (COPs). To address this, we present AutoCO, an end-to-end Automated Constraint Optimization method that […]
From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics
arXiv:2601.23048v3 Announce Type: replace Abstract: Large language models now solve many benchmark math problems at near-expert levels, yet this progress has not fully translated into reliable performance in real-world applications. We study this gap through contextual mathematical reasoning, where the mathematical core must be formulated from descriptive scenarios. We introduce ContextMATH, a benchmark that repurposes […]