arXiv:2605.28647v1 Announce Type: new Abstract: It is well known that LLM guardrails and trained persona dynamics can produce a reality gap: the distance between the world an LLM is permitted or shaped to describe, and the world in which users must act. Here we argue that actively generating reality gaps is in fact unethical because […]
The Importance of Being Statistically Earnest: A Critical Re-evaluation of GSM-Symbolic
arXiv:2605.28700v1 Announce Type: new Abstract: The GSM-Symbolic benchmark (Mirzadeh et al., 2025) reported consistent performance drops across 25 Large Language Models (LLMs) when tested on template-generated variants of GSM8K problems, concluding that the models lack genuine reasoning capabilities. We argue that this conclusion rests on shaky statistical ground. Re-evaluating 20 open-weight models using Generalised Linear […]
CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning
arXiv:2605.28742v1 Announce Type: new Abstract: Language models can use verifiable rewards to improve at a wide variety of reasoning tasks. However, both parametric (e.g. RLVR) and non-parametric (e.g. prompt optimization) approaches to doing so typically require hundreds of training samples and thousands of model rollouts, making them expensive in the best case and intractable in […]
Modeling Community Attitude through Reaction Tone: A Human-AI Collaborative Framework for Evaluating LLM Alignment with Linguistic Behaviors in Online Communities
arXiv:2605.27388v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly utilized as proxies for computational social analysis; yet, their ability to faithfully represent the “thick descriptions” (Geertz, 1973) of human communities remains a critical challenge. Current evaluations often reduce social identity to static labels, sidelining how real-world groups navigate social shifts. To bridge this […]
Short-Term Gain, Long-Term Fragility: AI Labor Substitution and the Erosion of Sustainable Capability
arXiv:2605.27399v1 Announce Type: cross Abstract: What looks like acceleration can be a quiet transfer of burden from the present to the future. Attempts to replace human labor with AI systems are often presented as rational responses to technological progress, but that view is often structurally short-sighted. Across software development and adjacent knowledge industries, AI is […]
Advancing Direct Training for Spiking Neural Networks with Circulate-Firing Neurons and Learnable Gradients
arXiv:2605.27412v1 Announce Type: cross Abstract: Spiking Neural Networks (SNNs) have emerged with promising energy efficiency, yet a substantial performance gap persists compared to Artificial Neural Networks (ANNs). This gap stems from at least two key limitations: first, conventional spiking neurons offer limited information representation capacity, underutilizing the rich dynamics of membrane potentials; second, fixed surrogate […]
RE-TRIANGLE: Does TRIANGLE Enable Multimodal Alignment Beyond Cosine Similarity in Retrieval?
arXiv:2605.27436v1 Announce Type: cross Abstract: Multimodal alignment is critical for bridging the semantic gap in information retrieval. However, traditional pairwise strategies introduce a geometric blind spot: while they align anchor modalities (e.g., text) with others, they lack constraints to enforce mutual consistency between peripheral modalities (e.g., video and audio). The TRIANGLE framework addresses this by […]
When prompt perturbations break your A/B test: A valid statistical test for generative surveying
arXiv:2605.27463v1 Announce Type: cross Abstract: Generative surveying — where collections of LLM-based personas provide feedback on messages — has emerged as a cheap and scalable alternative to traditional market research. However, LLMs are sensitive to small variations in prompt design and conclusions drawn from generative surveys may depend on arbitrary phrasing choices. Controlling for this […]
HEAL: Resilient and Self-* Hub-based Learning
arXiv:2605.27475v1 Announce Type: cross Abstract: Decentralized learning enhances privacy, scalability, and fault tolerance by distributing data and computation across nodes. A popular approach is Federated Learning, which relies on a central aggregator, yet faces challenges such as server vulnerabilities, scalability issues, privacy risks, and, most importantly, a single point of failure. Alternatively, Gossip Learning and […]
Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems
arXiv:2605.27492v1 Announce Type: cross Abstract: LLM agents are rapidly evolving from coding assistants into autonomous software engineering systems. However, existing evaluation methodologies remain largely centered on static, isolated, and short-horizon benchmarks that fail to capture the dynamic complexity of real-world production workflows. As a result, benchmark performance may poorly reflect practical capability under realistic runtime […]
Eliot: Interactively $\underline{E}$xploring Fast-Changing Scientific $\underline{Li}$terature Trends with $\underline{O}$nline Da$\underline{t}$a and Learning
arXiv:2605.27610v1 Announce Type: cross Abstract: The rapid growth of scientific publishing has made it increasingly difficult to track how fast-moving areas evolve. Search engines and LLM-based assistants retrieve or summarize papers, but often hide how the corpus was selected, organized, or connected to temporal patterns. We present $\texttt{Eliot}$, a publicly deployed interactive system for traceable […]
How the Optimizer Shapes Learned Solutions in Equivariant Neural Networks
arXiv:2605.27662v1 Announce Type: cross Abstract: Equivariant neural networks encode geometric symmetries by construction, yet they are often difficult to optimize and can underperform less constrained architectures. A growing body of work addresses this through architectural modifications such as constraint relaxation or approximate equivariance, while the role of the optimizer remains comparatively underexplored. We study this […]