arXiv:2601.08950v2 Announce Type: replace Abstract: Despite their growing adoption in education, LLMs remain misaligned with the core principle of effective tutoring: the dialogic construction of knowledge. We introduce ConvoLearn, a dataset of 2,134 semi-synthetic tutor-student dialogues operationalizing six dimensions of dialogic tutoring grounded in knowledge-building theory, situated in middle school Earth Science curriculum. We show […]
Generate Then Correct: Single Shot Global Correction for Aspect Sentiment Quad Prediction
arXiv:2603.13777v2 Announce Type: replace-cross Abstract: Aspect-based sentiment analysis (ABSA) extracts aspect-level sentiment signals from user-generated text, supports product analytics, experience monitoring, and public-opinion tracking, and is central to fine-grained opinion mining. A key challenge in ABSA is aspect sentiment quad prediction (ASQP), which requires identifying four elements: the aspect term, the aspect category, the opinion […]
Spectral Geometry and Heat Kernels on Phylogenetic Trees
arXiv:2603.20922v2 Announce Type: replace Abstract: We develop a unified spectral framework for finite ultrametric phylogenetic trees, grounding the analysis of phylogenetic structure in operator theory and stochastic dynamics in the finite setting. For a given finite ultrametric measure space $(X,d,m)$, we introduce the ultrametric Laplacian $L_X$ as the generator of a continuous time Markov chain […]
When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Platforms
arXiv:2511.06448v2 Announce Type: replace-cross Abstract: In this work, we study the risks of collective financial fraud in large-scale multi-agent systems powered by large language model (LLM) agents. We investigate whether agents can collaborate in fraudulent behaviors, how such collaboration amplifies risks, and what factors influence fraud success. To support this research, we present MultiAgentFraudBench, a […]
Human-AI Collaborative Game Testing with Vision Language Models
arXiv:2501.11782v2 Announce Type: replace-cross Abstract: As modern video games become increasingly complex, traditional manual testing methods are proving costly and inefficient, limiting the ability to ensure high-quality game experiences. While advancements in Artificial Intelligence (AI) offer the potential to assist human testers, the effectiveness of AI in truly enhancing real-world human performance remains underexplored. This […]
Beyond Linear Steering: Unified Multi-Attribute Control for Language Models
arXiv:2505.24535v3 Announce Type: replace-cross Abstract: Controlling multiple behavioral attributes in large language models (LLMs) at inference time is a challenging problem due to interference between attributes and the limitations of linear steering methods, which assume additive behavior in activation space and require per-attribute tuning. We introduce K-Steering, a unified and flexible approach that trains a […]
Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI
arXiv:2603.18104v2 Announce Type: replace Abstract: Prevailing AI training infrastructure assumes reverse-mode automatic differentiation over IEEE-754 arithmetic. The memory overhead of training relative to inference, optimizer complexity, and structural degradation of geometric properties through training are consequences of this arithmetic substrate. This paper develops an alternative training architecture grounded in three prior results: the Dimensional Type […]
ACT: Agentic Classification Tree
arXiv:2509.26433v4 Announce Type: replace-cross Abstract: When used in high-stakes settings, AI systems are expected to produce decisions that are transparent, interpretable and auditable, a requirement increasingly expected by regulations. Decision trees such as CART provide clear and verifiable rules, but they are restricted to structured tabular data and cannot operate directly on unstructured inputs such […]
SoSBench: Benchmarking Safety Alignment on Six Scientific Domains
arXiv:2505.21605v3 Announce Type: replace-cross Abstract: Large language models (LLMs) exhibit advancing capabilities in complex tasks, such as reasoning and graduate-level question answering, yet their resilience against misuse, particularly involving scientifically sophisticated risks, remains underexplored. Existing safety benchmarks typically focus either on instructions requiring minimal knowledge comprehension (e.g., “tell me how to build a bomb”) or […]
Controllable protein design with particle-based Feynman-Kac steering
arXiv:2511.09216v2 Announce Type: replace-cross Abstract: Proteins underpin most biological function, and the ability to design them with tailored structures and properties is central to advances in biotechnology. Diffusion-based generative models have emerged as powerful tools for protein design, but steering them toward proteins with specified properties remains challenging. The Feynman-Kac (FK) framework provides a principled […]
Metaphors We Compute By: A Computational Audit of Cultural Translation vs. Thinking in LLMs
arXiv:2604.04732v1 Announce Type: cross Abstract: Large language models (LLMs) are often described as multilingual because they can understand and respond in many languages. However, speaking a language is not the same as reasoning within a culture. This distinction motivates a critical question: do LLMs truly conduct culture-aware reasoning? This paper presents a preliminary computational audit […]
Bayesian Hierarchical Invariant Prediction
arXiv:2505.11211v3 Announce Type: replace-cross Abstract: We propose Bayesian Hierarchical Invariant Prediction (BHIP) reframing Invariant Causal Prediction (ICP) through the lens of Hierarchical Bayes. We leverage the hierarchical structure to explicitly test invariance of causal mechanisms under heterogeneous data, resulting in improved computational scalability for a larger number of predictors compared to ICP. Moreover, given its […]