arXiv:2606.00196v2 Announce Type: replace Abstract: Across biological and social systems, cooperation often depends on phenotypic cues rather than random encounters. To account for real-world interactions unfolding across multiple, simultaneous dimensions, here we develop a general framework for the evolution of cooperation in multiplex networks governed by multi-phenotype homophily. We derive analytical conditions for natural selection […]
DyCon: Dynamic Reasoning Control via Evolving Difficulty Modeling
arXiv:2606.07108v2 Announce Type: replace Abstract: Recent advances in Large Reasoning Models (LRMs) demonstrate remarkable performance improvements by iteratively reflecting, exploring, and executing complex tasks, yet suffer from inefficiencies due to redundant reasoning, known as “overthinking”. Existing methods to mitigate this issue either rely on static difficulty estimates or require task-specific training, and thus fail to […]
Reflective Empiricism: Bias Reflection and Introspection as a Scientific Method
arXiv:2504.12310v2 Announce Type: replace-cross Abstract: This paper introduces Reflective Empiricism, an extension of empirical science that incorporates subjective perception and consciousness processes as equally valid sources of knowledge. It views reality as an interplay of subjective experience and objective laws, comprehensible only through systematic introspection, bias reflection, and premise-based logical-explorative modeling. This approach overcomes paradigmatic […]
Understanding Benchmark Language Under Weakened Formal Semantics
arXiv:2509.17455v2 Announce Type: replace-cross Abstract: State-of-the-art NLP benchmarks require interpretation of natural language that specifies conditions, procedures, and exceptions, often relying on implicit assumptions and external knowledge. Constructing complete semantic representations with proof-theoretic guarantees is frequently impractical at scale, and purely text-based reasoning offers limited means of inspection. This paper asks how much understanding of […]
MedVision: Benchmarking Quantitative Medical Image Analysis
arXiv:2511.18676v2 Announce Type: replace-cross Abstract: Current vision-language models (VLMs) in medicine are primarily designed for categorical question answering (e.g., “Is this normal or abnormal?”) or qualitative descriptive tasks. However, clinical decision-making often relies on quantitative assessments, such as measuring the size of a tumor or the angle of a joint, from which physicians draw their […]
A Comparative Study of Student Perspectives on Technical Writing Feedback Quality: Evaluating LLMs, SLMs, and Humans in Computer Science Topics
arXiv:2601.11541v2 Announce Type: replace-cross Abstract: To address the scalability of feedback in computer science while mitigating the privacy and cost limitations of commercial Large Language Models (LLMs), this study evaluates a locally hosted Small Language Model (SLM). We deployed a quantized Llama-3.1, GPT-4, and human instructors across introductory programming (N=176), operating systems (N=80), and a […]
Reward Shaping for (Inference-Time) Alignment: A Stackelberg Game Perspective
arXiv:2602.02572v2 Announce Type: replace-cross Abstract: Existing alignment methods directly use the reward model learned from user preference data to optimize an LLM policy, subject to KL regularization with respect to the base policy. This practice is suboptimal for maximizing the user’s utility because the KL regularization may cause the LLM to inherit the bias in the […]
A Mixed Diet Makes DINO An Omnivorous Vision Encoder
arXiv:2602.24181v2 Announce Type: replace-cross Abstract: Pre-trained vision encoders like DINOv2 have demonstrated exceptional performance on unimodal tasks. However, we observe that their features are poorly aligned across different visual modalities. For instance, the feature embeddings for an RGB image and its corresponding depth map of the same scene exhibit a cosine similarity that is nearly […]
Information-Theoretic Requirements for Gradient-Based Task Affinity Estimation in Multi-Task Learning
arXiv:2604.07848v2 Announce Type: replace-cross Abstract: Multi-task learning shows strikingly inconsistent results — sometimes joint training helps substantially, sometimes it actively harms performance — yet the field lacks a principled framework for predicting these outcomes. We identify a fundamental but unstated assumption underlying gradient-based task analysis: tasks must share training instances for gradient conflicts to reveal […]
DynamicPO: Dynamic Preference Optimization for Recommendation
arXiv:2605.00327v2 Announce Type: replace-cross Abstract: In large language model (LLM)-based recommendation systems, direct preference optimization (DPO) effectively aligns recommendations with user preferences, requiring multi-negative objective functions to leverage abundant implicit-feedback negatives and sharpen preference boundaries. However, our empirical analyses reveal a counterintuitive phenomenon, preference optimization collapse, where increasing the number of negative samples can lead […]
Evaluating Design Video Generation: Metrics for Compositional Fidelity
arXiv:2605.16223v2 Announce Type: replace-cross Abstract: Generative video models are increasingly used in design animation tasks, yet no standardized evaluation framework exists for this domain. Unlike natural video generation, design animation imposes structured constraints: specific components shall animate with prescribed motion types, directions, speed and timing, while non-animated regions must remain stable and layout structure must […]
The Strongest Teacher Is Not Always the Best Teacher: Student-Centric Answer Selection
arXiv:2605.26872v2 Announce Type: replace-cross Abstract: LLM training increasingly relies on teacher-generated supervision, from synthetic responses to reasoning traces and tool-use demonstrations. Current practice often chooses the highest-performing teacher to generate student training data, implicitly treating teacher test performance as a proxy for teaching quality. We show that this assumption can fail: even when multiple teachers […]