FrontierScience: Evaluating AI’s Ability to Perform Expert-Level Scientific Tasks

arXiv:2601.21165v1 Announce Type: new Abstract: We introduce FrontierScience, a benchmark evaluating expert-level scientific reasoning in frontier language models. Recent model progress has nearly saturated existing science benchmarks, which often rely on multiple-choice knowledge questions or already published information. FrontierScience addresses this gap through two complementary tracks: (1) Olympiad, consisting of international olympiad problems at the […]

Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report

arXiv:2601.21051v1 Announce Type: new Abstract: We present Foundation-Sec-8B-Reasoning, the first open-source native reasoning model for cybersecurity. Built upon our previously released Foundation-Sec-8B base model (derived from Llama-3.1-8B-Base), the model is trained through a two-stage process combining supervised fine-tuning (SFT) and reinforcement learning from verifiable rewards (RLVR). Our training leverages proprietary reasoning data spanning cybersecurity analysis, […]

The quenched structured coalescent for diploid population models on finite graphs with large migrations and uneven offspring distributions

arXiv:2601.21079v1 Announce Type: cross Abstract: In this work we describe a new model for the evolution of a diploid structured population backwards in time that allows for large migrations and uneven offspring distributions. The model generalizes both the mean-field model of Birkner et al. [textitElectron. J. Probab. 23: 1-44 (2018)] and the haploid structured model […]

MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models

arXiv:2601.21181v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) suffer from cross-modal hallucinations, where one modality inappropriately influences generation about another, leading to fabricated output. This exposes a more fundamental deficiency in modality-interaction control. To address this, we propose Modality-Adaptive Decoding (MAD), a training-free method that adaptively weights modality-specific decoding branches based on task […]

Bayesian-LoRA: Probabilistic Low-Rank Adaptation of Large Language Models

arXiv:2601.21003v1 Announce Type: new Abstract: Large Language Models usually put more emphasis on accuracy and therefore, will guess even when not certain about the prediction, which is especially severe when fine-tuned on small datasets due to the inherent tendency toward miscalibration. In this work, we introduce Bayesian-LoRA, which reformulates the deterministic LoRA update as a […]

From Linear Input to Hierarchical Structure: Function Words as Statistical Cues for Language Learning

arXiv:2601.21191v1 Announce Type: cross Abstract: What statistical conditions support learning hierarchical structure from linear input? In this paper, we address this question by focusing on the statistical distribution of function words. Function words have long been argued to play a crucial role in language acquisition due to their distinctive distributional properties, including high frequency, reliable […]

Can Large Language Models Capture Video Game Engagement?

arXiv:2502.04379v2 Announce Type: replace-cross Abstract: Can out-of-the-box pretrained Large Language Models (LLMs) detect human affect successfully when observing a video? To address this question, for the first time, we evaluate comprehensively the capacity of popular LLMs for successfully predicting continuous affect annotations of videos when prompted by a sequence of text and video frames in […]

More Code, Less Reuse: Investigating Code Quality and Reviewer Sentiment towards AI-generated Pull Requests

arXiv:2601.21276v1 Announce Type: cross Abstract: Large Language Model (LLM) Agents are advancing quickly, with the increasing leveraging of LLM Agents to assist in development tasks such as code generation. While LLM Agents accelerate code generation, studies indicate they may introduce adverse effects on development. However, existing metrics solely measure pass rates, failing to reflect impacts […]

The Epistemic Planning Domain Definition Language: Official Guideline

arXiv:2601.20969v1 Announce Type: new Abstract: Epistemic planning extends (multi-agent) automated planning by making agents’ knowledge and beliefs first-class aspects of the planning formalism. One of the most well-known frameworks for epistemic planning is Dynamic Epistemic Logic (DEL), which offers an rich and natural semantics for modelling problems in this setting. The high expressive power provided […]

Sycophantic Anchors: Localizing and Quantifying User Agreement in Reasoning Models

arXiv:2601.21183v1 Announce Type: new Abstract: Reasoning models frequently agree with incorrect user suggestions — a behavior known as sycophancy. However, it is unclear where in the reasoning trace this agreement originates and how strong the commitment is. To localize and quantify this behavior, we introduce emphsycophantic anchors — sentences that causally lock models into user […]

The Compliance Paradox: Semantic-Instruction Decoupling in Automated Academic Code Evaluation

arXiv:2601.21360v1 Announce Type: cross Abstract: The rapid integration of Large Language Models (LLMs) into educational assessment rests on the unverified assumption that instruction following capability translates directly to objective adjudication. We demonstrate that this assumption is fundamentally flawed. Instead of evaluating code quality, models frequently decouple from the submission’s logic to satisfy hidden directives, a […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registeration number 16808844