Training Language Agents to Learn from Experience

arXiv:2605.20477v1 Announce Type: cross Abstract: Language agents can adapt from experience in interactive environments, but current reflection-based methods can only self-correct within a single task

ConceptSeg-R1: Segment Any Concept via Meta-Reinforcement Learning

arXiv:2605.20385v1 Announce Type: cross Abstract: Recent progress in promptable segmentation has shifted visual perception from object-level localization toward concept-level understanding. However, the notion of a

Modeling Emotional Dynamics in Agent-to-Agent Interactions on Moltbook

arXiv:2605.20442v1 Announce Type: cross Abstract: Generative AI systems are increasingly deployed as interactive agents in online environments, such as a social network called Moltbook. In

VISTA: Technical Report for the Ego4D Short-Term Object Interaction Anticipation at EgoVis 2026

arXiv:2605.20901v1 Announce Type: cross Abstract: We propose VISTA, a V-JEPA Integrated StillFast Temporal Anticipator for the Ego4D Short-Term Object Interaction Anticipation (STA) Challenge at EgoVis

Beyond Text-to-SQL: An Agentic LLM System for Governed Enterprise Analytics APIs

arXiv:2605.21027v1 Announce Type: cross Abstract: Enterprise analytics aims to make organizational data accessible for decision-making, yet non-technical users still face barriers when using traditional business

TASTE: A Designer-Annotated Multi-Dimensional Preference Dataset for AI-Generated Graphic Design

May 21, 2026

arXiv:2605.20731v1 Announce Type: cross
Abstract: Text-to-image models produce graphic design at production scale, but their supervision comes from photo-style preference data with a single overall verdict per comparison. Designers evaluate along several distinct axes, including typography, visual hierarchy, color harmony, layout, and brief fidelity, and a single label collapses them. We release TASTE (Typography, Aesthetics, Spatial, Tone, Etc.): ten professional designers ranked outputs from four current text-to-image models on nine criteria across two disjoint cohorts, yielding 1,600 ratings per criterion plus per-image hallucination flags on the holistic-preference cohorts. We pair the dataset with three contributions. First, a criterion-agnostic signal test framework, using Kendall’s tau, majority probability, and Condorcet cycles against exact iid-uniform nulls at p = 4 and R = 5, places designer agreement on graphic design between food and movie preferences and photo-style image quality, with every TASTE criterion rejecting the random-rater null. Second, no pre-trained system in our benchmark, including six open-weight VLM judges from 3B to 33B parameters and three dedicated T2I scorers, HPSv2.1, PickScore-v1, and LAION-Aesthetic-V2, exceeds 0.55 macro agreement with the 5-designer majority; VLM judges trade off position bias against content sensitivity, so scaling moves along this frontier without improving accuracy. Third, a small pairwise-difference head trained on TASTE reaches 0.611, closing roughly half the gap to the 0.741 single-rater ceiling.
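
The agreement statistics named in the abstract (Kendall's tau, majority preference, Condorcet cycles) can be illustrated with a minimal sketch. It is not the authors' framework: the function names are hypothetical, and it assumes that p = 4 means four items ranked per comparison and R = 5 means five raters, with tie-free rankings stored as rank positions per item (lower is better). The exact iid-uniform null test from the abstract is not reproduced here.

```python
from itertools import combinations, permutations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two tie-free rankings of the same items."""
    n = len(rank_a)
    score = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both raters order items i and j the same way.
        agree = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) > 0
        score += 1 if agree else -1
    return score / (n * (n - 1) / 2)


def mean_pairwise_tau(rankings):
    """Average Kendall's tau over all pairs of raters."""
    pairs = list(combinations(rankings, 2))
    return sum(kendall_tau(a, b) for a, b in pairs) / len(pairs)


def majority_beats(rankings):
    """Pairs (i, j) where a strict majority of raters rank item i above item j."""
    n = len(rankings[0])
    beats = set()
    for i, j in permutations(range(n), 2):
        votes = sum(1 for r in rankings if r[i] < r[j])
        if 2 * votes > len(rankings):
            beats.add((i, j))
    return beats


def has_condorcet_cycle(rankings):
    """True if the majority relation contains a cycle i > j > k > i.

    With an odd number of raters and no ties, the majority relation is a
    tournament, so checking 3-cycles is enough to detect any cycle.
    """
    beats = majority_beats(rankings)
    n = len(rankings[0])
    return any(
        (i, j) in beats and (j, k) in beats and (k, i) in beats
        for i, j, k in permutations(range(n), 3)
    )


# Hypothetical data: R = 5 raters each rank p = 4 items (rank 0 = best).
ratings = [
    [0, 1, 2, 3],
    [0, 1, 3, 2],
    [1, 0, 2, 3],
    [0, 2, 1, 3],
    [0, 1, 2, 3],
]
print(mean_pairwise_tau(ratings), has_condorcet_cycle(ratings))
```

A mean pairwise tau near 1 indicates strong rater agreement, a value near 0 is what random raters would produce on average, and a Condorcet cycle means the majority preference cannot be turned into a single consistent ranking.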

