The one piece of data that could actually shed light on your job and AI

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here. Within Silicon

AI is changing how small online sellers decide what to make

For years Mike McClary sold the Guardian LTE Flashlight, a heavy-duty black model, online through his small outdoor brand. The product, designed for brightness and

ChatSVA: Bridging SVA Generation for Hardware Verification via Task-Specific LLMs

arXiv:2604.02811v1 Announce Type: cross Abstract: Functional verification consumes over 50% of the IC development lifecycle, where SystemVerilog Assertions (SVAs) are indispensable for formal property verification

Speaking of Language: Reflections on Metalanguage Research in NLP

arXiv:2604.02645v1 Announce Type: cross Abstract: This work aims to shine a spotlight on the topic of metalanguage. We first define metalanguage, link it to NLP

Finding Belief Geometries with Sparse Autoencoders

arXiv:2604.02685v1 Announce Type: cross Abstract: Understanding the geometric structure of internal representations is a central goal of mechanistic interpretability. Prior work has shown that transformers

The Overlooked Repetitive Lengthening Form in Sentiment Analysis

April 3, 2026

arXiv:2604.01268v1 Announce Type: cross
Abstract: Individuals engaging in online communication frequently express personal opinions with informal styles (e.g., memes and emojis). While Language Models (LMs) with informal communications have been widely discussed, a unique and emphatic style, the Repetitive Lengthening Form (RLF), has been overlooked for years. In this paper, we explore answers to two research questions: 1) Is RLF important for sentiment analysis (SA)? 2) Can LMs understand RLF? Inspired by previous linguistic research, we curate textbfLengthening, the first multi-domain dataset with 850k samples focused on RLF for SA. Moreover, we introduce textbfExplainable textbfInstruction Tuning (textbfExpInstruct), a two-stage instruction tuning framework aimed to improve both performance and explainability of LLMs for RLF. We further propose a novel unified approach to quantify LMs’ understanding of informal expressions. We show that RLF sentences are expressive expressions and can serve as signatures of document-level sentiment. Additionally, RLF has potential value for online content analysis. Our results show that fine-tuned Pre-trained Language Models (PLMs) can surpass zero-shot GPT-4 in performance but not in explanation for RLF. Finally, we show ExpInstruct can improve the open-sourced LLMs to match zero-shot GPT-4 in performance and explainability for RLF with limited samples. Code and sample data are available at https://github.com/Tom-Owl/OverlookedRLF

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844