ChatSVA: Bridging SVA Generation for Hardware Verification via Task-Specific LLMs

arXiv:2604.02811v1 Announce Type: cross Abstract: Functional verification consumes over 50% of the IC development lifecycle, where SystemVerilog Assertions (SVAs) are indispensable for formal property verification

Speaking of Language: Reflections on Metalanguage Research in NLP

arXiv:2604.02645v1 Announce Type: cross Abstract: This work aims to shine a spotlight on the topic of metalanguage. We first define metalanguage, link it to NLP

Opal: Private Memory for Personal AI

arXiv:2604.02522v1 Announce Type: cross Abstract: Personal AI systems increasingly retain long-term memory of user activity, including documents, emails, messages, meetings, and ambient recordings. Trusted hardware

Moondream Segmentation: From Words to Masks

arXiv:2604.02593v1 Announce Type: cross Abstract: We present Moondream Segmentation, a referring image segmentation extension of Moondream 3, a vision-language model. Given an image and a

I must delete the evidence: AI Agents Explicitly Cover up Fraud and Violent Crime

arXiv:2604.02500v1 Announce Type: new Abstract: As ongoing research explores the ability of AI agents to be insider threats and act against company interests, we showcase

Finding Belief Geometries with Sparse Autoencoders

April 6, 2026

arXiv:2604.02685v1 Announce Type: cross
Abstract: Understanding the geometric structure of internal representations is a central goal of mechanistic interpretability. Prior work has shown that transformers trained on sequences generated by hidden Markov models encode probabilistic belief states as simplex-shaped geometries in their residual stream, with vertices corresponding to latent generative states. Whether large language models trained on naturalistic text develop analogous geometric representations remains an open question.
We introduce a pipeline for discovering candidate simplex-structured subspaces in transformer representations, combining sparse autoencoders (SAEs), $k$-subspace clustering of SAE features, and simplex fitting using AANet. We validate the pipeline on a transformer trained on a multipartite hidden Markov model with known belief-state geometry. Applied to Gemma-2-9B, we identify 13 priority clusters exhibiting candidate simplex geometry ($K geq 3$).
A key challenge is distinguishing genuine belief-state encoding from tiling artifacts: latents can span a simplex-shaped subspace without the mixture coordinates carrying predictive signal beyond any individual feature. We therefore adopt barycentric prediction as our primary discriminating test. Among the 13 priority clusters, 3 exhibit a highly significant advantage on near-vertex samples (Wilcoxon $p < 10^-14$) and 4 on simplex-interior samples. Together 5 distinct real clusters pass at least one split, while no null cluster passes either. One cluster, 768_596, additionally achieves the highest causal steering score in the dataset. This is the only case where passive prediction and active intervention converge. We present these findings as preliminary evidence that genuine belief-like geometry exists in Gemma-2-9B’s representation space, and identify the structured evaluation that would be required to confirm this interpretation.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844