arXiv:2603.27154v1 Announce Type: cross
Abstract: Entity resolution — identifying database records that refer to the same real-world entity — is naturally modelled on bipartite graphs connecting entity nodes to their attribute values. Applying a message-passing neural network (MPNN) with all available extensions (reverse message passing, port numbering, ego IDs) incurs unnecessary overhead, since different entity resolution tasks have fundamentally different complexity. For a given matching criterion, what is the cheapest MPNN architecture that provably works?
We answer this with a four-theorem separation theory on typed entity-attribute graphs. We introduce co-reference predicates $mathrmDup_r$ (two same-type entities share at least $r$ attribute values) and the $ell$-cycle predicate $mathrmCyc_ell$ for settings with entity-entity edges. For each predicate we prove tight bounds — constructing graph pairs provably indistinguishable by every MPNN lacking the required adaptation, and exhibiting explicit minimal-depth MPNNs that compute the predicate on all inputs.
The central finding is a sharp complexity gap between detecting any shared attribute and detecting multiple shared attributes. The former is purely local, requiring only reverse message passing in two layers. The latter demands cross-attribute identity correlation — verifying that the same entity appears at several attributes of the target — a fundamentally non-local requirement needing ego IDs and four layers, even on acyclic bipartite graphs. A similar necessity holds for cycle detection. Together, these results yield a minimal-architecture principle: practitioners can select the cheapest sufficient adaptation set, with a guarantee that no simpler architecture works. Computational validation confirms every prediction.
The one piece of data that could actually shed light on your job and AI
This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here. Within Silicon


