arXiv:2601.18747v1 Announce Type: cross
Abstract: Modern information retrieval is transitioning from simple document filtering to complex, neuro-symbolic reasoning workflows. However, current retrieval architectures face a fundamental efficiency dilemma when handling the rigorous logical and arithmetic constraints required by this new paradigm. Standard iterator-based engines (Document-at-a-Time) do not natively support complex, nested logic graphs; forcing them to execute such queries typically results in intractable runtime performance. Conversely, naive recursive approaches (Term-at-a-Time), while capable of supporting these structures, suffer from prohibitive memory consumption when enforcing broad logical exclusions.
In this paper, we propose that a retrieval engine must be capable of “Capturing $mathbfP$” — evaluating any polynomial-time property directly over its index in a computationally efficient manner. We define a formal Retrieval Language ($mathcalL_R$) based on Directed Acyclic Graphs (DAGs) and prove it precisely captures the complexity class $mathbfP$. We introduce textttComputePN, a novel evaluation algorithm that makes $mathcalL_R$ tractable. By combining native DAG traversal with a memory-efficient “Positive-Negative” response mechanism, textttComputePN ensures the efficient evaluation of any query in $mathcalL_R$. This work establishes the theoretical foundation for turning the search index into a general-purpose computational engine.


