arXiv:2602.00465v2 Announce Type: replace-cross
Abstract: Functional miRNA–mRNA targeting is a large-bag prediction problem: each transcript yields a heavy-tailed pool of candidate target sites (CTSs), yet only a pair-level label is observed. We formalize this regime as emphBudgeted Relational Multi-Instance Learning (BR-MIL), where at most $K$ instances per bag may receive expensive encoding and relational processing under a hard compute budget. We propose textbfPAIR-Former (Pool-Aware Instance-Relational Transformer), a BR-MIL pipeline that performs a cheap full-pool scan, selects up to $K$ diverse CTSs on CPU, and applies a permutation-invariant Set Transformer aggregator on the selected tokens. On miRAW, PAIR-Former outperforms strong pooling baselines at a practical operating budget ($K^star=64$) while providing a controllable accuracy–compute trade-off as $K$ varies. We further provide theory linking budgeted selection to (i) approximation error decreasing with $K$ and (ii) generalization terms governed by $K$ in the expensive relational component.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844