• Home
  • Uncategorized
  • Semantic Labeling for Third-Party Cybersecurity Risk Assessment: A Semi-Supervised Approach to Intent-Aware Question Retrieval

arXiv:2602.10149v3 Announce Type: replace-cross
Abstract: Third-Party Risk Assessment (TPRA) relies on large repositories of cybersecurity compliance questions used to assess external suppliers against standards such as ISO/IEC 27001 and NIST. In practice, not all questions are relevant for a specific supplier and selecting questions for a given assessment context remains a manual and time-consuming task. Existing question retrieval approaches based on lexical or semantic similarity can identify topically related questions, but they often fail to capture the underlying assessment intent, including control domain and evaluation scope.
To address this limitation, we investigate whether an explicit semantic label space can improve intent-aware TPRA question selection. In particular, we separate label space discovery from large-scale label assignment. We start by discovering overlapping clusters of semantically similar questions and then exploit LLMs to assign unique labels for each cluster. Second, we propagate labels through k-nearest neighbors (kNN) for a larger-scale question annotation. Question retrieval is finally achieved by similarity measure of the query with respect to the extracted labels instead of the questions themselves. This reduces repeated LLM calls while preserving label consistency.
Experimental results show that the proposed semi-supervised framework reduces labeling cost and runtime compared with per-question LLM annotation while maintaining label quality and improving efficiency. Furthermore, label-based retrieval achieves better alignment with cybersecurity control domains and assessment scope than similarity-based retrieval, highlighting the value of semantic labels as an intermediate representation.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844