arXiv:2601.19002v1 Announce Type: new
Abstract: Single-stranded whole-genome bisulfite sequencing (ssWGBS) enables DNA methylation profiling in low-input and highly fragmented samples, including cell-free DNA, but introduces stochastic enzymatic artifacts that complicate preprocessing and downstream interpretation. In post-bisulfite library construction, Adaptase-mediated tailing blurs the boundary between biological sequence and synthetic additions, rendering read trimming a persistent source of variability across analytical pipelines. We show that this variability reflects an intrinsic limit of per-read boundary inference rather than an algorithmic shortcoming: boundary localization is fundamentally asymmetric between paired-end reads, with Read 2 exhibiting kinetically structured artifacts that support constrained read-level inference, while apparent contamination in Read 1 arises conditionally from geometry-driven read-through events and is not well-defined at the single-read level. Even within Read 2, bisulfite-induced compositional degeneracy creates an indistinguishable regime in which genomic and synthetic origins share support under the same observable sequence evidence, implying a strictly positive Bayes error under any deterministic per-read decision rule and placing a fundamental limit on per-read boundary fidelity. By explicitly characterizing these limits, we reframe read trimming in ssWGBS as a constrained inference problem and introduce a conservative framework that operates only where supported by observable evidence (including short-range nucleotide texture), exposes interpretable trade-offs between genomic retention and residual artifact risk, and avoids forced resolution where boundaries are intrinsically unresolvable. Together, these results clarify why fixed trimming heuristics persist in practice and provide a principled foundation for uncertainty-aware preprocessing in ssWGBS.
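The claim of a strictly positive Bayes error can be illustrated with a small numerical sketch. This is not the paper's model: the function name `bayes_error`, the 2-mer alphabet, the prior, and every probability below are invented for illustration. It only uses the standard identity that, for two classes, the minimum error of any deterministic decision rule is the sum over observations of the smaller prior-weighted likelihood, which is zero only when the two origins never produce the same observation.

```python
from typing import Dict


def bayes_error(prior_genomic: float,
                p_genomic: Dict[str, float],
                p_synthetic: Dict[str, float]) -> float:
    """Minimum achievable error of any deterministic rule that assigns
    each observed sequence to 'genomic' or 'synthetic' origin."""
    prior_synthetic = 1.0 - prior_genomic
    observations = set(p_genomic) | set(p_synthetic)
    # For each observation the best rule picks the larger joint mass,
    # so the unavoidable error is the smaller one.
    return sum(
        min(prior_genomic * p_genomic.get(x, 0.0),
            prior_synthetic * p_synthetic.get(x, 0.0))
        for x in observations
    )


# Hypothetical distributions over observed 2-mers near a candidate boundary.
genomic = {"TT": 0.5, "TA": 0.3, "AT": 0.2}
synthetic_overlapping = {"TT": 0.6, "CC": 0.4}  # shares "TT" with genomic
synthetic_disjoint = {"CC": 0.7, "CG": 0.3}     # shares nothing with genomic

err_overlap = bayes_error(0.5, genomic, synthetic_overlapping)
err_disjoint = bayes_error(0.5, genomic, synthetic_disjoint)
print(err_overlap, err_disjoint)
```

In this toy setting the shared observation "TT" forces a nonzero error no matter how the rule is chosen, while the disjoint case can be resolved perfectly. Under these assumptions, that is the sense in which shared support places a floor on per-read boundary fidelity.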
The AI Hype Index: Grok makes porn, and Claude Code nails your job
Everyone is panicking because AI is very bad; everyone is panicking because AI is very good. It’s just that you never know which one you’re

