Path Signatures Enable Model-Free Mapping of RNA Modifications
Detecting chemical modifications on RNA molecules remains a key challenge in epitranscriptomics. Traditional reverse transcription-based sequencing methods introduce enzyme- and sequence-dependent biases and fragment RNA molecules, confounding the accurate mapping of modifications across the transcriptome. Nanopore direct RNA sequencing offers a powerful alternative by preserving native RNA molecules, enabling the detection of modifications at single-molecule resolution. However, current computational tools can identify only a limited subset of modification types within well-characterized sequence contexts for which ample training data exists. Here, we introduce a model-free computational method that reframes modification detection as an anomaly detection problem, requiring only canonical (unmodified) RNA reads without any other annotated data. For each nanopore read, our approach extracts robust, modification-sensitive features from the raw ionic current signal at a site using the signature transform, then computes an anomaly score by comparing the resulting feature vector to its nearest neighbors in an unmodified reference dataset. We convert anomaly scores into statistical p-values to enable anomaly detection at both individual read and site levels. Validation on densely modified E. coli rRNA demonstrates that our approach detects known sites harboring diverse modification types, without prior training on these modifications. We further applied this framework to dengue virus (DENV) transcripts and mammalian mRNAs. For DENV sfRNA, it revealed a novel 2′-O-methylated site, which we validated orthogonally by qRT-PCR assays. These results demonstrate that our model-free approach operates robustly across different types of RNAs and datasets generated with different nanopore sequencing chemistries.
💡 Research Summary
The paper introduces a model‑free computational framework for detecting RNA chemical modifications directly from nanopore direct‑RNA sequencing data. Traditional approaches rely on supervised machine‑learning models that must be trained for each specific modification type using large, annotated datasets, and they often need retraining when sequencing chemistries change. In contrast, the authors recast modification detection as an anomaly‑detection problem that requires only a reference set of unmodified (in‑vitro‑transcribed, IVT) reads.
The core of the method is the signature transform, a mathematical tool that converts the raw ionic‑current time series for each nucleotide (or k‑mer) into a fixed‑length feature vector capturing rich temporal dynamics, higher‑order statistics, and shape information. For each experimental read, the pipeline extracts these vectors at every aligned position, then computes a distance to the IVT reference set using nearest‑neighbor Mahalanobis distance. Larger distances indicate that the observed signal deviates from the canonical distribution, suggesting the presence of a modification.
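As a rough illustration of the feature-extraction and scoring steps described above, the sketch below computes a depth-2 path signature of a time-augmented current segment and scores it against a reference set by mean k-nearest-neighbor distance. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the truncation depth, the time augmentation, and the choice of `k` are all hypothetical, and plain Euclidean distance stands in for the Mahalanobis distance mentioned in the summary.

```python
import math


def time_augment(signal):
    """Turn a 1-D current trace into a 2-D path of (normalized time, current)."""
    n = len(signal)
    return [(k / (n - 1), x) for k, x in enumerate(signal)]


def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path (list of d-dim points).

    Level 1 holds the total increments; level 2 holds the iterated
    integrals, accumulated segment by segment with Chen's identity.
    Returns a flat vector of length d + d*d.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        inc = [c - p for p, c in zip(prev, cur)]
        for i in range(d):
            for j in range(d):
                # cross term with the path so far + the segment's own level-2 term
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            s1[i] += inc[i]
    return s1 + [s2[i][j] for i in range(d) for j in range(d)]


def knn_score(query, reference, k=3):
    """Anomaly score: mean Euclidean distance to the k nearest reference vectors.

    Assumption: Euclidean distance is used here only for simplicity.
    """
    dists = sorted(math.dist(query, r) for r in reference)
    return sum(dists[:k]) / k


if __name__ == "__main__":
    # Toy, made-up current segments standing in for unmodified (IVT) reads.
    ivt_segments = [
        [80.0, 82.0, 81.0, 79.0],
        [81.0, 83.0, 80.0, 78.0],
        [79.0, 81.0, 82.0, 80.0],
        [80.0, 81.0, 80.0, 79.0],
    ]
    reference = [signature_depth2(time_augment(s)) for s in ivt_segments]
    native = signature_depth2(time_augment([80.0, 95.0, 70.0, 90.0]))
    print("anomaly score:", knn_score(native, reference, k=2))
```

The signature has a fixed length regardless of how many samples the segment contains, which is what makes reads with different dwell times directly comparable in this kind of scheme.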
Because raw distance scores are not directly comparable across sites, the authors calibrate them against an independent IVT calibration set to obtain conformal p‑values. These p‑values are statistically valid, allowing rigorous false‑discovery‑rate (FDR) control via Benjamini‑Hochberg or Storey’s π0 estimation. Individual read‑site p‑values can be visualized as heatmaps, while site‑level significance is obtained by aggregating read‑level p‑values using Fisher’s combination test. The final output is a BED file containing per‑site anomaly rates, p‑values, and coverage statistics, ready for downstream analysis or genome‑browser visualization.
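The calibration and aggregation steps in this paragraph can be sketched in a few lines. The code below is an illustrative sketch rather than the published pipeline: the function names and the significance level are hypothetical, the conformal p-value uses the standard "one plus the count of calibration scores at least as large" form, and Fisher's method is applied as if the read-level p-values were independent, which is an assumption.

```python
import math


def conformal_pvalue(score, calibration_scores):
    """Conformal p-value of one anomaly score against held-out unmodified scores."""
    n_ge = sum(1 for c in calibration_scores if c >= score)
    return (1 + n_ge) / (len(calibration_scores) + 1)


def fisher_combine(pvalues):
    """Combine read-level p-values into one site-level p-value (Fisher's method).

    The statistic X = -2 * sum(ln p) is chi-square with 2n degrees of freedom
    under the null; for even degrees of freedom the survival function has the
    closed form exp(-X/2) * sum_{i<n} (X/2)^i / i!.
    """
    n = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, n):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank  # largest rank passing the step-up threshold
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up scores for illustration only.
    calibration = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 0.95, 1.05]
    reads_at_site = [2.4, 1.9, 0.9, 3.1]
    read_p = [conformal_pvalue(s, calibration) for s in reads_at_site]
    print("read-level p-values:", read_p)
    print("site-level p-value:", fisher_combine(read_p))
```

Because a conformal p-value can never be smaller than 1/(n+1) for n calibration scores, the size of the calibration set bounds how significant any single read can appear, so site-level evidence has to come from combining many reads.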
Performance is evaluated on three biologically distinct datasets:
- E. coli ribosomal RNA – 36 well‑characterized modification sites across 16S and 23S rRNA serve as a benchmark. The method produces clear peaks in anomaly scores at known sites, with area‑under‑ROC values approaching 1 for modified positions and near 0 for unmodified positions. Two‑sample Kolmogorov‑Smirnov tests confirm significant distributional shifts between IVT and native scores.
- Dengue virus subgenomic flaviviral RNA (sfRNA) – The framework discovers a previously unreported 2′‑O‑methylation site. Orthogonal validation by quantitative RT‑PCR confirms the modification, demonstrating the ability to detect novel chemistries without prior training.
- Mouse mRNA (METTL3 knock‑out vs wild‑type) – By comparing anomaly rates, the method recovers known m6A loss in METTL3‑deficient cells, illustrating its utility for differential modification analysis in eukaryotic transcripts.
Importantly, the approach works across different nanopore chemistries (RNA002, RNA004) and RNA species (bacterial rRNA, viral RNA, mammalian mRNA) without any retraining, highlighting its adaptability. While the method does not assign a specific modification type, the calibrated anomaly scores enable rapid prioritization of candidate sites for downstream classification or experimental validation, dramatically reducing computational and experimental burden.
In summary, the authors present a statistically rigorous, chemistry‑agnostic pipeline that leverages signature‑based feature extraction and distance‑based anomaly detection to map RNA modifications transcriptome‑wide at single‑molecule resolution. This model‑free strategy overcomes the limitations of existing supervised tools, offers immediate applicability to new sequencing kits, and opens the door to systematic discovery of unknown RNA modifications.