Quantify footprint coherence in windows by estimating the Signal to Noise Ratio (SNR)
Source:R/scanGenome.R
estimateSNRwindows.Rd
Calculate a measure of footprint coherence and window "structureness" by
decomposing total aggregate signal variance into (i) noise
and (ii)
signal
variance components and returning
\(\log_2(\text{signal}/\text{noise})\) as the SNR.
Usage
estimateSNRwindows(
se,
gr,
assayNameAgg = "FracMod",
k = 2L,
min_diffs = NULL,
noise_pars = NULL
)
Arguments
- se
A per-position
RangedSummarizedExperiment
, typically the output ofreadModBam(..., level = "summary")
. Must contain the assay given byassayNameAgg
and an assay"Nvalid"
with coverage.- gr
GRanges
specifying the windows.- assayNameAgg
Character scalar, assay to aggregate (default
"FracMod"
).- k
Integer (default 2L). Two measurements are considered “adjacent” if they are at most
k
bases apart when estimating the noise.- min_diffs
Integer or
NULL
. Minimal number of paired differences to compute a reliable noise estimate. IfNULL
the value is set tomax(16, floor(0.05 * n))
, wheren
is the number of valid positions in the window.- noise_pars
Numeric vector as returned by the
coefficients
slot of theestimateNoiseParsWindows()
result. IfNULL
the function looks formetadata(se)$NoisePars
. If stillNULL
raw noise estimates are used.
Value
A SummarizedExperiment
with assays
"TotalVar"
, "SignalVar"
, "NoiseVar"
and
"SNR"
(\(\log_2\) signal/noise).
Details
Noise variance estimation can be performed in two ways:
Parametric estimation: If noise parameters (
noise_pars
) are supplied (typically estimated usingestimateNoiseParsWindows()
, which fits a linear regression model to relate noise variance to average methylation level and coverage across random genomic windows), the noise variance returned is the expected noise given the window's average methylation and coverage.Non-parametric (raw) estimation: If noise parameters are not supplied, the function estimates noise directly from the data, based on short-range variation between neighboring positions. While unbiased for random signals, this method can overestimate noise—and thus underestimate the true SNR—if the biological signal is strongly correlated across reads (e.g., due to systematic accessibility patterns), because the structured variation is incorrectly interpreted as random noise.
See also
estimateNoiseParsWindows
for estimation of the
global noise model parameters.
Examples
library(GenomicRanges)
modbamfiles <- system.file("extdata",
c("6mA_1_10reads.bam", "6mA_2_10reads.bam"),
package = "footprintR")
gr0 <- GRanges("chr1", IRanges(6920000, 6950000))
gr_tiles <- unlist(tile(gr0, width = 500))
gr_tiles
#> GRanges object with 61 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 6920000-6920490 *
#> [2] chr1 6920491-6920982 *
#> [3] chr1 6920983-6921474 *
#> [4] chr1 6921475-6921966 *
#> [5] chr1 6921967-6922458 *
#> ... ... ... ...
#> [57] chr1 6947541-6948032 *
#> [58] chr1 6948033-6948524 *
#> [59] chr1 6948525-6949016 *
#> [60] chr1 6949017-6949508 *
#> [61] chr1 6949509-6950000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# estimate noise parameters (typically should be performed on a larger,
# independent sample of genomic windows)
resNoiseEst <- estimateNoiseParsWindows(modbamfiles, modbase = "a",
windows = gr_tiles,
return_data = FALSE,
BPPARAM = BiocParallel::SerialParam())
se <- readModBam(bamfiles = modbamfiles,
regions = gr_tiles,
modbase = "a",
level = "summary",
trim = TRUE,
BPPARAM = BiocParallel::SerialParam()) |>
filterPositions(filters = "coverage",
minCov=2, assayNameNA = NULL)
# parametric noise estimation
se_snr_scores1 <- estimateSNRwindows(se, gr = gr_tiles,
noise_pars = resNoiseEst$coefficients)
# non-parametric noise estimation
se_snr_scores2 <- estimateSNRwindows(se, gr = gr_tiles)