Calculate distances between modified bases on individual reads
Source: R/modbaseSpacing.R
Calculate the frequencies of same-read modified base distances,
for example from read-level modification data to estimate the nucleosome
repeat length (NRL). Distances are calculated separately for each sample
(column in se), but if needed they can easily be combined for
estimating the NRL on a pool of samples by summing up the observed counts
(e.g. using Reduce("+", sampleDistsList)). Distance calculations
are implemented in C++ (calcAndCountDist) for efficiency.
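The counting and pooling described above can be sketched in plain R. This is a minimal illustration, not the package's C++ calcAndCountDist: it assumes that every pair of modified positions on the same read contributes one count at its distance (as in a phasogram), not only neighbouring positions. The helpers count_pair_dists and count_sample and the toy positions are hypothetical.

```r
# Hypothetical helper (not part of footprintR): count distances between all
# pairs of modified positions on one read, for distances 1..dmax.
# Assumption: every pair contributes, not only neighbouring positions.
count_pair_dists <- function(pos, dmax = 1000L) {
  pos <- sort(unique(pos))
  if (length(pos) < 2L) return(integer(dmax))
  d <- as.vector(dist(pos))             # all pairwise absolute distances
  tabulate(d[d <= dmax], nbins = dmax)  # counts at distances 1..dmax
}

# Sum per-read counts into one vector per sample
count_sample <- function(reads, dmax = 1000L) {
  Reduce(`+`, lapply(reads, count_pair_dists, dmax = dmax))
}

# Toy data (made up): modified positions on the reads of two samples
dists <- list(
  s1 = count_sample(list(c(100, 110, 247), c(500, 647), 20), dmax = 200L),
  s2 = count_sample(list(c(10, 20, 157)), dmax = 200L)
)

# Pool samples by summing counts element-wise
pooled <- Reduce("+", dists)
which(pooled > 0)  # distances with at least one observed pair: 10 137 147
```

Because each sample's result is a vector of the same length, element-wise addition with Reduce("+", ...) is all that pooling requires.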
Usage
calcModbaseSpacing(
se,
assayName = "mod_prob",
minModProb = 0.5,
poolReads = TRUE,
dmax = 1000L
)
Arguments
- se
SummarizedExperiment object with read-level footprinting data, for example as returned by readModBam. Rows should correspond to positions and columns to samples.
- assayName
A character scalar specifying the assay of se containing the read-level modification probabilities.
- minModProb
Numeric scalar giving the minimal modification probability for a base to be considered modified.
- poolReads
Logical scalar indicating whether reads within a sample should be pooled. If TRUE (the default), distances from all reads within a sample are combined and returned as a vector. If FALSE, distances obtained from each read in a sample are returned separately as columns of a matrix.
- dmax
Numeric scalar specifying the maximal distance between modified bases on the same read to count.
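How minModProb, poolReads and dmax shape the result can be illustrated with a small base-R sketch. It assumes that a base counts as modified when its probability is at least minModProb (whether the package uses >= or > is not stated here) and that all same-read pairs are counted. Each read is represented as a toy vector of probabilities named by position, which is not footprintR's data layout; spacing_sketch is a hypothetical function.

```r
# Hypothetical sketch (not footprintR code) of the minModProb, poolReads and
# dmax semantics. Each read is a toy numeric vector of modification
# probabilities whose names are genomic positions.
spacing_sketch <- function(reads, minModProb = 0.5, poolReads = TRUE,
                           dmax = 1000L) {
  per_read <- vapply(reads, function(p) {
    pos <- as.numeric(names(p))[p >= minModProb]  # assumed: >= threshold
    if (length(pos) < 2L) return(integer(dmax))
    tabulate(as.vector(dist(pos)), nbins = dmax)  # pairs at d = 1..dmax
  }, integer(dmax))
  # per_read has dmax rows and one column per read
  if (poolReads) rowSums(per_read) else per_read
}

reads <- list(
  r1 = setNames(c(0.9, 0.3, 0.8, 0.6), c(100, 105, 110, 247)),
  r2 = setNames(c(0.7, 0.95), c(500, 647))
)
pooled  <- spacing_sketch(reads, dmax = 200L)                      # vector
by_read <- spacing_sketch(reads, poolReads = FALSE, dmax = 200L)   # matrix
strict  <- spacing_sketch(reads, minModProb = 0.75, dmax = 200L)
```

In this sketch the pooled vector is simply the row sums of the per-read matrix, and raising minModProb to 0.75 drops the calls at 247 and 500, leaving only the 10 bp distance on r1.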
Value
A named list of length ncol(se) (one element per sample).
If poolReads=TRUE, the elements are named numeric vectors of counts of
length dmax, with the value at position d giving the
observed number of pairs of modified bases on the same read that are d bases apart.
If poolReads=FALSE, each list element is a matrix with
dmax rows and individual reads in columns, with the value at
row d and column r giving the observed number of pairs of
modified bases at distance d on read r.
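To make the layout of the return value concrete, the sketch below indexes a made-up stand-in for a result. Since the returned values are counts, it also converts them to relative frequencies by dividing by the total number of counted pairs; this normalization is an illustrative choice, and the object names and numbers are hypothetical.

```r
# Toy stand-in for a calcModbaseSpacing() result with poolReads = TRUE
# (made-up counts, dmax = 5); real results have length dmax (default 1000)
moddist <- list(s1 = c(4, 0, 2, 2, 0), s2 = c(1, 1, 0, 2, 0))
moddist$s1[3]                                      # pairs 3 bases apart in s1
freqs <- lapply(moddist, function(x) x / sum(x))   # relative frequencies

# Toy stand-in for one sample with poolReads = FALSE: dmax rows, reads in columns
m <- cbind(read1 = c(1, 0, 2, 0, 0), read2 = c(3, 0, 0, 2, 0))
m[3, "read1"]   # pairs 3 bases apart on read1
colSums(m)      # total pairs counted per read (distances up to dmax)
```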
References
Phasograms were originally described in Valouev et al., Nature 2011 (doi:10.1038/nature10002). The implementation here differs in three ways from the original algorithm:
- Instead of same-strand alignment start positions, this function is adapted to single-molecule footprinting data and measures the distances between modified base positions on the same read.
- It does not implement the removal of positions that have been seen fewer than n times (referred to as an n-pile subset in the paper).
- It allows retaining only alignments that fall into selected genomic intervals (regions argument).
See also
estimateNRL to estimate the nucleosome repeat length
from a phasogram, plotModbaseSpacing to visualize an annotated
phasogram, calcAndCountDist for low-level distance counting.
Examples
library(footprintR)
modbamfiles <- system.file("extdata", "6mA_1_10reads.bam", package = "footprintR")
se <- readModBam(modbamfiles, "chr1:6940000-6955000", "a",
BPPARAM = BiocParallel::SerialParam())
# get distances
moddist <- calcModbaseSpacing(se)
str(moddist)
#> List of 1
#> $ s1: Named num [1:1000] 175 110 103 70 83 99 111 93 77 68 ...
#> ..- attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...