Calculate the frequencies of same strand alignment distances,
for example from MNase-seq data to estimate nucleosome repeat length.
Distance calculations are implemented in C++ (calcAndCountDist)
for efficiency.
calcPhasogram(fname, regions = NULL, rmdup = TRUE, dmax = 3000L)character vector with one or several bam files. If
multiple files are given, distance counts from all will be summed.
GRanges object. Only alignments falling into these
regions will be used. If NULL (the default), all alignments are
used.
logical(1) indicating if duplicates should be removed. If
TRUE (the default), only one of several alignments starting at the
same coordinate is used.
numeric(1) specifying the maximal distance between same
strand alignments to count.
integer vector with dmax elements, with the element at
position d giving the observed number of alignment pairs at that
distance.
Phasograms were originally described in Valouev et al., Nature 2011 (doi:10.1038/nature10002). The implementation here differs in two ways from the original algorithms:
It does not implement removing of positions that have been seen less
than n times (referred to as a n-pile subset in the paper).
It does allow to retain only alignments that fall into selected
genomic intervals (regions argument).
estimateNRL to estimate the nucleosome repeat length
from a phasogram, plotPhasogram to visualize an annotated
phasogram, calcAndCountDist for low-level distance counting.
if (requireNamespace("GenomicAlignments", quietly = TRUE) &&
requireNamespace("Rsamtools", quietly = TRUE)) {
bamf <- system.file("extdata", "phasograms", "mnase_mm10.bam",
package = "swissknife")
pg <- calcPhasogram(bamf)
print(estimateNRL(pg, usePeaks = 1:4)[1:2])
plotPhasogram(pg, usePeaks = 1:4, xlim = c(0,1000))
}
#> $nrl
#> [1] 186
#>
#> $nrl.CI95
#> 2.5 % 97.5 %
#> 176.0015 195.9985
#>