
Calculate the frequencies of distances between modified bases on the same read, for example to estimate the nucleosome repeat length (NRL) from read-level modification data. Distances are calculated separately for each sample (column in se). To estimate the NRL on a pool of samples, the observed counts can be combined by summing them (e.g. using Reduce("+", sampleDistsList)). Distance calculations are implemented in C++ (calcAndCountDist) for efficiency.

Usage

calcModbaseSpacing(
  se,
  assay.type = "mod_prob",
  min_mod_prob = 0.5,
  pool_reads = TRUE,
  dmax = 1000L
)

Arguments

se

SummarizedExperiment object with read-level footprinting data, for example returned by readModBam. Rows should correspond to positions and columns to samples.

assay.type

A string or integer scalar specifying the assay of se containing the read-level modification probabilities.

min_mod_prob

Numeric scalar giving the minimal modification probability for a base to be considered modified.

pool_reads

Logical scalar indicating if reads within a sample should be pooled. If TRUE (the default), distances from reads within a sample are combined and returned as a vector. If FALSE, distances obtained from each read in a sample are returned separately as columns in a matrix.

dmax

Numeric scalar specifying the maximal distance between modified bases on the same read that is counted.

Value

A named list of length ncol(se) (one element for each sample). If pool_reads=TRUE, the elements are integer vectors of length dmax, with the value at position d giving the observed number of within-read pairs of modified bases at distance d. If pool_reads=FALSE, each list element is a matrix with dmax rows and individual reads in columns, with the value at row d and column r giving the observed number of pairs of modified bases at distance d for read r.
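The shape of the return value can be illustrated with a minimal base-R sketch on toy data. It is not the package's implementation (which is in C++): countPairDistances is a hypothetical helper, the toy read positions are made up, and the sketch assumes that all pairs of modified positions on a read are counted, as in a phasogram.

```r
# Hypothetical helper (not part of footprintR): count all pairwise distances
# between modified positions on a single read, up to dmax.
countPairDistances <- function(pos, dmax = 1000L) {
  pos <- sort(unique(as.integer(pos)))
  if (length(pos) < 2L) {
    # fewer than two modified bases: no pairs, all counts are zero
    return(integer(dmax))
  }
  d <- abs(outer(pos, pos, "-"))   # matrix of all pairwise distances
  d <- d[upper.tri(d)]             # keep each pair once
  tabulate(d[d <= dmax], nbins = dmax)
}

# Toy data (assumed): modified positions on three reads of one sample
reads <- list(r1 = c(10L, 12L, 15L), r2 = c(3L, 5L), r3 = 7L)

# Analogue of pool_reads = FALSE: dmax rows, one column per read
perRead <- vapply(reads, countPairDistances, integer(5), dmax = 5L)

# Analogue of pool_reads = TRUE: sum the counts over reads
pooled <- rowSums(perRead)
```

In this sketch, row d of perRead holds the per-read counts at distance d, and pooled is the row-wise sum over reads, which mirrors how the pooled vector relates to the per-read matrix described above.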

References

Phasograms were originally described in Valouev et al., Nature 2011 (doi:10.1038/nature10002). The implementation here differs in three ways from the original algorithms:

  1. Instead of same strand alignment start positions, this function is adapted to single-molecule footprinting data and measures the distances between same-read modified base positions.

  2. It does not implement the removal of positions that have been seen fewer than n times (referred to as an n-pile subset in the paper).

  3. It allows retaining only alignments that fall into selected genomic intervals (regions argument).

See also

estimateNRL to estimate the nucleosome repeat length from a phasogram, plotModbaseSpacing to visualize an annotated phasogram, calcAndCountDist for low-level distance counting.

Author

Michael Stadler

Examples

modbamfiles <- system.file("extdata",
                           c("6mA_1_10reads.bam", "6mA_2_10reads.bam"),
                           package = "footprintR")
se <- readModBam(modbamfiles, "chr1:6940000-6955000", "a")

# get distances for each sample
moddist <- calcModbaseSpacing(se)

# analyze NRL for sample 's1'
print(estimateNRL(moddist$s1)[1:2])
#> $nrl
#> [1] 182.6
#> 
#> $nrl.CI95
#>    2.5 %   97.5 % 
#> 179.5475 185.6525 
#> 
plotModbaseSpacing(moddist$s1)

plotModbaseSpacing(moddist$s1, detailedPlots = TRUE)


# combine samples
moddistComb <- Reduce("+", moddist)
print(estimateNRL(moddistComb)[1:2])
#> $nrl
#> [1] 183.9
#> 
#> $nrl.CI95
#>    2.5 %   97.5 % 
#> 179.8787 187.9213 
#>