5 Nucleosome analyses

A prominent source of signal in single molecule footprinting data are nucleosomes that protect the DNA that is wound around the histones but leave the DNA between nucleosome particles, the linkers, accessible to modifications.

In this chapter, we illustrate how footprintR can be used to place single nucleosomes onto individual reads (see Section 5.3) or estimate the average distance between neighboring nucleosomes (Section 5.4).

5.1 Preparation

We first load the packages and the genome needed for these tasks.

Show/hide code

BSgenomeName <- "BSgenome.Mmusculus.GENCODE.GRCm39.gencodeM34"

library(footprintR)
library(ggplot2)
library(patchwork)
library(GenomicRanges)
library(SummarizedExperiment)
library(BSgenomeName, character.only = TRUE)

# Load genome
gnm <- get(BSgenomeName)
genome(gnm) <- "mm39"

5.2 Read data

Our data is obtained from modBam file from a wild-type sample, for which 6mA modification calling has been performed. We use the readModBam function to read data from an 2000-bp region around a CTCF binding site on chromosome 8 and store it in a SummarizedExperiment object. For more information about reading data with footprintR, see Chapter 2.

Show/hide code

# Read data
ctcfsite <- as("chr8:6622465-6622483", "GRanges")
reg <- resize(ctcfsite, width = 2000, fix = "center")

se <- readModBam(
    bamfiles = c(wt2 = "data/mESC_wt_6mA_rep2.bam"),
    modbase = "a", 
    regions = reg,
    seqinfo = seqinfo(gnm), 
    sequenceContextWidth = 1, 
    sequenceReference = gnm
)

# add summary-level assays
se <- flattenReadLevelAssay(se)

5.3 Placing nucleosome footprints

The calcFootprintScores function uses a vector of weights to identify high-scoring regions on individual reads that show the modification pattern given in these weights. The score of a region on a read measures how well modification probabilities correlate with weights in the weight vector. High-scoring footprints that score above a provided threshold can be added to our se object using addFootprints, which will call calcFootprintScores internally.

In order to find nucleosomes, we construct a weight vector called wgt of a total length of 170 bp, consisting of a central 140 bp region in which we expect low modifications (the region protected by the nucleosome) flanked by two linker regions of 15 bp each in which we expect high modification. We adjust the weights in the flanks and central region such that the overall mean is close to zero:

Show/hide code

# nucleosome footprint
wgt <- rep(c(0.5, -0.5, 0.5) * c(140/170, 30/170, 140/170), c(15, 140, 15))
mean(wgt)

[1] -8.115253e-18

Show/hide code

ggplot(data.frame(Position = seq_along(wgt), Weight = wgt),
       aes(Position, Weight)) +
    geom_col() +
    geom_hline(yintercept = 0, linetype = "dashed") +
    theme_bw()

Now we can scan the reads for high-scoring footprints and add them to the colData of our SummarizedExperiment object, under a column name given by the name argument:

Show/hide code

se <- addFootprints(se = se, wgt = wgt, thresh = 0.03, name = "nucl")

ℹ calculating footprint scores

✔ calculating footprint scores [485ms]

ℹ segmenting footprint scores

✔ segmenting footprint scores [1.1s]

The footprints are stored as a list with one element for each sample, containing an IRangesList. Its elements are the reads of that sample, and the ranges correspond to individual footprints:

Show/hide code

se$nucl

$wt2
IRangesList object of length 62:
$`wt2-2b63fef6-6d5c-4aa4-b6a1-aebd5390cc83`
IRanges object with 37 ranges and 0 metadata columns:
           start       end     width
       <integer> <integer> <integer>
   [1]   6610367   6610506       140
   [2]   6610900   6611039       140
   [3]   6611097   6611236       140
   [4]   6611677   6611816       140
   [5]   6612665   6612804       140
   ...       ...       ...       ...
  [33]   6622587   6622726       140
  [34]   6622762   6622901       140
  [35]   6623311   6623450       140
  [36]   6623890   6624029       140
  [37]   6624088   6624227       140

...
<61 more elements>

The nucleosome footprints can be visualized using plotRegion:

Show/hide code

plotRegion(se, region = reg,
           tracks = list(list(trackData="mod_prob", trackType="Heatmap",
                              interpolate = TRUE,
                              footprintColumns = "nucl",
                              arglistFootprints = list(nucl = list(color = alpha("firebrick", 0.5))),
                              orderReads = "squish"),
                         list(trackData = "FracMod", trackType = "Smooth",
                              smoothMethod="rollingMean", windowSize=41,
                              highlightRegions = ctcfsite)),
                    sequenceContext = "A") +
        patchwork::plot_layout(heights = c(2, 0.6))

5.4 Estimating nucleosome repeat length

Placing of individual nucleosomes in principle allows also measuring their average distance, the nucleosome repeat length (NRL). This measure can also be obtained without placing of nucleosomes, from the distribution of distances between modified bases in which multiples of the NRL are over-represented. This idea is related to the phasogram analysis described by Valouev et al. (2011).

In footprintR, this analysis can be performed by first calculating the distribution of distances between modified bases using calcModbaseSpacing:

Show/hide code

moddist <- calcModbaseSpacing(se)
str(moddist)

List of 1
 $ wt2: Named num [1:1000] 3513 2262 1789 1296 1176 ...
  ..- attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...

From this distribution, which was obtained from just 62 reads, we can accurately estimate the NRL using estimateNRL:

Show/hide code

res <- estimateNRL(x = moddist$wt2)
res[1:2]

$nrl
[1] 190.3

$nrl.CI95
   2.5 %   97.5 % 
187.6821 192.9179

Alternatively, the distance distribution can also be visualized together with an estimate of the NRL using plotModbaseSpacing, either as a summary plot:

Show/hide code

plotModbaseSpacing(x = moddist$wt2)

… or as a set of three plots that illustrate the different steps of the estimation:

Show/hide code

plotModbaseSpacing(x = moddist$wt2, detailedPlots = TRUE)

5.5 Quantifying nucleosome phasing

Nucleosomes that occupy similar positions across reads are called phased and can result for example from a sequence-specific DNA binding protein that constrains their movement along the DNA. Genomic regions with phased nucleosomes can be identified on the summary-level data by phasingScoreFourier, which calculates the strength of a periodic component in the summary-level data corresponding to the expected period of nucleosomes (see Section 5.4 for how that period can be estimated).

We start such an analysis by first reading summary-level data for a larger genomic region, in which we want to calculate phasing scores:

Show/hide code

reg <- GRanges("chr8", IRanges(6637500, 6642500))
se <- readModBam(
    bamfiles = c(wt2 = "data/mESC_wt_6mA_rep2.bam"),
    modbase = "a", 
    regions = reg,
    level = "summary",
    trim = TRUE,
    seqinfo = seqinfo(gnm), 
    sequenceContextWidth = 1, 
    sequenceReference = gnm
)

Next, we construct windowgr, a GRanges objects defining the windows in reg for which we want to calculate a nucleosome phasing score:

Show/hide code

windowStep <- 190
windowSize <- 4 * windowStep
s <- seq(start(reg), end(reg) - (windowSize - windowStep) + 1, by = windowStep)
windowgr <- GRanges(seqnames = seqnames(reg),
                    ranges = IRanges(start = s, width = windowSize))

Finally, we calculate phasing socres for the windows in windowgr. The windowSize and the value of the numCoef argument define the period of interest \(poi = windowSize / (numCoef - 1)\) (here: 190 bp, see phasingScoreFourier for details):

Show/hide code

seFourier <- phasingScoreFourier(se = se, gr = windowgr, numCoef = 5)

We can plot the window scores together with the fraction of modification. Note that the sequential windows for which the phasing score was calculated are overlapping (the window with the maximal phasing score is indicated by the red line below):

Show/hide code

# get plotting data using plotRegion
p <- plotRegion(se, region = reg,
                tracks = list(list(trackData = "FracMod", trackType = "Smooth",
                                   smoothMethod = "rollingMean", windowSize = 101)))
pd <- p$layers[[1]]$data

# get phasing scores
pd2 <- as.data.frame(rowRanges(seFourier))
pd2$position <- mid(rowRanges(seFourier))
pd2$phasingScoreAbs <- assay(seFourier, "phasingScoreAbs")[, "wt2"]

# visualize
ggplot(mapping = aes(x = position)) +
    geom_tile(data = pd2,
              mapping = aes(y = mean(pd$value_smooth), fill = phasingScoreAbs),
              width = windowStep, height = Inf, alpha = 0.4) +
    scale_fill_viridis_c() +
    geom_segment(data = pd2[which.max(pd2$phasingScoreAbs), , drop = FALSE],
                 inherit.aes = FALSE,
                 mapping = aes(x = start, xend = end, y = min(pd$value_smooth)),
                 colour = "red", linewidth = 2) +
    geom_line(data = pd,
              mapping = aes(y = value_smooth)) +
    ylim(range(pd$value_smooth)) +
    labs(x = paste0("Position on ", seqnames(reg), " (bp)"),
         y = "Fraction of modified bases",
         fill = paste0("Phasing score\n", windowStep, " bp period")) +
    theme_bw() +
    theme(legend.position = "bottom")

5.6 Session info

Click to view session info

Show/hide code

sessioninfo::session_info(info = "packages")

═ Session info ═══════════════════════════════════════════════════════════════
─ Packages ───────────────────────────────────────────────────────────────────
 package                                      * version   date (UTC) lib source
 abind                                          1.4-8     2024-09-12 [1] CRAN (R 4.5.0)
 Biobase                                      * 2.68.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 BiocGenerics                                 * 0.54.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 BiocIO                                       * 1.18.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 BiocParallel                                   1.42.1    2025-06-01 [1] Bioconductor 3.21 (R 4.5.0)
 Biostrings                                   * 2.76.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 bitops                                         1.0-9     2024-10-03 [1] CRAN (R 4.5.0)
 BSgenome                                     * 1.76.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 BSgenome.Mmusculus.GENCODE.GRCm39.gencodeM34 * 0.1.0     2025-04-17 [1] Bioconductor
 class                                          7.3-23    2025-01-01 [2] CRAN (R 4.5.0)
 cli                                            3.6.5     2025-04-23 [1] CRAN (R 4.5.0)
 codetools                                      0.2-20    2024-03-31 [2] CRAN (R 4.5.0)
 crayon                                         1.5.3     2024-06-20 [1] CRAN (R 4.5.0)
 curl                                           6.3.0     2025-06-06 [1] CRAN (R 4.5.0)
 data.table                                     1.17.4    2025-05-26 [1] CRAN (R 4.5.0)
 DelayedArray                                   0.34.1    2025-04-17 [1] Bioconductor 3.21 (R 4.5.0)
 dichromat                                      2.0-0.1   2022-05-02 [1] CRAN (R 4.5.0)
 digest                                         0.6.37    2024-08-19 [1] CRAN (R 4.5.0)
 dplyr                                          1.1.4     2023-11-17 [1] CRAN (R 4.5.0)
 e1071                                          1.7-16    2024-09-16 [1] CRAN (R 4.5.0)
 evaluate                                       1.0.3     2025-01-10 [1] CRAN (R 4.5.0)
 farver                                         2.1.2     2024-05-13 [1] CRAN (R 4.5.0)
 fastmap                                        1.2.0     2024-05-15 [1] CRAN (R 4.5.0)
 footprintR                                   * 0.3.5     2025-06-04 [1] Github (fmicompbio/footprintR@612f713)
 generics                                     * 0.1.4     2025-05-09 [1] CRAN (R 4.5.0)
 GenomeInfoDb                                 * 1.44.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 GenomeInfoDbData                               1.2.14    2025-04-17 [1] Bioconductor
 GenomicAlignments                              1.44.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 GenomicRanges                                * 1.60.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 ggforce                                        0.4.2     2024-02-19 [1] CRAN (R 4.5.0)
 ggplot2                                      * 3.5.2     2025-04-09 [1] CRAN (R 4.5.0)
 glue                                           1.8.0     2024-09-30 [1] CRAN (R 4.5.0)
 gtable                                         0.3.6     2024-10-25 [1] CRAN (R 4.5.0)
 htmltools                                      0.5.8.1   2024-04-04 [1] CRAN (R 4.5.0)
 htmlwidgets                                    1.6.4     2023-12-06 [1] CRAN (R 4.5.0)
 httr                                           1.4.7     2023-08-15 [1] CRAN (R 4.5.0)
 IRanges                                      * 2.42.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 jsonlite                                       2.0.0     2025-03-27 [1] CRAN (R 4.5.0)
 knitr                                          1.50      2025-03-16 [1] CRAN (R 4.5.0)
 labeling                                       0.4.3     2023-08-29 [1] CRAN (R 4.5.0)
 lattice                                        0.22-7    2025-04-02 [1] CRAN (R 4.5.0)
 lifecycle                                      1.0.4     2023-11-07 [1] CRAN (R 4.5.0)
 magrittr                                       2.0.3     2022-03-30 [1] CRAN (R 4.5.0)
 MASS                                           7.3-65    2025-02-28 [2] CRAN (R 4.5.0)
 Matrix                                         1.7-3     2025-03-11 [2] CRAN (R 4.5.0)
 MatrixGenerics                               * 1.20.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 matrixStats                                  * 1.5.0     2025-01-07 [1] CRAN (R 4.5.0)
 patchwork                                    * 1.3.0     2024-09-16 [1] CRAN (R 4.5.0)
 pillar                                         1.10.2    2025-04-05 [1] CRAN (R 4.5.0)
 pkgconfig                                      2.0.3     2019-09-22 [1] CRAN (R 4.5.0)
 polyclip                                       1.10-7    2024-07-23 [1] CRAN (R 4.5.0)
 proxy                                          0.4-27    2022-06-09 [1] CRAN (R 4.5.0)
 purrr                                          1.0.4     2025-02-05 [1] CRAN (R 4.5.0)
 R6                                             2.6.1     2025-02-15 [1] CRAN (R 4.5.0)
 RColorBrewer                                   1.1-3     2022-04-03 [1] CRAN (R 4.5.0)
 Rcpp                                           1.0.14    2025-01-12 [1] CRAN (R 4.5.0)
 RCurl                                          1.98-1.17 2025-03-22 [1] CRAN (R 4.5.0)
 restfulr                                       0.0.15    2022-06-16 [1] CRAN (R 4.5.0)
 rjson                                          0.2.23    2024-09-16 [1] CRAN (R 4.5.0)
 rlang                                          1.1.6     2025-04-11 [1] CRAN (R 4.5.0)
 rmarkdown                                      2.29      2024-11-04 [1] CRAN (R 4.5.0)
 Rsamtools                                      2.24.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 rtracklayer                                  * 1.68.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 S4Arrays                                       1.8.1     2025-06-01 [1] Bioconductor 3.21 (R 4.5.0)
 S4Vectors                                    * 0.46.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 scales                                         1.4.0     2025-04-24 [1] CRAN (R 4.5.0)
 sessioninfo                                    1.2.3     2025-02-05 [1] CRAN (R 4.5.0)
 signal                                         1.8-1     2024-06-26 [1] CRAN (R 4.5.0)
 SparseArray                                    1.8.0     2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 SummarizedExperiment                         * 1.38.1    2025-04-30 [1] Bioconductor 3.21 (R 4.5.0)
 tibble                                         3.3.0     2025-06-08 [1] CRAN (R 4.5.0)
 tidyr                                          1.3.1     2024-01-24 [1] CRAN (R 4.5.0)
 tidyselect                                     1.2.1     2024-03-11 [1] CRAN (R 4.5.0)
 tweenr                                         2.0.3     2024-02-26 [1] CRAN (R 4.5.0)
 UCSC.utils                                     1.4.0     2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 vctrs                                          0.6.5     2023-12-01 [1] CRAN (R 4.5.0)
 viridisLite                                    0.4.2     2023-05-02 [1] CRAN (R 4.5.0)
 withr                                          3.0.2     2024-10-28 [1] CRAN (R 4.5.0)
 xfun                                           0.52      2025-04-02 [1] CRAN (R 4.5.0)
 XML                                            3.99-0.18 2025-01-01 [1] CRAN (R 4.5.0)
 XVector                                      * 0.48.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
 yaml                                           2.3.10    2024-07-26 [1] CRAN (R 4.5.0)
 zoo                                            1.8-14    2025-04-10 [1] CRAN (R 4.5.0)

 [1] /tungstenfs/groups/gbioinfo/Appz/R/BioC/R-4.5-release-foss-2024.05_BioC-3.21-release-foss-2024.05
 [2] /tachyon/groups/gbioinfo/Appz/easybuild/software/R/4.5.0-foss-2024.05/lib64/R/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────