5  Nucleosome analyses

A prominent source of signal in single molecule footprinting data are nucleosomes that protect the DNA that is wound around the histones but leave the DNA between nucleosome particles, the linkers, accessible to modifications.

In this chapter, we illustrate how footprintR can be used to place single nucleosomes onto individual reads (see Section 5.3) or estimate the average distance between neighboring nucleosomes (Section 5.4).

5.1 Preparation

We first load the packages and the genome needed for these tasks.

Show/hide code
BSgenomeName <- "BSgenome.Mmusculus.GENCODE.GRCm39.gencodeM34"

library(SingleMoleculeGenomicsIO)
library(footprintR)
library(ggplot2)
library(patchwork)
library(GenomicRanges)
library(SummarizedExperiment)
library(BSgenomeName, character.only = TRUE)

# Load genome
gnm <- get(BSgenomeName)
genome(gnm) <- "mm39"

5.2 Read data

Our data is obtained from modBam file from a wild-type sample, for which 6mA modification calling has been performed. We use the readModBam function to read data from an 2000-bp region around a CTCF binding site on chromosome 8 and store it in a SummarizedExperiment object. For more information about reading data with footprintR, see Chapter 2.

Show/hide code
# Read data
ctcfsite <- as("chr8:6622465-6622483", "GRanges")
reg <- resize(ctcfsite, width = 2000, fix = "center")

se <- readModBam(
    bamfiles = c(wt1 = "data/mESC_wt_6mA_rep1.bam"),
    modbase = "a", 
    regions = reg,
    seqinfo = seqinfo(gnm), 
    sequenceContextWidth = 1, 
    sequenceReference = gnm
)

# add summary-level assays
se <- flattenReadLevelAssay(se)

5.3 Placing nucleosome footprints

The calcFootprintScores function uses a vector of weights to identify high-scoring regions on individual reads that show the modification pattern given in these weights. The score of a region on a read measures how well modification probabilities correlate with weights in the weight vector. High-scoring footprints that score above a provided threshold can be added to our se object using addFootprints, which will call calcFootprintScores internally.

In order to find nucleosomes, we construct a weight vector called wgt of a total length of 160 bp, consisting of a central 120 bp region in which we expect low modifications (the region protected by the nucleosome) flanked by two linker regions of 20 bp each in which we expect high modification. We adjust the weights in the flanks and central region such that the overall mean is close to zero:

Show/hide code
# nucleosome footprint
wgt <- rep(c(0.5, -0.5, 0.5) * c(120/160, 40/160, 120/160), c(20, 120, 20))
mean(wgt)
[1] 0
Show/hide code
ggplot(data.frame(Position = seq_along(wgt), Weight = wgt),
       aes(Position, Weight)) +
    geom_col() +
    geom_hline(yintercept = 0, linetype = "dashed") +
    theme_bw()
Figure 5.1

Now we can scan the reads for high-scoring footprints and add them to the colData of our SummarizedExperiment object, under a column name given by the name argument:

Show/hide code
se <- addFootprints(se = se, wgt = wgt, thresh = 0.02, name = "nucl")
ℹ calculating footprint scores
✔ calculating footprint scores [1.6s]
ℹ segmenting footprint scores
✔ segmenting footprint scores [1.1s]

The footprints are stored as a list with one element for each sample, containing an IRangesList. Its elements are the reads of that sample, and the ranges correspond to individual footprints:

Show/hide code
se$nucl
$wt1
IRangesList object of length 54:
$`wt1-2b63fef6-6d5c-4aa4-b6a1-aebd5390cc83`
IRanges object with 65 ranges and 0 metadata columns:
           start       end     width
       <integer> <integer> <integer>
   [1]   6610366   6610505       140
   [2]   6610543   6610682       140
   [3]   6610707   6610846       140
   [4]   6610901   6611040       140
   [5]   6611097   6611236       140
   ...       ...       ...       ...
  [61]   6623137   6623276       140
  [62]   6623312   6623451       140
  [63]   6623713   6623852       140
  [64]   6623884   6624023       140
  [65]   6624088   6624227       140

...
<53 more elements>
Show/hide code
# total number of nucleosomes
sum(lengths(se$nucl$wt1))
[1] 1727

The nucleosome footprints can be visualized using plotRegion:

Show/hide code
plotRegion(se, region = reg, minCoveredFraction = 0.7,
           tracks = list(list(trackData="mod_prob", trackType="Heatmap",
                              interpolate = TRUE,
                              footprintColumns = "nucl",
                              arglistFootprints = list(
                                  nucl = list(color = alpha("firebrick", 0.5))),
                              orderReads = "squish"),
                         list(trackData = "FracMod", trackType = "Smooth",
                              smoothMethod="rollingMean", windowSize=41,
                              highlightRegions = ctcfsite)),
                    sequenceContext = "A") +
        patchwork::plot_layout(heights = c(2, 0.6))
Figure 5.2

5.4 Estimating nucleosome repeat length

Placing of individual nucleosomes in principle allows also measuring their average distance, the nucleosome repeat length (NRL). This measure can also be obtained without placing of nucleosomes, from the distribution of distances between modified bases in which multiples of the NRL are over-represented. This idea is related to the phasogram analysis described by Valouev et al. (2011).

In footprintR, this analysis can be performed by first calculating the distribution of distances between modified bases using calcModbaseSpacing:

Show/hide code
moddist <- calcModbaseSpacing(se)
str(moddist)
List of 1
 $ wt1: Named num [1:1000] 3306 2143 1670 1218 1122 ...
  ..- attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...

From this distribution, which was obtained from just 54 reads, we can accurately estimate the NRL using estimateNRL:

Show/hide code
res <- estimateNRL(x = moddist$wt1)
res[1:2]
$nrl
[1] 190.1

$nrl.CI95
   2.5 %   97.5 % 
187.3074 192.8926 

Alternatively, the distance distribution can also be visualized together with an estimate of the NRL using plotModbaseSpacing, either as a summary plot:

Show/hide code
plotModbaseSpacing(x = moddist$wt1)
Figure 5.3

… or as a set of three plots that illustrate the different steps of the estimation:

Show/hide code
plotModbaseSpacing(x = moddist$wt1, detailedPlots = TRUE)
Figure 5.4

5.5 Read autocorrelation also shows nucleosomal signal

An alternative way to visualize or measure nucleosome repeat length is to plot the autocorrelation of reads. This is obtained by shifting the reads relative to themselves and calculating the pearson correlation of the modification vector in the overlapping range. As you can see in the plot below, this shows high correlation for the first ~50 nucleotides (corresponding to the average length of a nucleosome linker or footprint), then switches to negative correlations before rising again and peaking at about 190 nucleotides (the average nucleosome repeat length).

Show/hide code
plotReadAutocorrelation(se, refDists = c(50, 147, 190)) +
    theme(legend.position = "inside",
          legend.position.inside = c(.98, .98),
          legend.justification = c(1, 1))
Figure 5.5

5.6 Quantifying nucleosome phasing

Nucleosomes that occupy similar positions across reads are called phased and can result for example from a sequence-specific DNA binding protein that constrains their movement along the DNA. Genomic regions with phased nucleosomes can be identified on the summary-level data by phasingScoreFourier, which calculates the strength of a periodic component in the summary-level data corresponding to the expected period of nucleosomes (see Section 5.4 for how that period can be estimated).

We start such an analysis by first reading summary-level data for a larger genomic region, in which we want to calculate phasing scores:

Show/hide code
reg <- GRanges("chr8", IRanges(6637500, 6642500))
se <- readModBam(
    bamfiles = c(wt1 = "data/mESC_wt_6mA_rep1.bam"),
    modbase = "a", 
    regions = reg,
    level = "summary",
    trim = TRUE,
    seqinfo = seqinfo(gnm), 
    sequenceContextWidth = 1, 
    sequenceReference = gnm
)

Next, we construct windowgr, a GRanges objects defining the windows in reg for which we want to calculate a nucleosome phasing score:

Show/hide code
windowStep <- 190
windowSize <- 4 * windowStep
s <- seq(start(reg), end(reg) - (windowSize - windowStep) + 1, by = windowStep)
windowgr <- GRanges(seqnames = seqnames(reg),
                    ranges = IRanges(start = s, width = windowSize))

Finally, we calculate phasing socres for the windows in windowgr. The windowSize and the value of the numCoef argument define the period of interest \(poi = windowSize / (numCoef - 1)\) (here: 190 bp, see phasingScoreFourier for details):

Show/hide code
seFourier <- phasingScoreFourier(se = se, gr = windowgr, numCoef = 5)

We can plot the window scores together with the fraction of modification. Note that the sequential windows for which the phasing score was calculated are overlapping (the window with the maximal phasing score is indicated by the red line below):

Show/hide code
# get plotting data using plotRegion
p <- plotRegion(se, region = reg,
                tracks = list(list(trackData = "FracMod", trackType = "Smooth",
                                   smoothMethod = "rollingMean", windowSize = 101)))
pd <- p$layers[[1]]$data

# get phasing scores
pd2 <- as.data.frame(rowRanges(seFourier))
pd2$position <- mid(rowRanges(seFourier))
pd2$phasingScoreAbs <- assay(seFourier, "phasingScoreAbs")[, "wt1"]

# visualize
ggplot(mapping = aes(x = position)) +
    geom_tile(data = pd2,
              mapping = aes(y = mean(pd$value_smooth), fill = phasingScoreAbs),
              width = windowStep, height = Inf, alpha = 0.4) +
    scale_fill_viridis_c() +
    geom_segment(data = pd2[which.max(pd2$phasingScoreAbs), , drop = FALSE],
                 inherit.aes = FALSE,
                 mapping = aes(x = start, xend = end, y = min(pd$value_smooth)),
                 colour = "red", linewidth = 2) +
    geom_line(data = pd,
              mapping = aes(y = value_smooth)) +
    ylim(range(pd$value_smooth)) +
    labs(x = paste0("Position on ", seqnames(reg), " (bp)"),
         y = "Fraction of modified bases",
         fill = paste0("Phasing score\n", windowStep, " bp period")) +
    theme_bw() +
    theme(legend.position = "bottom")
Figure 5.6

5.7 Session info

Click to view session info
Show/hide code
sessioninfo::session_info(info = "packages")
═ Session info ═══════════════════════════════════════════════════════════════════════════════════════════════════════
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package                                      * version   date (UTC) lib source
 abind                                          1.4-8     2024-09-12 [1] CRAN (R 4.5.1)
 basilisk                                       1.22.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 Biobase                                      * 2.70.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 BiocGenerics                                 * 0.56.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 BiocIO                                       * 1.20.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 BiocParallel                                   1.44.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 Biostrings                                   * 2.78.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 bitops                                         1.0-9     2024-10-03 [1] CRAN (R 4.5.1)
 BSgenome                                     * 1.78.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 BSgenome.Mmusculus.GENCODE.GRCm39.gencodeM34 * 0.1.0     2025-10-31 [1] Bioconductor
 cigarillo                                      1.0.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 class                                          7.3-23    2025-01-01 [2] CRAN (R 4.5.2)
 cli                                            3.6.5     2025-04-23 [1] CRAN (R 4.5.1)
 codetools                                      0.2-20    2024-03-31 [2] CRAN (R 4.5.2)
 crayon                                         1.5.3     2024-06-20 [1] CRAN (R 4.5.1)
 curl                                           7.0.0     2025-08-19 [1] CRAN (R 4.5.1)
 data.table                                     1.18.0    2025-12-24 [1] CRAN (R 4.5.2)
 DelayedArray                                   0.36.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 dichromat                                      2.0-0.1   2022-05-02 [1] CRAN (R 4.5.1)
 digest                                         0.6.39    2025-11-19 [1] CRAN (R 4.5.2)
 dir.expiry                                     1.18.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 dplyr                                          1.1.4     2023-11-17 [1] CRAN (R 4.5.1)
 e1071                                          1.7-17    2025-12-18 [1] CRAN (R 4.5.2)
 evaluate                                       1.0.5     2025-08-27 [1] CRAN (R 4.5.1)
 farver                                         2.1.2     2024-05-13 [1] CRAN (R 4.5.1)
 fastmap                                        1.2.0     2024-05-15 [1] CRAN (R 4.5.1)
 filelock                                       1.0.3     2023-12-11 [1] CRAN (R 4.5.1)
 footprintR                                   * 0.4.0     2026-01-23 [1] Bioconductor
 generics                                     * 0.1.4     2025-05-09 [1] CRAN (R 4.5.1)
 GenomeInfoDb                                 * 1.46.2    2025-12-04 [1] Bioconductor 3.22 (R 4.5.2)
 GenomicAlignments                              1.46.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 GenomicRanges                                * 1.62.1    2025-12-08 [1] Bioconductor 3.22 (R 4.5.2)
 ggforce                                        0.5.0     2025-06-18 [1] CRAN (R 4.5.1)
 ggplot2                                      * 4.0.1     2025-11-14 [1] CRAN (R 4.5.2)
 glue                                           1.8.0     2024-09-30 [1] CRAN (R 4.5.1)
 gtable                                         0.3.6     2024-10-25 [1] CRAN (R 4.5.1)
 htmltools                                      0.5.9     2025-12-04 [1] CRAN (R 4.5.2)
 htmlwidgets                                    1.6.4     2023-12-06 [1] CRAN (R 4.5.1)
 httr                                           1.4.7     2023-08-15 [1] CRAN (R 4.5.1)
 IRanges                                      * 2.44.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 janeaustenr                                    1.0.0     2022-08-26 [1] CRAN (R 4.5.2)
 jsonlite                                       2.0.0     2025-03-27 [1] CRAN (R 4.5.1)
 knitr                                          1.51      2025-12-20 [1] CRAN (R 4.5.2)
 labeling                                       0.4.3     2023-08-29 [1] CRAN (R 4.5.1)
 lattice                                        0.22-7    2025-04-02 [2] CRAN (R 4.5.2)
 lifecycle                                      1.0.5     2026-01-08 [1] CRAN (R 4.5.2)
 magrittr                                       2.0.4     2025-09-12 [1] CRAN (R 4.5.1)
 MASS                                           7.3-65    2025-02-28 [2] CRAN (R 4.5.2)
 Matrix                                         1.7-4     2025-08-28 [2] CRAN (R 4.5.2)
 MatrixGenerics                               * 1.22.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 matrixStats                                  * 1.5.0     2025-01-07 [1] CRAN (R 4.5.1)
 otel                                           0.2.0     2025-08-29 [1] CRAN (R 4.5.1)
 patchwork                                    * 1.3.2     2025-08-25 [1] CRAN (R 4.5.1)
 pillar                                         1.11.1    2025-09-17 [1] CRAN (R 4.5.1)
 pkgconfig                                      2.0.3     2019-09-22 [1] CRAN (R 4.5.1)
 png                                            0.1-8     2022-11-29 [1] CRAN (R 4.5.1)
 polyclip                                       1.10-7    2024-07-23 [1] CRAN (R 4.5.1)
 proxy                                          0.4-29    2025-12-29 [1] CRAN (R 4.5.2)
 purrr                                          1.2.1     2026-01-09 [1] CRAN (R 4.5.2)
 R6                                             2.6.1     2025-02-15 [1] CRAN (R 4.5.1)
 RColorBrewer                                   1.1-3     2022-04-03 [1] CRAN (R 4.5.1)
 Rcpp                                           1.1.1     2026-01-10 [1] CRAN (R 4.5.2)
 RCurl                                          1.98-1.17 2025-03-22 [1] CRAN (R 4.5.1)
 restfulr                                       0.0.16    2025-06-27 [1] CRAN (R 4.5.1)
 reticulate                                     1.44.1    2025-11-14 [1] CRAN (R 4.5.2)
 rjson                                          0.2.23    2024-09-16 [1] CRAN (R 4.5.1)
 rlang                                          1.1.7     2026-01-09 [1] CRAN (R 4.5.2)
 rmarkdown                                      2.30      2025-09-28 [1] CRAN (R 4.5.1)
 Rsamtools                                      2.26.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 rtracklayer                                  * 1.70.1    2025-12-22 [1] Bioconductor 3.22 (R 4.5.2)
 S4Arrays                                       1.10.1    2025-12-02 [1] Github (Bioconductor/S4Arrays@a4cccba)
 S4Vectors                                    * 0.48.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 S7                                             0.2.1     2025-11-14 [1] CRAN (R 4.5.2)
 scales                                         1.4.0     2025-04-24 [1] CRAN (R 4.5.1)
 Seqinfo                                      * 1.0.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 sessioninfo                                    1.2.3     2025-02-05 [1] CRAN (R 4.5.1)
 signal                                         1.8-1     2024-06-26 [1] CRAN (R 4.5.1)
 SingleMoleculeGenomicsIO                     * 0.1.0     2026-01-19 [1] Bioconductor
 SnowballC                                      0.7.1     2023-04-25 [1] CRAN (R 4.5.2)
 SparseArray                                    1.10.8    2025-12-18 [1] Bioconductor 3.22 (R 4.5.2)
 stringi                                        1.8.7     2025-03-27 [1] CRAN (R 4.5.1)
 SummarizedExperiment                         * 1.40.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 tibble                                         3.3.1     2026-01-11 [1] CRAN (R 4.5.2)
 tidyr                                          1.3.2     2025-12-19 [1] CRAN (R 4.5.2)
 tidyselect                                     1.2.1     2024-03-11 [1] CRAN (R 4.5.1)
 tidytext                                       0.4.3     2025-07-25 [1] CRAN (R 4.5.2)
 tokenizers                                     0.3.0     2022-12-22 [1] CRAN (R 4.5.2)
 tweenr                                         2.0.3     2024-02-26 [1] CRAN (R 4.5.1)
 UCSC.utils                                     1.6.1     2025-12-11 [1] Bioconductor 3.22 (R 4.5.2)
 vctrs                                          0.6.5     2023-12-01 [1] CRAN (R 4.5.1)
 viridisLite                                    0.4.2     2023-05-02 [1] CRAN (R 4.5.1)
 withr                                          3.0.2     2024-10-28 [1] CRAN (R 4.5.1)
 xfun                                           0.55      2025-12-16 [1] CRAN (R 4.5.2)
 XML                                            3.99-0.20 2025-11-08 [1] CRAN (R 4.5.2)
 XVector                                      * 0.50.0    2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 yaml                                           2.3.12    2025-12-10 [1] CRAN (R 4.5.2)
 zoo                                            1.8-15    2025-12-15 [1] CRAN (R 4.5.2)

 [1] /tachyon/groups/gbioinfo/Appz/R/BioC/R-4.5-release-foss-2024.05_BioC-3.22-release-foss-2024.05
 [2] /tachyon/groups/gbioinfo/Appz/easybuild/software/R/4.5.2-foss-2024.05/lib64/R/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────