Parameter values

Load required packages and helper functions

Read data

Preprocessing

Number of interactors in IP-MS for each TF

Facts and numbers - ChIP-seq

Facts and numbers - IP-MS

Figure 1

Number of TFs for each DBD family
WT RNA-seq TPM values for TFs
Number of interactors in IP-MS for each TF
Number of peaks for each TF
Sequence evidence for tagged transcription factors
Put together
Supplementary figure

Session info

Parameter values

params

## $rds150
## [1] "data/ipms_150_sce.rds"
## 
## $rds500
## [1] "data/ipms_500_sce.rds"
## 
## $rdsrnaseq
## [1] "data/rnaseq_del.rds"
## 
## $peakcsv
## [1] "data/fused_peaks_filtered.csv.gz"
## 
## $dbdtxt
## [1] "data/TF_DBD_annotation.txt"
## 
## $baitclass
## [1] "data/ipms_bait_class.txt"
## 
## $idmap
## [1] "data/id_mapping_table.txt"
## 
## $tagcntfile
## [1] "data/taggedTFs_counts.csv"

Load required packages and helper functions

suppressPackageStartupMessages({
    library(SingleCellExperiment)
    library(ggplot2)
    library(cowplot)
    library(einprot)
    library(dplyr)
    library(forcats)
    library(jsonlite)
    library(ComplexHeatmap)
    library(ggrepel)
    library(colorspace)
})

source("params/plot_settings.R")
source("params/get_testres_function.R")
source("params/ipms_params.R")
source("params/mapping_functions.R")

Read data

We load two SingleCellExperiment objects, containing data and results from the low- and high-salt IPMS experiments, respectively.

sce150 <- readRDS(params$rds150)
sce500 <- readRDS(params$rds500)

We extract the full list of baits, as well as load a classification in terms of the number of interaction partners.

## Feature ID mapping table
idmap <- read.delim(params$idmap)

## Experiment names
allexps <- unique(sub("(ID[0-9]+|SPB4482)_([^_]*)_?.*", "\\2", colnames(sce150)))
allexps <- allexps[allexps != "untagged"]

## Corresponding bait protein names
stopifnot(all(allexps %in% idmap$bait))
allbaits <- idmap$unique_einprot_id[match(allexps, idmap$bait)]

baitclass <- read.delim(params$baitclass)
stopifnot(all(.capitalize(allexps) %in% baitclass$Gene_name))
table(class150 = baitclass$class, 
      class500 = baitclass$class500, useNA = "ifany")

##            class500
## class150       Lost in 500 mM Retained in 500 mM <NA>
##   150and500  0             16                 24    0
##   150only    0              0                  0   43
##   nobait     1              0                  0    6

In the following set of IPs, we identified the bait as not pulled down in the low-salt condition.

(nobait_150 <- baitclass$Gene_name[baitclass$class == "nobait"])

## [1] "Cha4"        "Cuf2"        "Atf21"       "Cbf12"       "Sre2"       
## [6] "SPBC1348.12" "Untagged"

Next, we load a SummarizedExperiment with quantifications from RNA-seq data.

rnaseq <- readRDS(params$rdsrnaseq)
stopifnot(all(allbaits %in% rownames(rnaseq)))

We also load the information about the DNA-binding domain family for each transcription factor.

dbd <- read.delim(params$dbdtxt) |>
    dplyr::group_by(DBD_class_name, Conserved_human) |>
    dplyr::tally() |>
    dplyr::mutate(cons = ifelse(Conserved_human, "Conserved in human",
                                "Not conserved in human")) |>
    dplyr::group_by(DBD_class_name) |>
    dplyr::mutate(totn = sum(n)) |>
    dplyr::ungroup()

Here, we load the filtered peaks identified from ChIP-seq experiments, with the information about which baits showed enrichments for each peak.

peaks <- read.csv(params$peakcsv, row.names = 1)
enrcolumns <- grep("^is_enr_in[.]", colnames(peaks))
peakenr <- as.matrix(peaks[, enrcolumns])
colnames(peakenr) <- sub("^is_enr_in[.]", "", colnames(peakenr))
nbrPeaks <- colSums(peakenr)
nbrPeaksSpec <- colSums(peakenr[peaks$peaktype == "specific peaks", ])

Finally, we load the number of ChIP-seq fragments in input samples that did not map to the wild-type genome sequence, but to one of the tagged transcription factor constructs.

tagcnt <- as.matrix(read.csv(file = params$tagcntfile, row.names = 1))

Preprocessing

Number of interactors in IP-MS for each TF

We summarize the number of observed proteins pulled down for each TF in the IP-MS data (low and high salt conditions), excluding the bait itself. A protein is considered pulled down if the adjusted p-value in a comparison against the large complement group (see manuscript for details) is below 0.0002 and the log-fold change exceeds 1.

tc150 <- .getTestCols(sce150, adjp_cutoff = adjpThreshold,
                      logfc_cutoff = log2fcThreshold)
tc500 <- .getTestCols(sce500, adjp_cutoff = adjpThreshold,
                      logfc_cutoff = log2fcThreshold)

## Get matrix indicating whether the proteins pass the interaction threshold 
## in the experiments
interactors150 <- tc150$interactor
bait_idx150 <- cbind(match(.getProteinNameFromComparison(colnames(interactors150)), 
                           rownames(interactors150)), 
                     seq_len(ncol(interactors150)))
interactors150[bait_idx150] <- NA
colnames(interactors150) <- .getOrigBaitNameFromComparison(colnames(interactors150),
                                                           idmap = idmap)
interactors150 <- lapply(structure(colnames(interactors150), names = colnames(interactors150)), 
                         function(cn) {
                             rownames(interactors150)[which(interactors150[, cn])]
                         })

interactors500 <- tc500$interactor
bait_idx500 <- cbind(match(.getProteinNameFromComparison(colnames(interactors500)), 
                           rownames(interactors500)), 
                     seq_len(ncol(interactors500)))
interactors500[bait_idx500] <- NA
colnames(interactors500) <- .getOrigBaitNameFromComparison(colnames(interactors500),
                                                           idmap = idmap)
interactors500 <- lapply(structure(colnames(interactors500), names = colnames(interactors500)), 
                         function(cn) {
                             rownames(interactors500)[which(interactors500[, cn])]
                         })
## For this figure, we only care about the interactors that are _retained_ in 500 mM
## (not new ones that are detected e.g. due to the different complements)
for (nm in names(interactors500)) {
    interactors500[[nm]] <- intersect(interactors500[[nm]], interactors150[[nm]])
}
intdf <- dplyr::bind_rows(
    data.frame(exp = "150 mM NaCl IP-MS",
               id = names(interactors150),
               nbrInteractions = vapply(interactors150, length, NA_real_),
               inHighSalt = names(interactors150) %in% names(interactors500)),
    data.frame(exp = "Retained in 500 mM NaCl IP-MS",
               id = names(interactors500),
               nbrInteractions = vapply(interactors500, length, NA_real_),
               inHighSalt = TRUE)
) |>
    dplyr::left_join(baitclass, by = c("id" = "Gene_name")) |>
    dplyr::mutate(idorig = id) |>
    dplyr::mutate(id = .getProteinFromOrigBait(idorig,
                                               idmap = idmap)) |>
    dplyr::mutate(id = replace(id, idorig == "Untagged", "Untagged"))

Facts and numbers - ChIP-seq

## How many TFs did we analyze by ChIP-seq?
is_TF <- colnames(peakenr) != "Untagged"
sum(is_TF)

## [1] 80

## How many unique peak regions did we identify?
nrow(peaks)

## [1] 2027

## How many binding events (any TF) do we have in total?
sum(peakenr[, is_TF])

## [1] 9365

## How many TFs have at least one peak?
summary(colSums(peakenr)[is_TF] > 0)

##    Mode   FALSE    TRUE 
## logical       3      77

colnames(peakenr)[is_TF & colSums(peakenr) == 0]

## [1] "Mbx1" "Stb3" "Thi1"

## What is the range of numbers of peaks over TFs?
summary(colSums(peakenr)[is_TF])

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   34.25  108.50  117.06  183.00  356.00

summary(colSums(peakenr)[is_TF & colSums(peakenr) > 0]) # without zero-peak TFs

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0    38.0   113.0   121.6   183.0   356.0

## What fraction of TFs is bound at a given site?
## ... for all peaks
summary(rowMeans(peakenr[, is_TF]))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.00000 0.01250 0.05775 0.02500 0.81250

## ... for peaks by type
table(peaks$peaktype)

## 
##   common peaks (frequent) common peaks (ubiquitous)            specific peaks 
##                        94                       247                      1686

summary(rowSums(peakenr[peaks$peaktype == "common peaks (ubiquitous)", is_TF]))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     5.0    12.0    26.0    27.4    42.0    65.0

summary(rowSums(peakenr[peaks$peaktype == "common peaks (frequent)", is_TF]))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.00    5.00    8.00    9.17   12.00   26.00

summary(rowSums(peakenr[peaks$peaktype == "specific peaks", is_TF]))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   1.000   1.029   1.000   4.000

Facts and numbers - IP-MS

## Number of NCBI Conserved Domains shown in the figure (excluding "Other")
length(setdiff(unique(dbd$DBD_class_name), "Other"))

## [1] 14

## Percentage of domains conserved in human
round(100 * sum(dbd$n[dbd$Conserved_human]) / sum(dbd$n), 2)

## [1] 17.98

## Number of analyzed TFs
## ... 150 mM NaCl
length(setdiff(unique(sce150$Gene_name), "untagged"))

## [1] 89

## ... 500 mM NaCl
length(setdiff(unique(sce500$Gene_name), "untagged"))

## [1] 40

Figure 1

Number of TFs for each DBD family

The figure below shows the number of transcription factors belonging to each DNA-binding domain family, further stratified by whether or not they have a human ortholog.

(gg_dbd <- ggplot(dbd, aes(x = fct_reorder(DBD_class_name, totn, .desc = TRUE), y = n, 
                           fill = cons)) + 
     geom_col() + 
     scale_fill_manual(values = c("Conserved in human" = lighten(main_colors[1], 0.3),
                                  "Not conserved in human" = main_colors[1]), 
                       name = "") + 
     labs(x = "DBD family (NCBI Conserved Domains)",
          y = "Number of\ntranscription factors") + 
     theme_cowplot() + 
     theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
           legend.position = "inside",
           legend.position.inside = c(0.55, 0.9)))

WT RNA-seq TPM values for TFs

Next, we extract the average expression (in TPM - transcripts per million) for each transcription factor across three wild-type samples.

ave_tpm <- data.frame(
    bait = allbaits, 
    bait_pulled = factor(
        ifelse(.capitalize(allexps) %in% nobait_150, 
               "Bait not pulled down (150 mM NaCl IP-MS)",
               "Bait pulled down (150 mM NaCl IP-MS)"),
        levels = c("Bait pulled down (150 mM NaCl IP-MS)",
                   "Bait not pulled down (150 mM NaCl IP-MS)", 
                   "Meiosis-specific transcription factors")),
    tpm = rowMeans(assay(rnaseq, "abundance")[allbaits, 
                                              rnaseq$Group_name == "wt_WT_mRNA"])
)
gg_tpm <- ggplot(ave_tpm, aes(x = fct_reorder(bait, tpm, .desc = TRUE), y = tpm, 
                              color = bait_pulled, shape = bait_pulled)) + 
    geom_hline(yintercept = 0, linetype = "dashed", color = na_color) +
    geom_point(size = 1, show.legend = TRUE) + 
    theme_cowplot() + 
    theme(legend.position = "inside", legend.position.inside = c(0.15, 0.9)) +
    scale_color_manual(values = c("Bait pulled down (150 mM NaCl IP-MS)" = pt_color,
                                  "Bait not pulled down (150 mM NaCl IP-MS)" = na_color,
                                  "Meiosis-specific transcription factors" = main_colors[1]),
                       drop = FALSE, name = "") + 
    scale_shape_manual(values = c("Bait pulled down (150 mM NaCl IP-MS)" = 16,
                                  "Bait not pulled down (150 mM NaCl IP-MS)" = 16,
                                  "Meiosis-specific transcription factors" = 15),
                       drop = FALSE, name = "") + 
    guides(color = guide_legend(nrow = 3, byrow = TRUE,
                                override.aes = list(size = 3))) + 
    labs(x = "Transcription factors",
         y = "Transcription factor\ngene expression (TPM)")
(gg_tpm + 
        geom_segment(data = ave_tpm |> filter(bait %in% c("mei4", "atf31", "atf21", "cuf2", "rsv1")),
                     aes(x = bait, xend = bait, y = -6, yend = -20), color = main_colors[1], 
                     linewidth = 1.25) + 
        scale_y_continuous(expand = expansion(mult = c(0, 0.025))) + 
        theme(axis.text.x = element_blank()))

Number of interactors in IP-MS for each TF

Here, we plot the number of interactors for each TF in the 150 mM NaCl condition, as well as the number of interactors retained in the 500 mM NaCl condition.

idOrder <- intdf |>
    dplyr::filter(exp == "150 mM NaCl IP-MS") |>
    dplyr::arrange(desc(nbrInteractions)) |>
    dplyr::pull(id)
(gg_nbrint <- ggplot(intdf |>
                         dplyr::filter(!id %in% nobait_150 & 
                                           id != "Untagged") |>
                         dplyr::filter(inHighSalt),
                     aes(x = fct_relevel(id, idOrder),
                         y = nbrInteractions)) +
        geom_hline(yintercept = 0, linetype = "dashed", color = na_color) + 
        geom_line(color = na_color) + 
        geom_point(shape = 21, aes(fill = exp, color = exp, size = exp)) +
        scale_fill_manual(values = c(`150 mM NaCl IP-MS` = main_colors[5], 
                                     `Retained in 500 mM NaCl IP-MS` = pt_color), 
                          name = "") + 
        scale_color_manual(values = c(`150 mM NaCl IP-MS` = main_colors[5], 
                                      `Retained in 500 mM NaCl IP-MS` = "transparent"),
                           name = "") + 
        scale_size_manual(values = c(`150 mM NaCl IP-MS` = 3, 
                                     `Retained in 500 mM NaCl IP-MS` = 2),
                          name = "") + 
        theme_cowplot() + 
        theme(axis.text.x = element_text(angle = 90, hjust = 1,
                                         size = 9, vjust = 0.5), 
              legend.position = c(0.35, 0.9)) + 
        labs(x = "", y = "Number of interactors") + 
        coord_cartesian(ylim = c(0, NA)))

(gg_nbrint_all <- ggplot(
    intdf |>
        dplyr::filter(exp == "150 mM NaCl IP-MS" & 
                          id != "Untagged"),
    aes(x = fct_relevel(id, levels(fct_reorder(.capitalize(ave_tpm$bait),
                                               ave_tpm$tpm, .desc = TRUE))), 
        y = nbrInteractions)) + 
     geom_hline(yintercept = 0, linetype = "dashed", colour = na_color) +
     geom_line() + 
     geom_point(shape = 21, size = 3, colour = main_colors[5],
                fill = main_colors[5]) +
     theme_cowplot() + 
     theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5), 
           legend.position = "bottom") + 
     labs(x = "", y = "Number of interactors") + 
     coord_cartesian(ylim = c(0, NA)))

## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?

Number of peaks for each TF

## Number of peaks per TF (unfiltered)
pd0 <- intdf |>
    dplyr::filter(exp == "150 mM NaCl IP-MS" & id != "Untagged") |>
    dplyr::mutate(nbrPeaks = nbrPeaks[id],
                  nbrPeaksSpec = nbrPeaksSpec[id]) |>
    dplyr::mutate(chipped = !is.na(nbrPeaks))

## Only chipped TFs, in long format
pd <- pd0 |>
    dplyr::filter(chipped) |>
    dplyr::mutate(id = fct_reorder(factor(id), .x = nbrPeaks,
                                   .fun = function(x) x, .desc = TRUE)) |>
    tidyr::pivot_longer(cols = c("nbrPeaks", "nbrPeaksSpec"),
                        names_to = "peakType",
                        values_to = "peakNumber") |>
    dplyr::mutate(peakType = fct_relabel(
        peakType,
        function(x) c(nbrPeaks = "All ChIP peaks",
                      nbrPeaksSpec = "Specific ChIP peaks")[x]))

(gg_nbrpeaks_all_narrow_by_npeaks <- ggplot(pd |> dplyr::filter(chipped),
    aes(x = id, y = peakNumber)) +
    geom_hline(yintercept = 0, linetype = "dashed", color = na_color) +
    geom_line(color = na_color) +
    geom_point(mapping = aes(color = peakType, fill = peakType, size = peakType), shape = 21) +
    scale_fill_manual(values = c(`All ChIP peaks` = main_colors[5],
                                 `Specific ChIP peaks` = pt_color), name = "") +
    scale_color_manual(values = c(`All ChIP peaks` = main_colors[5],
                                  `Specific ChIP peaks` = pt_color), name = "") +
    scale_size_manual(values = c(`All ChIP peaks` = 1.75,
                                 `Specific ChIP peaks` = 1),
                      name = "") +
    theme_cowplot() + 
    theme(axis.text.x = element_blank(), 
          axis.ticks.x = element_blank(), 
          legend.position = c(0.55, 0.9)) + 
    labs(x = "Transcription factors",
         y = "Number of enriched peaks\n(IP/input > 2)") + 
    coord_cartesian(ylim = c(0, NA)))

(gg_nbrpeaks_all_wide <- ggplot(pd0,
    aes(x = fct_relevel(id, levels(fct_reorder(.capitalize(ave_tpm$bait), ave_tpm$tpm, .desc = TRUE))))) + 
    geom_hline(yintercept = 0, linetype = "dashed", color = na_color) +
    geom_segment(
        mapping = aes(y = nbrPeaks, yend = nbrPeaksSpec,
                      xend = fct_relevel(id, levels(fct_reorder(.capitalize(ave_tpm$bait), ave_tpm$tpm, .desc = TRUE)))),
        color = na_color) +
    geom_point(mapping = aes(y = nbrPeaks),
               shape = 21, size = 3.5, fill = main_colors[5], color = main_colors[5]) +
    geom_point(mapping = aes(y = nbrPeaksSpec),
               shape = 21, size = 2.5, fill = pt_color) +
    geom_tile(data = pd0 |> dplyr::filter(!chipped),
              aes(x = fct_relevel(id, levels(fct_reorder(.capitalize(ave_tpm$bait), ave_tpm$tpm, .desc = TRUE))),
                  y = I(0)),
              width = 0.75, height = Inf, fill = na_color) +
    geom_point(aes(x = 3, y = 400), size = 3.5, fill = main_colors[5], color = main_colors[5], shape = 21) +
    geom_point(aes(x = 3, y = 375), size = 2.5, fill = pt_color, shape = 21) +
    geom_tile(aes(x = 3, y = 350), width = 2, height = 15, fill = na_color) +
    annotate("text", x = 5, y = c(400, 375, 350), hjust = 0, vjust = 0.5,
             label = c("All ChIP peaks", "Specific ChIP peaks", "No ChIP data"),
             size = 4.2, fontface = "plain") +
    theme_cowplot() + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + 
    labs(x = "", y = "Number of peaks (IP/input > 2)") + 
    coord_cartesian(ylim = c(0, NA)))

Sequence evidence for tagged transcription factors

In order to validate the genomic sequences of tagged transcription factors in the engineered strains, we used all reads from ChIP input samples that did not map to the S. pombe reference (wildtype) genome.

For each sample, the unmapped reads were aligned against all tagged transcription factor sequences (see file reference/TFome_pombe_strain_lib.fasta). The resulting alignment counts are summarized in the tagcnt matrix of 80 rows by 81 columns. The rows correspond to tagged transcription factor sequences, and the columns to S. pombe strains in which a transcription factor has been tagged. The values correspond to the number of read pairs from ChIP input samples of each strain, that mapped unambigously to one of the expected tagged transcription factor constructs.

Tagged transcription factor sequences and S. pombe strains are ordered in the same way, expect for an additional column on the right side corresponding to the untagged parent strain (untagged).

To account for differences in sequencing depth, the counts are normalized by dividing through the total number of read pairs that mapped to each tagged transcription factor sequence, and then visualized as a heatmap.

# define heatmap dimensions
w_taggedTFs <- 130  # total heatmap width (mm)
h_taggedTFs <- 130  # total heatmap height (mm)
fs <- 5  # small font size
fl <- 10  # large font size

# normalize (percent of reads)
perc <- sweep(tagcnt, 1, rowSums(tagcnt), "/") * 100

rng <- range(perc)
hm5 <- Heatmap(matrix = perc, name = "Percent of reads mapped to\ntagged transcription factor ",
               cluster_rows = FALSE, cluster_columns = FALSE,
               layer_fun = function(j, i, x, y, width, height, fill) {
                   grid.rect(x = x, y = y, width = width, height = height, 
                             gp = gpar(col = "grey95", fill = NA))
               },
               column_title = "Strains",
               row_title = "Tagged transcription factor",
               row_title_gp = gpar(fontsize = fl),
               row_names_gp = gpar(fontsize = fs),
               column_title_gp = gpar(fontsize = fl),
               column_names_gp = gpar(fontsize = fs),
               col = circlize::colorRamp2(breaks = seq(rng[1], rng[2], length.out = 64),
                                          colors = rev(hcl.colors(64, "Purples"))),
               heatmap_legend_param = list(labels_gp = gpar(fontsize = fs),
                                           border = "gray10",
                                           legend_direction = "horizontal",
                                           legend_width = unit(20, "mm"),
                                           title_position = "lefttop",
                                           title_gp = gpar(fontsize = fl)),
               heatmap_width = unit(w_taggedTFs, "mm"), # total heatmap width
               heatmap_height = unit(h_taggedTFs, "mm"), # total heatmap height
               use_raster = TRUE)

hm_taggedTFs <- grid.grabExpr(
    hm5 <- draw(hm5, heatmap_legend_side = "bottom"),
    width = w_taggedTFs, height = h_taggedTFs
)

plot_grid(hm_taggedTFs)

Put together

fig1c <- ggdraw() + 
    draw_image("schematics/Fig1A_placeholder_screen_overview_Arial.png")

cowplot::plot_grid(
    fig1c,
    cowplot::plot_grid(
        gg_dbd, 
        gg_tpm + geom_segment(data = ave_tpm |> filter(bait %in% c("mei4", "atf31", "atf21", "cuf2", "rsv1")),
                              aes(x = bait, xend = bait, y = -6, yend = -20), color = main_colors[1], 
                              linewidth = 1.25) + 
            scale_y_continuous(expand = expansion(mult = c(0, 0.025))) + 
            theme(axis.text.x = element_blank(),
                  axis.ticks.length.x = unit(0, "mm")),
        nrow = 1,
        labels = c("B", "C"),
        align = "vh",
        axis = "b"
    ),
    cowplot::plot_grid(
        gg_nbrint,
        gg_nbrpeaks_all_narrow_by_npeaks,
        nrow = 1, 
        labels = c("D", "E"),
        align = "vh", 
        axis = "b"
    ),
    nrow = 3,
    labels = c("A", NA, NA),
    align = "v",
    axis = "b",
    rel_heights = c(73, 50, 50)
)

Supplementary figure

cowplot::plot_grid(
    gg_tpm + geom_segment(data = ave_tpm |> filter(bait %in% c("mei4", "atf31", "atf21", "cuf2", "rsv1")),
                          aes(x = bait, xend = bait, y = -9, yend = -30), color = main_colors[1], 
                          linewidth = 3) + 
        scale_y_continuous(expand = expansion(mult = c(0, 0.025))) + 
        theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + 
        labs(x = "") + geom_point(size = 3),
    gg_nbrint_all,
    gg_nbrpeaks_all_wide,
    ncol = 1, 
    align = "v", 
    axis = "l",
    labels = c("A", "B", "C"),
    rel_heights = c(1, 1, 1)
)

Session info

Session info

sessionInfo()

## R version 4.3.2 (2023-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## 
## Matrix products: default
## BLAS/LAPACK: FlexiBLAS OPENBLAS;  LAPACK version 3.10.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Zurich
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] colorspace_2.1-0            ggrepel_0.9.5              
##  [3] ComplexHeatmap_2.18.0       jsonlite_1.8.8             
##  [5] forcats_1.0.0               dplyr_1.1.4                
##  [7] einprot_0.9.4               cowplot_1.1.3              
##  [9] ggplot2_3.5.0               SingleCellExperiment_1.24.0
## [11] SummarizedExperiment_1.32.0 Biobase_2.62.0             
## [13] GenomicRanges_1.54.1        GenomeInfoDb_1.38.8        
## [15] IRanges_2.36.0              S4Vectors_0.40.2           
## [17] BiocGenerics_0.48.1         MatrixGenerics_1.14.0      
## [19] matrixStats_1.3.0          
## 
## loaded via a namespace (and not attached):
##   [1] STRINGdb_2.14.3             ProtGenerics_1.34.0        
##   [3] bitops_1.0-7                DirichletMultinomial_1.44.0
##   [5] TFBSTools_1.40.0            httr_1.4.7                 
##   [7] ash_1.0-15                  RColorBrewer_1.1-3         
##   [9] doParallel_1.0.17           tools_4.3.2                
##  [11] utf8_1.2.4                  R6_2.5.1                   
##  [13] DT_0.33                     lazyeval_0.2.2             
##  [15] mgcv_1.9-1                  GetoptLong_1.0.5           
##  [17] withr_3.0.0                 iSEEhex_1.4.0              
##  [19] gridExtra_2.3               GGally_2.2.1               
##  [21] ggalt_0.4.0                 cli_3.6.2                  
##  [23] Cairo_1.6-2                 shinyjs_2.1.0              
##  [25] sandwich_3.1-0              labeling_0.4.3             
##  [27] sass_0.4.9                  robustbase_0.99-2          
##  [29] mvtnorm_1.2-4               readr_2.1.5                
##  [31] genefilter_1.84.0           Rsamtools_2.18.0           
##  [33] systemfonts_1.0.6           R.utils_2.12.3             
##  [35] svglite_2.1.3               stringdist_0.9.12          
##  [37] scater_1.30.1               rrcov_1.7-5                
##  [39] plotrix_3.8-4               BSgenome_1.70.2            
##  [41] maps_3.4.2                  limma_3.58.1               
##  [43] rstudioapi_0.16.0           impute_1.76.0              
##  [45] RSQLite_2.3.6               BiocIO_1.12.0              
##  [47] generics_0.1.3              shape_1.4.6.1              
##  [49] gtools_3.9.5                GO.db_3.18.0               
##  [51] Matrix_1.6-5                ggbeeswarm_0.7.2           
##  [53] fansi_1.0.6                 imputeLCMD_2.1             
##  [55] abind_1.4-5                 R.methodsS3_1.8.2          
##  [57] lifecycle_1.0.4             yaml_2.3.8                 
##  [59] iSEEu_1.14.0                gplots_3.1.3.1             
##  [61] SparseArray_1.2.4           blob_1.2.4                 
##  [63] promises_1.3.0              crayon_1.5.2               
##  [65] shinydashboard_0.7.2        miniUI_0.1.1.1             
##  [67] lattice_0.22-6              msigdbr_7.5.1              
##  [69] beachmat_2.18.1             ComplexUpset_1.3.3         
##  [71] annotate_1.80.0             KEGGREST_1.42.0            
##  [73] magick_2.8.3                pillar_1.9.0               
##  [75] knitr_1.46                  rjson_0.2.21               
##  [77] codetools_0.2-19            glue_1.7.0                 
##  [79] ggiraph_0.8.9               data.table_1.15.4          
##  [81] pcaMethods_1.94.0           MultiAssayExperiment_1.28.0
##  [83] vctrs_0.6.5                 png_0.1-8                  
##  [85] poweRlaw_0.80.0             gtable_0.3.5               
##  [87] gsubfn_0.7                  cachem_1.0.8               
##  [89] xfun_0.43                   S4Arrays_1.2.1             
##  [91] mime_0.12                   pracma_2.4.4               
##  [93] pcaPP_2.0-4                 survival_3.5-8             
##  [95] rrcovNA_0.5-1               iterators_1.0.14           
##  [97] gmm_1.8                     iSEE_2.14.0                
##  [99] statmod_1.5.0               nlme_3.1-164               
## [101] bit64_4.0.5                 ExploreModelMatrix_1.14.0  
## [103] bslib_0.7.0                 tmvtnorm_1.6               
## [105] irlba_2.3.5.1               vipor_0.4.7                
## [107] KernSmooth_2.23-22          seqLogo_1.68.0             
## [109] DBI_1.2.2                   ade4_1.7-22                
## [111] proDA_1.16.0                motifStack_1.46.0          
## [113] tidyselect_1.2.1            bit_4.0.5                  
## [115] compiler_4.3.2              extrafontdb_1.0            
## [117] chron_2.3-61                BiocNeighbors_1.20.2       
## [119] xml2_1.3.6                  plotly_4.10.4              
## [121] DelayedArray_0.28.0         colourpicker_1.3.0         
## [123] rtracklayer_1.62.0          scales_1.3.0               
## [125] caTools_1.18.2              DEoptimR_1.1-3             
## [127] proj4_1.0-14                hexbin_1.28.3              
## [129] stringr_1.5.1               digest_0.6.35              
## [131] rmarkdown_2.26              XVector_0.42.0             
## [133] htmltools_0.5.8.1           pkgconfig_2.0.3            
## [135] extrafont_0.19              sparseMatrixStats_1.14.0   
## [137] highr_0.10                  fastmap_1.1.1              
## [139] rlang_1.1.3                 GlobalOptions_0.1.2        
## [141] htmlwidgets_1.6.4           DelayedMatrixStats_1.24.0  
## [143] shiny_1.8.1.1               farver_2.1.1               
## [145] jquerylib_0.1.4             zoo_1.8-12                 
## [147] BiocParallel_1.36.0         mclust_6.1                 
## [149] R.oo_1.26.0                 BiocSingular_1.18.0        
## [151] RCurl_1.98-1.14             magrittr_2.0.3             
## [153] scuttle_1.12.0              kableExtra_1.4.0           
## [155] GenomeInfoDbData_1.2.11     patchwork_1.2.0            
## [157] munsell_0.5.1               Rcpp_1.0.12                
## [159] viridis_0.6.5               babelgene_22.9             
## [161] proto_1.0.0                 MsCoreUtils_1.14.1         
## [163] sqldf_0.4-11                stringi_1.8.3              
## [165] rintrojs_0.3.4              zlibbioc_1.48.2            
## [167] MASS_7.3-60                 plyr_1.8.9                 
## [169] ggstats_0.6.0               parallel_4.3.2             
## [171] CNEr_1.38.0                 Biostrings_2.70.3          
## [173] splines_4.3.2               hash_2.2.6.3               
## [175] hms_1.1.3                   circlize_0.4.16            
## [177] igraph_2.0.3                uuid_1.2-0                 
## [179] QFeatures_1.12.0            reshape2_1.4.4             
## [181] ScaledMatrix_1.10.0         TFMPvalue_0.0.9            
## [183] XML_3.99-0.16.1             evaluate_0.23              
## [185] tzdb_0.4.0                  foreach_1.5.2              
## [187] httpuv_1.6.15               Rttf2pt1_1.3.12            
## [189] tidyr_1.3.1                 purrr_1.0.2                
## [191] clue_0.3-65                 norm_1.0-11.1              
## [193] rsvd_1.0.5                  xtable_1.8-4               
## [195] restfulr_0.0.15             AnnotationFilter_1.26.0    
## [197] later_1.3.2                 viridisLite_0.4.2          
## [199] tibble_3.2.1                beeswarm_0.4.0             
## [201] memoise_2.0.1               AnnotationDbi_1.64.1       
## [203] GenomicAlignments_1.38.2    writexl_1.5.0              
## [205] cluster_2.1.4               shinyWidgets_0.8.5         
## [207] shinyAce_0.4.2

Figure 1 - overview

Charlotte Soneson, Michael Stadler, Merle Skribbe

2024-12-12

Parameter values

Load required packages and helper functions

Read data

Preprocessing

Number of interactors in IP-MS for each TF

Facts and numbers - ChIP-seq

Facts and numbers - IP-MS

Figure 1

Number of TFs for each DBD family

WT RNA-seq TPM values for TFs

Number of interactors in IP-MS for each TF

Number of peaks for each TF

Sequence evidence for tagged transcription factors

Put together

Supplementary figure

Session info