Plot various diagnostics of binned sequences. Three plot types are available:

length

plots the distribution of sequence lengths within each bin.

GCfrac

plots the distribution of GC fractions within each bin.

dinucfreq

plots a heatmap of the relative frequency of each dinucleotide, averaged across the sequences within each bin. The values are centered for each dinucleotide to better highlight differences between the bins. The average relative frequency of each dinucleotide (across the bins) is indicated as well.
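
The quantities behind the three plot types can be sketched with Biostrings directly. This is a minimal illustration under stated assumptions, not the package's internal code: the toy sequences, the variable names, and the reading of "centered" as subtracting each dinucleotide's mean across bins are all assumptions made for this example.

```r
library(Biostrings)

set.seed(1)
# toy data (assumption): 100 random sequences of 20-40 nt, two bins of 50
seqs <- DNAStringSet(vapply(1:100, function(i) {
  paste(sample(c("A", "C", "G", "T"), sample(20:40, 1), replace = TRUE),
        collapse = "")
}, ""))
bins <- factor(rep(1:2, each = 50))

# "length": sequence lengths, split by bin
lens_by_bin <- split(width(seqs), bins)

# "GCfrac": fraction of G or C in each sequence, split by bin
gc <- letterFrequency(seqs, letters = "GC", as.prob = TRUE)[, 1]
gc_by_bin <- split(gc, bins)

# "dinucfreq": relative dinucleotide frequencies per sequence (one row each)
dinuc <- dinucleotideFrequency(seqs, as.prob = TRUE)
# average within each bin, giving a bins x 16 matrix
dinuc_by_bin <- rowsum(dinuc, bins) / as.vector(table(bins))
# center each dinucleotide (column) across the bins
dinuc_centered <- sweep(dinuc_by_bin, 2, colMeans(dinuc_by_bin))
```

After centering, each column of dinuc_centered has mean zero, so positive and negative values mark bins in which a dinucleotide is more or less frequent than its across-bin average.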

Usage

plotBinDiagnostics(
  seqs,
  bins,
  aspect = c("length", "GCfrac", "dinucfreq"),
  ...
)

Arguments

seqs

DNAStringSet object with sequences.

bins

factor of the same length and order as seqs, indicating the bin for each sequence. Typically the return value of bin().
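
The requirement on bins (a factor with one entry per sequence, in the same order as seqs) can be illustrated as follows. This sketch uses base R's cut() as a stand-in for bin(); the score covariate and the choice of three quantile-based bins are hypothetical and serve only to produce a valid factor.

```r
library(Biostrings)

set.seed(42)
# toy sequences (assumption): 60 random 25-mers
seqs <- DNAStringSet(vapply(1:60, function(i) {
  paste(sample(c("A", "C", "G", "T"), 25, replace = TRUE), collapse = "")
}, ""))

# hypothetical per-sequence covariate, in the same order as seqs
score <- rnorm(length(seqs))

# three bins with roughly equal numbers of sequences, via quantile breaks
brks <- quantile(score, probs = seq(0, 1, length.out = 4))
bins <- cut(score, breaks = brks, include.lowest = TRUE)

# the properties plotBinDiagnostics expects of bins
stopifnot(is.factor(bins), length(bins) == length(seqs), !anyNA(bins))
table(bins)
```

If seqs is reordered or subset after binning, bins has to be reordered or subset in the same way, since the two are matched by position.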

aspect

The diagnostic to plot. Must be one of "length", "GCfrac" or "dinucfreq", which plot the distribution of sequence lengths, the distribution of GC fractions, or the average relative dinucleotide frequencies per bin, respectively.

...

Additional arguments passed to getColsByBin.

Value

For aspect="length" or "GCfrac", returns (invisibly) the output of vioplot(), which generates the plot. For aspect="dinucfreq", returns (invisibly) the ComplexHeatmap object.
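
Because the return value is invisible, it is only available when assigned. A short sketch of capturing it, assuming monaLisa and Biostrings are installed; the toy data and the variable names vio and hm are made up for this example, and the exact structure of the returned objects is not specified here, so they are only inspected.

```r
library(monaLisa)

set.seed(1)
# toy data (assumption): 100 random 10-mers in two bins of 50
seqs <- Biostrings::DNAStringSet(vapply(1:100, function(i) {
  paste(sample(c("A", "C", "G", "T"), 10, replace = TRUE), collapse = "")
}, ""))
bins <- factor(rep(1:2, each = 50))

# violin-plot aspects: capture the value returned by the plotting call
vio <- plotBinDiagnostics(seqs, bins, aspect = "GCfrac")
str(vio)

# heatmap aspect: capture the heatmap object for later inspection or reuse
hm <- plotBinDiagnostics(seqs, bins, aspect = "dinucfreq")
class(hm)
```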

Examples

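set.seed(1)  # make the random sequences reproducible
# 100 random DNA sequences of length 10
seqs <- Biostrings::DNAStringSet(
  vapply(1:100, function(i) paste(sample(c("A", "C", "G", "T"), 10,
                                         replace = TRUE), collapse = ""), "")
)
# assign the first 50 sequences to bin 1 and the remaining 50 to bin 2
bins <- factor(rep(1:2, each = 50))
# distribution of GC fractions within each bin
plotBinDiagnostics(seqs, bins, aspect = "GCfrac")

# centered average dinucleotide frequencies within each bin
plotBinDiagnostics(seqs, bins, aspect = "dinucfreq")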