R/motif_enrichment_HOMER.R
calcBinnedMotifEnrHomer.Rd
Run a complete HOMER motif enrichment analysis, consisting of calls to prepareHomer, system2 and parseHomerOutput. This function requires HOMER to be installed (see http://homer.ucsd.edu/homer/index.html) and the path to the tool to be provided (homerfile argument).
calcBinnedMotifEnrHomer(
  gr,
  b,
  genomedir,
  outdir,
  motifFile,
  homerfile = findHomer(),
  regionsize = "given",
  pseudocount.log2enr = 8,
  p.adjust.method = "BH",
  Ncpu = 2L,
  verbose = FALSE,
  verbose.Homer = FALSE
)
gr: A GRanges object (or an object that can be coerced to one) with the genomic regions to analyze.
b: A vector of the same length as gr that groups its elements into bins (typically a factor, such as the one returned by bin).
genomedir: Directory containing sequence files in Fasta format (one per chromosome).
outdir: A path specifying the folder into which the output files will be written.
motifFile: A file with HOMER formatted PWMs to be used in the enrichment analysis.
homerfile: Path and file name of the findMotifsGenome.pl HOMER script.
regionsize: The peak size to use in HOMER ("given" keeps the coordinate region, an integer value will keep only that many bases in the region center).
pseudocount.log2enr: A numerical scalar with the pseudocount to add to foreground and background counts when calculating log2 motif enrichments.
p.adjust.method: A character scalar selecting the p value adjustment method (used in p.adjust).
Ncpu: Number of parallel threads that HOMER can use.
verbose: A logical scalar. If TRUE, print progress messages.
verbose.Homer: A logical scalar. If TRUE, print the console output when running Homer.
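To make the effect of an integer regionsize more concrete, the following is a minimal sketch in base R of what "keeping only that many bases in the region center" means. The helper centerRegion is hypothetical (it is not part of monaLisa or HOMER), and the exact rounding that HOMER applies at the midpoint may differ from the one assumed here.

```r
# Hypothetical helper (not part of monaLisa or HOMER): trim each region to
# `size` bases around its midpoint, using 1-based inclusive coordinates.
# Assumption: the midpoint is rounded down; HOMER's rounding may differ.
centerRegion <- function(start, end, size) {
    mid <- (start + end) %/% 2L
    newStart <- mid - size %/% 2L
    newEnd <- newStart + size - 1L
    data.frame(start = newStart, end = newEnd)
}

# Two regions of width 200 and 400, both trimmed to 100 bases
regions <- centerRegion(start = c(1001L, 5001L),
                        end = c(1200L, 5400L),
                        size = 100L)
regions$end - regions$start + 1L
```

With regionsize = "given", no such trimming takes place and the regions are used as provided in gr.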
A SummarizedExperiment object with motifs in rows and bins in columns, containing seven assays:
negLog10P: -log10 P values
negLog10Padj: -log10 adjusted P values
pearsonResid: motif enrichments as Pearson residuals
expForegroundWgtWithHits: expected number of foreground sequences with motif hits
log2enr: motif enrichments as log2 ratios
sumForegroundWgtWithHits: Sum of foreground sequence weights in a bin that have motif hits
sumBackgroundWgtWithHits: Sum of background sequence weights in a bin that have motif hits
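The role of pseudocount.log2enr in the log2enr assay can be illustrated with a small sketch. The function log2EnrSketch below is hypothetical, and it assumes that foreground and background hit counts are first scaled to a common total before the pseudocount is added; the normalization used internally by the package may differ.

```r
# Hypothetical helper illustrating a pseudocount-stabilised log2 ratio.
# Assumption: foreground and background hits are scaled to the same total
# before adding the pseudocount; the package's exact formula may differ.
log2EnrSketch <- function(fgHits, bgHits, fgTotal, bgTotal, pseudocount = 8) {
    scale <- min(fgTotal, bgTotal)
    fgNorm <- fgHits / fgTotal * scale
    bgNorm <- bgHits / bgTotal * scale
    log2((fgNorm + pseudocount) / (bgNorm + pseudocount))
}

# 40 of 100 foreground vs. 100 of 1000 background sequences with hits
withPseudo <- log2EnrSketch(fgHits = 40, bgHits = 100,
                            fgTotal = 100, bgTotal = 1000)
noPseudo <- log2EnrSketch(fgHits = 40, bgHits = 100,
                          fgTotal = 100, bgTotal = 1000, pseudocount = 0)
c(withPseudo = withPseudo, noPseudo = noPseudo)
```

Without a pseudocount the ratio is exactly log2(40 / 10) = 2; the pseudocount pulls the estimate towards zero, which dampens large enrichments that are based on only a few hits.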
The rowData of the object contains annotations (name, PFMs, PWMs and GC fraction) for the motifs, while the colData slot contains summary information about the bins.
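The relation between the negLog10P and negLog10Padj assays and the p.adjust.method argument can be sketched with base R alone. The P values below are made up for illustration, and the sketch assumes that all motif-by-bin tests are adjusted together; whether the package adjusts across the whole matrix or per bin should be checked against its source.

```r
# Toy matrix of raw P values: motifs in rows, bins in columns (made-up numbers)
P <- matrix(c(0.001, 0.20, 0.03, 0.50, 0.004, 0.80), nrow = 3,
            dimnames = list(paste0("motif", 1:3), c("bin1", "bin2")))

negLog10P <- -log10(P)

# Assumption: all motif-by-bin tests are adjusted together
negLog10Padj <- matrix(-log10(p.adjust(as.vector(P), method = "BH")),
                       nrow = nrow(P), dimnames = dimnames(P))

# Back-transform when a threshold on the adjusted P value is needed
which(10^(-negLog10Padj) < 0.05, arr.ind = TRUE)
```

Because the assays store -log10 values, larger numbers indicate stronger significance, and a cutoff of 0.05 on the adjusted P value corresponds to negLog10Padj > -log10(0.05).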
The functions that are wrapped: prepareHomer, system2 and parseHomerOutput; bin for binning of regions.
if (!is.na(findHomer())) {

    # genome
    genome <- system.file("extdata", "exampleGenome.fa", package = "monaLisa")

    # create motif file for Homer
    motiffile <- tempfile()
    motifIDs <- c("MA0139.1", "MA1102.1", "MA0740.1")
    dumpJaspar(filename = motiffile, pkg = "JASPAR2020",
               opts = list(ID = motifIDs))

    # GRanges of regions used in binned motif enrichment analysis
    gr <- GenomicRanges::tileGenome(
        seqlengths = c(chr1 = 10000L, chr2 = 10000L, chr3 = 10000L),
        tilewidth = 200, cut.last.tile.in.chrom = TRUE)

    # create bins (motif enrichment analysis will be per bin)
    bins <- factor(GenomicRanges::seqnames(gr))
    table(bins)

    # run calcBinnedMotifEnrHomer
    outdir <- tempfile()
    se <- calcBinnedMotifEnrHomer(gr = gr, b = bins, genomedir = genome,
                                  outdir = outdir, motifFile = motiffile)
    list.files(outdir)

}