Given a set of sequences, calculate observed and expected k-mer
frequencies. Expected frequencies are based on a Markov model of order
MMorder.
Arguments
- seqs
Set of sequences, either a
charactervector or aDNAStringSet.- kmerLen
A
numericscalar giving the k-mer length.- MMorder
A
numericscalar giving the order of the Markov model used to calculate the expected frequencies.- pseudocount
A
numericscalar - will be added to the observed counts for each k-mer to avoid zero values.- zoops
A
logicalscalar. IfTRUE(the default), only one or zero occurences of a k-mer are considered per sequence.- strata
A
factoror anumericscalar defining the strata of sequences. A separate Markov model and expected k-mer frequencies are estimated for the set of sequences in each stratum (level in astratafactor). Ifstratais a scalar value, it will be interpreted as the number of strata to split the sequences into according to their CpG observed-over-expected counts usingkmeans(CpGoe, centers = strata).- p.adjust.method
A character scalar selecting the p value adjustment method (used in
p.adjust).- includeRevComp
A
logicalscalar. IfTRUE(default), count k-mer occurrences in bothseqsand their reverse-complement, by concatenatingseqsand their reverse-complemented versions before the counting. This is useful if motifs can be expected to occur on any strand (e.g. DNA sequences of ChIP-seq peaks). If motifs are only expected on the forward strand (e.g. RNA sequences of CLIP-seq peaks),includeRevComp = FALSEshould be used. Note that ifstratais a vector of the same length asseqs, each reverse-complemented sequence will be assigned to the same stratum as the forward sequence.
Value
A list with observed and expected k-mer frequencies
(freq.obs and freq.exp, respectively), and enrichment
statistics for each k-mer.
Examples
res <- getKmerFreq(seqs = c("AAAAATT", "AAATTTT"), kmerLen = 3)
names(res)
#> [1] "freq.obs" "freq.exp" "log2enr" "sqrtDelta" "z"
#> [6] "p" "padj" "strata" "freq.strata" "CpGoe"
head(res$freq.obs)
#> AAA AAC AAG AAT ACA ACC
#> 3 0 0 4 0 0
head(res$freq.exp)
#> AAA AAC AAG AAT ACA ACC
#> 1.8295455 0.1663223 0.1663223 1.4969008 0.3326446 0.3326446