Extract the sequence context around positions of interest.
Source: R/seqContext.R
extractSeqContext.Rd
This function extracts a sequence context of sequence.context.width bases around the center of each region defined in x from sequence.reference.
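For a single-base region at position p and an odd width w, a centered window covers positions p - (w - 1)/2 through p + (w - 1)/2, so near either end of a reference sequence it can fall outside that sequence. The base R sketch below computes these windows. context_window() is a hypothetical helper written for illustration, not part of footprintR, and it assumes width-1 regions.

```r
# Hypothetical helper (not part of footprintR): window coordinates for a
# context of odd width `w` centered on single-base positions `pos`.
context_window <- function(pos, w) {
  stopifnot(w %% 2 == 1)           # odd width, so one base sits in the middle
  half <- (w - 1) %/% 2            # bases on each side of the center
  data.frame(pos = pos, start = pos - half, end = pos + half)
}

# positions near the end of a reference sequence of length 6957060
win <- context_window(6957060 - c(4, 2, 0), 7)
win
# the end values 6957061 and 6957063 lie beyond the sequence end
```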
Arguments
- x
A GRanges object defining the regions of interest. Alternatively, a RangedSummarizedExperiment object from which the regions are extracted using rowRanges. The extracted sequences will correspond to the regions defined as resize(x, width = sequence.context.width, fix = "center").
- sequence.context.width
A numeric scalar giving the width of the sequence context to be extracted from the reference (the sequence.reference argument). This must be an odd number so that the sequence can be centered on the modified base. If sequence.context.width = 0 (the default), no sequence context is extracted.
- sequence.reference
A BSgenome object, a character scalar giving the path to a FASTA-formatted file with reference sequences, or a DNAStringSet object. The sequence context (see the sequence.context.width argument) is extracted from these sequences.
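The documented contract for sequence.context.width (0 to skip extraction, otherwise an odd width) can be written as a small check. This is a hypothetical validator for illustration only, assuming that negative and non-integer values are invalid; footprintR's own argument checks may differ.

```r
# Hypothetical validator (not part of footprintR): accepts a single whole
# number that is either 0 (no context extracted) or odd (so the window can
# be centered on one base). Negative values are assumed to be invalid.
is_valid_context_width <- function(w) {
  is.numeric(w) && length(w) == 1L && !is.na(w) &&
    w >= 0 && w == round(w) && (w == 0 || w %% 2 == 1)
}

is_valid_context_width(0)   # TRUE: no context extracted
is_valid_context_width(7)   # TRUE
is_valid_context_width(6)   # FALSE: an even width cannot be centered
```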
Value
A DNAStringSet object of the same length as x with the extracted sequence contexts. All elements are guaranteed to have identical length: if the sequence context extends before the start or beyond the end of a reference sequence, it is padded with 'N' bases.
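The padding behavior can be illustrated on a plain character string without Bioconductor. The sketch below assumes a single reference sequence stored as a string and an odd width w; extract_padded() is a hypothetical helper, not the footprintR implementation.

```r
# Hypothetical helper (not part of footprintR): extract `w` bases centered on
# each position in `pos` from a single reference string `ref`, padding with
# "N" wherever the window runs past either end of the sequence.
extract_padded <- function(ref, pos, w) {
  half <- (w - 1) %/% 2
  n <- nchar(ref)
  vapply(pos, function(p) {
    idx <- seq(p - half, p + half)          # window coordinates
    bases <- rep("N", w)                    # start fully padded
    inside <- idx >= 1 & idx <= n           # positions within the sequence
    bases[inside] <- substring(ref, idx[inside], idx[inside])
    paste(bases, collapse = "")
  }, character(1))
}

ref <- "ACGTACGTAAAGGGGA"   # toy 16-base reference
extract_padded(ref, c(2, 12, 16), 7)
# returns "NNACGTA" "AAAGGGG" "GGGANNN"
```

The windows centered on positions 2 and 16 run past the sequence ends and are padded with N, so every result has the same width of 7.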
Examples
# FASTA file with a reference sequence of length 6957060
reffile <- system.file("extdata", "reference.fa.gz", package = "footprintR")
# define some regions at the end of the reference sequence
regions <- GenomicRanges::GRanges(
    "chr1", IRanges::IRanges(start = 6957060 - c(4, 2, 0),
                             width = 1, names = c("a", "b", "c")))
# extract sequence context (note the padding with N's)
extractSeqContext(regions, 7, reffile)
#> DNAStringSet object of length 3:
#> width seq names
#> [1] 7 AAAGGGG a
#> [2] 7 AGGGGAN b
#> [3] 7 GGGANNN c