Extract the sequence context around positions of interest
Source: R/seqContext.R
extractSeqContext.Rd

This function will extract a sequence context of sequenceContextWidth
bases around the center of the regions defined in x from
sequenceReference.
Arguments
- x
A GRanges object defining the regions of interest. Alternatively, a RangedSummarizedExperiment object from which regions can be extracted using rowRanges. The extracted sequences will correspond to the regions defined as resize(x, width = sequenceContextWidth, fix = "center").
- sequenceContextWidth
A numeric scalar giving the width of the sequence context to be extracted from the reference (sequenceReference argument). This must be an odd number so that the sequence can be centered on the modified base. If sequenceContextWidth = 0 (the default), no sequence context will be extracted.
- sequenceReference
A DNAStringSet object, a BSgenome object, or a character scalar giving the path to a fasta formatted file with reference sequences. If a BSgenome object is provided, it will be internally converted to a DNAStringSet object, since the latter allows for faster sequence retrieval.
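The centering described for x can be illustrated with a small sketch. It assumes the GenomicRanges package is installed; the object names (x, ctx) and the coordinates are made up for illustration and are not taken from the package.

```r
library(GenomicRanges)

# two illustrative single-base regions (hypothetical positions)
x <- GRanges("chr1", IRanges(start = c(10, 20), width = 1))

# widen each region to 7 bases while keeping it centered on the original base
ctx <- resize(x, width = 7, fix = "center")

start(ctx)  # 7 17
end(ctx)    # 13 23
```

With an odd width, each window has the same number of flanking bases on both sides, here three upstream and three downstream of the original position.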
Value
A DNAStringSet object of the same length
as x containing the extracted sequence contexts. All elements are guaranteed
to have identical length: if a sequence context extends before the
start or beyond the end of a reference sequence, it is padded with
'N' bases.
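The padding behaviour can be sketched in base R on a plain character string. This is only an illustration of the idea, not the package's implementation; the helper padded_context and the toy reference sequence are hypothetical.

```r
# Hypothetical helper (not part of footprintR): extract a window of `width`
# bases centered on `pos` from the string `ref`, padding with "N" wherever
# the window extends past either end of the sequence.
padded_context <- function(ref, pos, width) {
    stopifnot(width %% 2 == 1)          # odd width so the window can be centered
    flank <- (width - 1) / 2
    idx <- seq(pos - flank, pos + flank)
    bases <- strsplit(ref, "")[[1]]
    inside <- idx >= 1 & idx <= length(bases)
    out <- rep("N", width)              # start with all padding
    out[inside] <- bases[idx[inside]]   # fill in positions that exist
    paste(out, collapse = "")
}

padded_context("ACGTACGT", pos = 8, width = 5)  # "CGTNN"
padded_context("ACGTACGT", pos = 1, width = 5)  # "NNACG"
padded_context("ACGTACGT", pos = 4, width = 3)  # "GTA"
```

Every result has exactly `width` characters, which mirrors the guarantee that all returned elements have identical length.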
Examples
# file with sequence in fasta format of length 6957060
reffile <- system.file("extdata", "reference.fa.gz", package = "footprintR")
# define some regions at the end of the reference sequence
regions <- GenomicRanges::GRanges(
    "chr1", IRanges::IRanges(start = 6957060 - c(4, 2, 0),
                             width = 1, names = c("a", "b", "c")))
# extract sequence context (note the padding with N's)
extractSeqContext(regions, 7, reffile)
#> DNAStringSet object of length 3:
#> width seq names
#> [1] 7 AAAGGGG a
#> [2] 7 AGGGGAN b
#> [3] 7 GGGANNN c