Extract the sequence context around positions of interest
Source: R/seqContext.R
This function extracts a sequence context of sequenceContextWidth bases
around the center of the regions defined in x from sequenceReference.
Arguments
- x
A GRanges object defining the regions of interest. Alternatively, a RangedSummarizedExperiment object from which regions can be extracted using rowRanges. The extracted sequences will correspond to the regions defined as resize(x, width = sequenceContextWidth, fix = "center").
- sequenceContextWidth
A numeric scalar giving the width of the sequence context to be extracted from the reference (sequenceReference argument). A nonzero value must be an odd number so that the sequence can be centered on the modified base. If sequenceContextWidth = 0 (the default), no sequence context will be extracted.
- sequenceReference
A DNAStringSet object, a BSgenome object, or a character scalar giving the path to a fasta formatted file with reference sequences. If a BSgenome object is provided, it will be internally converted to a DNAStringSet object, since the latter allows for faster sequence retrieval.
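To make the resize() expression in the description of x concrete, the following sketch shows which coordinates end up in a width-7 context around single-base regions. It uses only GenomicRanges and made-up positions (100 and 200 on "chr1" are illustrative values, not taken from the package data), and it is not a reproduction of the function's internals.

```r
library(GenomicRanges)

# two hypothetical single-base regions of interest (illustrative positions)
x <- GRanges("chr1", IRanges(start = c(100, 200), width = 1))

# widen each region to 7 bases while keeping it centered on the original base
ctx <- resize(x, width = 7, fix = "center")

start(ctx)  # 3 bases upstream of each position
end(ctx)    # 3 bases downstream of each position
```

With an odd width, the original base sits exactly in the middle of the resized range, with (width - 1) / 2 bases on either side.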
Value
A DNAStringSet object of the same length as x with the extracted sequence
context. All elements are guaranteed to have identical length: if the
sequence context extends to before the start or beyond the end of a
reference sequence, it is padded with 'N' bases.
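The padding behavior can be illustrated with a minimal base R sketch. The helper padded_context() below is hypothetical and not part of footprintR; it works on a plain character string rather than a DNAStringSet, and it assumes every position lies within the reference. It only shows how a fixed-width window can be guaranteed by padding with 'N'.

```r
# Hypothetical helper (not part of footprintR): extract a window of odd
# width centered on each position in `pos` from the string `ref`, padding
# with "N" wherever the window extends past either end of the reference.
# Assumes 1 <= pos <= nchar(ref).
padded_context <- function(ref, pos, width) {
    stopifnot(width %% 2 == 1)
    flank <- (width - 1) / 2
    start <- pos - flank
    end <- pos + flank
    # number of missing bases before the start / after the end of `ref`
    left_pad <- strrep("N", pmax(0, 1 - start))
    right_pad <- strrep("N", pmax(0, end - nchar(ref)))
    # the part of the window that overlaps the reference
    core <- substr(rep(ref, length(pos)), pmax(1, start), pmin(nchar(ref), end))
    paste0(left_pad, core, right_pad)
}

ref <- "ACGTACGTAC"
ctx <- padded_context(ref, pos = c(1, 5, 10), width = 5)
ctx
#> [1] "NNACG" "GTACG" "TACNN"
```

Every returned element has exactly `width` characters, which mirrors the guarantee stated above for the returned DNAStringSet.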
Examples
# file with sequence in fasta format of length 6957060
reffile <- system.file("extdata", "reference.fa.gz", package = "footprintR")
# define some regions at the end of the reference sequence
regions <- GenomicRanges::GRanges(
"chr1", IRanges::IRanges(start = 6957060 - c(4, 2, 0),
width = 1, names = c("a","b","c")))
# extract sequence context (note the padding with N's)
extractSeqContext(regions, 7, reffile)
#> DNAStringSet object of length 3:
#> width seq names
#> [1] 7 AAAGGGG a
#> [2] 7 AGGGGAN b
#> [3] 7 GGGANNN c