Extracts, from sequence.reference, a sequence context of sequence.context.width bases centered on each of the regions defined in x.

Usage

extractSeqContext(x, sequence.context.width, sequence.reference)

Arguments

x

A GRanges object defining the regions of interest. Alternatively, a RangedSummarizedExperiment object from which the regions can be extracted using rowRanges. The extracted sequences correspond to the regions defined by resize(x, width = sequence.context.width, fix = "center").

sequence.context.width

A numeric scalar giving the width of the sequence context to be extracted from the reference (sequence.reference argument). This must be an odd number so that the sequence can be centered on the modified base. If sequence.context.width = 0 (the default), no sequence context will be extracted.

sequence.reference

A BSgenome object, or a character scalar giving the path to a fasta formatted file with reference sequences, or a DNAStringSet object. The sequence context (see sequence.context.width argument) will be extracted from these sequences.

Value

A DNAStringSet object of the same length as x containing the extracted sequence context. All elements are guaranteed to have identical length: if the sequence context extends before the start or beyond the end of a reference sequence, it is padded with 'N' bases.

Author

Michael Stadler

Examples

# file with sequence in fasta format of length 6957060
reffile <- system.file("extdata", "reference.fa.gz", package = "footprintR")

# define some regions at the end of the reference sequence
regions <- GenomicRanges::GRanges(
    "chr1",
    IRanges::IRanges(start = 6957060 - c(4, 2, 0), width = 1,
                     names = c("a", "b", "c")))

# extract sequence context (note the padding with N's)
extractSeqContext(regions, 7, reffile)
#> DNAStringSet object of length 3:
#>     width seq                                               names               
#> [1]     7 AAAGGGG                                           a
#> [2]     7 AGGGGAN                                           b
#> [3]     7 GGGANNN                                           c
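
The centering and 'N' padding seen above can be illustrated in base R on a toy sequence. This is only a sketch of the idea, not footprintR's implementation: the helper padded_context and the ten-base reference are hypothetical and chosen purely for illustration.

```r
# Hypothetical helper (not part of footprintR): return a window of odd
# width `w` centered on the 1-based position `pos` of the string `ref`,
# padded with "N" where the window runs past either end of `ref`.
padded_context <- function(ref, pos, w) {
    stopifnot(w %% 2 == 1)                      # odd width, so `pos` is the center
    half <- (w - 1) / 2
    idx <- seq(pos - half, pos + half)          # window coordinates
    bases <- strsplit(ref, "")[[1]]
    inside <- idx >= 1 & idx <= length(bases)   # coordinates that fall within ref
    out <- rep("N", w)                          # start from an all-N window
    out[inside] <- bases[idx[inside]]           # fill in the available bases
    paste(out, collapse = "")
}

ref <- "ACGTACGTAC"                             # toy reference of length 10
# same layout as the example above: positions 4, 2 and 0 bases from the end
ctx <- vapply(nchar(ref) - c(4, 2, 0),
              function(p) padded_context(ref, p, 7),
              character(1))
ctx
#> [1] "GTACGTA" "ACGTACN" "GTACNNN"
```

As in the extractSeqContext output, a window that lies fully inside the reference is returned unchanged, while windows overlapping the end are filled with one and three 'N' bases, so all results have width 7.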