Add sequence context around positions of interest to a SummarizedExperiment
Source: R/seqContext.R
addSeqContext.Rd
Convenience function to extract sequence context around positions of interest (the rowRanges of a RangedSummarizedExperiment) and add it to the SummarizedExperiment's row data (rowData(x)$sequenceContext). The extracted sequences will correspond to the regions defined as resize(rowRanges(x), width = sequenceContextWidth, fix = "center"). Sequence contexts are extracted using extractSeqContext.
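The regions described above can be reproduced with GenomicRanges alone. This minimal sketch assumes only that GenomicRanges is installed and reuses the chr1 length of 6957060 from the example reference. It computes the width-7 windows for the three example positions and counts how many window positions fall past the end of chr1. These counts (0, 1 and 3) match the number of N's in the example output.

```r
# Sketch: reproduce the regions addSeqContext() takes its contexts from,
# using only GenomicRanges (assumption: package installed).
suppressPackageStartupMessages(library(GenomicRanges))

chrLen <- 6957060L  # length of chr1 in the example reference file
gr <- GRanges("chr1",
              IRanges(start = chrLen - c(4L, 2L, 0L), width = 1L,
                      names = c("a", "b", "c")),
              strand = "-")

# width-7 windows centered on each position of interest
ctx <- resize(gr, width = 7L, fix = "center")
ranges(ctx)

# number of window positions lying beyond the end of chr1
nBeyondEnd <- pmax(end(ctx) - chrLen, 0L)
nBeyondEnd
```

Because the width is odd, each window extends the same number of bases (3) on both sides of the position of interest.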
Arguments
- x
A RangedSummarizedExperiment object whose rowRanges define the positions of interest.
- sequenceContextWidth
A numeric scalar giving the width of the sequence context to be extracted from the reference (sequenceReference argument). This must be an odd number so that the sequence can be centered on the modified base. If sequenceContextWidth = 0 (the default), no sequence context will be extracted.
- sequenceReference
A BSgenome object, a character scalar giving the path to a FASTA-formatted file with reference sequences, or a DNAStringSet object. The sequence context (see sequenceContextWidth argument) will be extracted from these sequences.
Value
A RangedSummarizedExperiment object with sequence contexts added as a DNAStringSet object to rowData(x)$sequenceContext.
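Because the result is stored as a DNAStringSet column in the row data, it can be processed with the usual Biostrings tools. This sketch assumes SummarizedExperiment and Biostrings are installed and fills rowData(x)$sequenceContext by hand with the contexts shown in the example output (in practice addSeqContext() would do this). It counts the N characters per context and keeps only the rows whose context contains no N.

```r
# Sketch: work with a sequenceContext column in the row data.
# Assumption: the contexts are copied from the example output rather
# than computed by addSeqContext().
suppressPackageStartupMessages({
    library(SummarizedExperiment)
    library(Biostrings)
})

x <- SummarizedExperiment(
    assays = matrix(1:3, ncol = 1),
    rowRanges = GRanges("chr1",
                        IRanges(start = 6957060 - c(4, 2, 0), width = 1,
                                names = c("a", "b", "c")),
                        strand = "-"))
rowData(x)$sequenceContext <- DNAStringSet(
    c(a = "CCCCTTT", b = "TCCCCTN", c = "TCCCNNN"))

ctx <- rowData(x)$sequenceContext
# number of N characters in each context
nN <- as.vector(letterFrequency(ctx, letters = "N"))
# keep only rows whose context contains no N
xComplete <- x[nN == 0, ]
rownames(xComplete)
```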
Examples
# load packages
library(footprintR)
library(SummarizedExperiment)
# gzip-compressed FASTA file with the reference sequence (chr1, length 6957060)
reffile <- system.file("extdata", "reference.fa.gz", package = "footprintR")
# define some regions at the end of the reference sequence
se <- SummarizedExperiment(
    assays = matrix(1:3, ncol = 1),
    rowRanges = GRanges(
        "chr1", IRanges(start = 6957060 - c(4, 2, 0),
                        width = 1, names = c("a", "b", "c")),
        strand = "-"))
# inspect the positions of interest
rowRanges(se)
#> GRanges object with 3 ranges and 0 metadata columns:
#>     seqnames    ranges strand
#>        <Rle> <IRanges>  <Rle>
#>   a     chr1   6957056      -
#>   b     chr1   6957058      -
#>   c     chr1   6957060      -
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
# add sequence context of width 7 (note the padding with N's where the
# context extends beyond the end of the reference sequence)
se <- addSeqContext(se, sequenceContextWidth = 7, sequenceReference = reffile)
rowRanges(se)
#> GRanges object with 3 ranges and 1 metadata column:
#>     seqnames    ranges strand | sequenceContext
#>        <Rle> <IRanges>  <Rle> |  <DNAStringSet>
#>   a     chr1   6957056      - |         CCCCTTT
#>   b     chr1   6957058      - |         TCCCCTN
#>   c     chr1   6957060      - |         TCCCNNN
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths