Quantify alignments from sequencing data, relative to their position in query regions.
qProfile(
proj,
query,
upstream = 1000,
downstream = upstream,
selectReadPosition = c("start", "end"),
shift = 0L,
orientation = c("any", "same", "opposite"),
useRead = c("any", "first", "last"),
auxiliaryName = NULL,
mask = NULL,
collapseBySample = TRUE,
includeSpliced = TRUE,
includeSecondary = TRUE,
mapqMin = 0L,
mapqMax = 255L,
absIsizeMin = NULL,
absIsizeMax = NULL,
maxInsertSize = 500L,
binSize = 1L,
clObj = NULL
)
A qProject
object representing a
sequencing experiment as returned by qAlign
An object of type GRanges
with the regions to be profiled. All regions in query
will be
anchored at their biological start position (start(query)
for
regions on strand “+” or “*”, end(query)
for
regions on strand “-”). This position will become position zero
in the return value.
An “integer” vector of length one or the same
length as query
indicating the number of bases upstream of the
anchor position to include in the profile.
An “integer” vector of length one or the same
length as query
indicating the number of bases downstream of the
anchor position to include in the profile.
defines the part of the alignment that has to be contained within a query region to produce an overlap (see Details), and that is used to calculate the relative position within the query region. Possible values are:
start
(default): start of the alignment
end
: end of the alignment
controls the shifting alignments towards their 3'-end before
quantification. shift
can be one of:
an “integer” vector of the same length as the number of alignment files
a single “integer” value
the character string "halfInsert"
(only available for
paired-end experiments)
The default of 0
will not shift any alignments.
sets the required orientation of the alignments relative to the query region in order to be counted, one of:
any
(default): count alignment on the same and opposite strand
same
: count only alignment on the same strand
opposite
: count only alignment on the opposite strand
For paired-end experiments, selects the read mate whose alignments should be counted, one of:
any
(default): count all alignments
first
: count only alignments from the first read
last
: count only alignments from the last read
Which bam files to use in an experiments with auxiliary alignments (see Details).
If not NULL
, a GRanges
object with reference regions to be masked, i.e. excluded from the
quantification, such as unmappable or highly repetitive regions (see
Details).
If TRUE
(the default), sum alignment
counts from bam files with the same sample name.
If TRUE
(the default), include spliced
alignments when counting. A spliced alignment is defined as an
alignment with a gap in the read of at least 60 bases.
If TRUE
(the default), include alignments
with the secondary bit (0x0100) set in the FLAG
when counting.
Minimal mapping quality of alignments to be included when
counting (mapping quality must be greater than or equal to mapqMin
).
Valid values are between 0 and 255. The default (0) will include all
alignments.
Maximal mapping quality of alignments to be included when
counting (mapping quality must be less than or equal to mapqMax
).
Valid values are between 0 and 255. The default (255) will include all
alignments.
For paired-end experiments, minimal absolute insert
size (TLEN field in SAM Spec v1.4) of alignments to be included when
counting. Valid values are greater than 0 or NULL
(default),
which will not apply any minimum insert size filtering.
For paired-end experiments, maximal absolute insert
size (TLEN field in SAM Spec v1.4) of alignments to be included when
counting. Valid values are greater than 0 or NULL
(default),
which will not apply any maximum insert size filtering.
Maximal fragment size of the paired-end experiment.
This parameter is used if shift="halfInsert"
and will ensure that
query regions are made wide enough to emcompass all alignment pairs whose
mid falls into the query region. The default value is 500
bases.
Numeric scalar giving the size of bins (must be an odd number).
The default value (1
) gives back counts for single bases. Otherwise,
alignments are counted in adjacent, non-overlapping windows of size
binSize
that tile the interval defined by upstream
and
downstream
.
A cluster object to be used for parallel processing (see ‘Details’).
A list
of matrices with length(unique(names(query)))
rows
with profile names, and max(upstream)+max(downstream)+1
columns
indicating relative position (for binsize=1
).
For binSize
values greater than 1, the number of columns corresponds to
the number of bins (tiles), namely
ceiling(max(upstream)/binSize)+ceiling(max(downstream)/binSize)
.
A middle bin of size binSize
is always positioned centered at the anchor
of each region. Additional bins are positioned upstream and downstream, adjacent
to that middle bin, in order to include at least upstream
and
downstream
bases, respectively (potentially more in order to fill the
first and last bins).
The relative positions are given as column names (for binSize > 1
they refer to the bin mid). In that case, the bins are "right-open". For
example, if binSize = 10
, the bin with the midpoint "-50" contains
counts for the alignments in [-55,-45).
The first list element is called “coverage” and contains, for each profile and relative position, the number of overlapping regions that contributed to the profile.
Subsequent list elements contain the alignment counts for individual
sequence files (collapseBySample=FALSE
) or samples
(collapseBySample=TRUE
) in proj
.
For projects with allele-specific quantification, i.e. if a file with
single nucleotide polymorphisms was supplied to the snpFile
argument of qAlign
, there will be three rows
instead of one row with counts per unique region name, with numbers
of alignments for Reference, Unknown and Alternative genotypes
(suffixed _R, _U and _A).
qProfile
is used to count alignments in each sample from a
qProject
object, relative to their position in query regions.
Most arguments are identical to the ones of qCount
.
The query
argument is a GRanges
object that defines the regions for the profile. All regions in
query
will be aligned to one another at their anchor position,
which corresponds to their biological start position (start(query)
for regions on strand “+” or “*”, end(query)
for
regions on strand “-”).
This anchor position will be extended (with regard to strand) by
the number of bases specified by upstream
and downstream
.
In the return value, the anchor position will be at position zero.
If binSize
is greater than one, upstream
and downstream
will be slightly increased in order to include the complete first and last
bins of binSize
bases.
Regions with identical names in names{query}
will be summed, and
profiles will be padded with zeros to accomodate the length of all profiles.
qCount
,
qAlign
,
qProject
,
makeCluster
from package parallel
# copy example data to current working directory
file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)
#> [1] TRUE
# create alignments (single-end experiment)
genomeFile <- "extdata/hg19sub.fa"
sampleFile <- "extdata/samples_chip_single.txt"
proj <- qAlign(sampleFile, genomeFile)
#> all necessary alignment files found
# load transcript start site coordinates
library(rtracklayer)
annotationFile <- "extdata/hg19sub_annotation.gtf"
tssRegions <- import.gff(annotationFile, format="gtf",
feature.type="start_codon")
# obtain a combined TSS profile
pr1 <- qProfile(proj, tssRegions)
#> profiling alignments...
#> done
lapply(pr1, dim)
#> $coverage
#> [1] 1 2001
#>
#> $Sample1
#> [1] 1 2001
#>
#> $Sample2
#> [1] 1 2001
#>
lapply(pr1, "[", , 1:5)
#> $coverage
#> -1000 -999 -998 -997 -996
#> 19 19 19 19 19
#>
#> $Sample1
#> -1000 -999 -998 -997 -996
#> 0 0 1 0 0
#>
#> $Sample2
#> -1000 -999 -998 -997 -996
#> 1 1 1 2 0
#>
prComb <- do.call("+", lapply(pr1[-1], function(x) x/pr1[[1]]))
barplot(prComb, xlab="Position", ylab="Mean no. of alignments")
# obtain TSS profiles for individual regions
names(tssRegions) <- mcols(tssRegions)$transcript_id
pr2 <- qProfile(proj, tssRegions)
#> profiling alignments...
#> done
lapply(pr2, dim)
#> $coverage
#> [1] 19 2001
#>
#> $Sample1
#> [1] 19 2001
#>
#> $Sample2
#> [1] 19 2001
#>
lapply(pr2, "[", 1:3, 1:5)
#> $coverage
#> -1000 -999 -998 -997 -996
#> ENST00000328596 1 1 1 1 1
#> ENST00000379265 1 1 1 1 1
#> ENST00000379268 1 1 1 1 1
#>
#> $Sample1
#> -1000 -999 -998 -997 -996
#> ENST00000328596 0 0 0 0 0
#> ENST00000379265 0 0 0 0 0
#> ENST00000379268 0 0 0 0 0
#>
#> $Sample2
#> -1000 -999 -998 -997 -996
#> ENST00000328596 0 0 0 0 0
#> ENST00000379265 0 0 0 0 0
#> ENST00000379268 0 0 0 0 0
#>