The qProject class is a container for the meta-data (e.g. sample
names, paths and names of sequence and alignment files) associated
with a high-throughput sequencing experiment analyzed with QuasR.
A qProject object is returned by qAlign and stores all
information on a high-throughput sequencing experiment analyzed with
QuasR. qProject objects can be conveniently passed to
‘q’-functions (functions whose names start with the letter
‘q’). The information is stored in the following slots:
reads
a 'data.frame' with sequence read files.
reads_md5subsum
a 'data.frame' with fingerprints for sequence read files.
alignments
a 'data.frame' with alignment files.
samplesFormat
a 'character(1)' specifying the format of input files.
genome
a 'character(1)' specifying the reference genome.
genomeFormat
a 'character(1)' specifying the format of the reference genome.
aux
a 'data.frame' with auxiliary reference sequence files.
auxAlignments
a 'data.frame' with alignment files for auxiliary reference sequence files.
aligner
a 'character(1)' specifying the aligner.
maxHits
a 'numeric(1)' specifying the maximum number of alignments per sequence.
paired
a 'character(1)' specifying the paired-type; one of "no", "fr", "rf", "ff".
splicedAlignment
a 'logical(1)'; TRUE when performing spliced-alignments.
snpFile
a 'character(1)' with the name of a file containing SNP information.
bisulfite
a 'character(1)' defining the bisulfite type; one of "no", "dir", "undir".
alignmentParameter
a 'character(1)' with aligner command line parameters.
projectName
a 'character(1)' with the project name.
alignmentsDir
a 'character(1)' with the directory to be used to store alignment files.
lib.loc
a 'character(1)' with the library directory to use for installing alignment index packages.
cacheDir
a 'character(1)' with a directory to use for temporary files.
alnModeID
a 'character(1)' used internally to indicate the alignment mode.
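As a minimal sketch, the slot names listed above and their declared classes can be inspected with the generic S4 tools slotNames and getSlots from the methods package, without building a project first. This assumes QuasR is installed; the variable names slots and slotClasses are illustrative only. Slots of any S4 object can also be read directly with @, but the accessor functions described below are the documented interface.

```r
library(QuasR)

# names of all slots defined by the qProject class
slots <- methods::slotNames("qProject")
print(slots)

# named character vector mapping each slot name to its declared class
slotClasses <- methods::getSlots("qProject")
print(slotClasses[c("reads", "maxHits", "paired")])
```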
In the following code snippets, x is a qProject object.
length(x): Gets the number of input files.
genome(x): Gets the reference genome as a 'character(1)'.
The type of genome is stored as an attribute in
attr(genome(x),"genomeFormat"): "BSgenome" indicates that
genome(x) refers to the name of a BSgenome package, "file"
indicates that it contains the path and filename of a genome in
FASTA format.
auxiliaries(x): Gets a data.frame with auxiliary
target sequences, with one row per auxiliary target, and columns
"FileName" and "AuxName".
alignments(x): Gets a list with two elements "genome" and
"aux". alignments(x)$genome contains a data.frame
with length(x) rows and the columns "FileName" (containing
the path to bam files with genomic alignments) and
"SampleName". alignments(x)$aux contains a
data.frame with one row per auxiliary target sequence (with
auxiliary names as row names), and length(x) columns.
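The accessors above can be exercised together as in the following sketch. It reuses the example data and file names from the Examples on this page and assumes QuasR is installed and that the alignments can be created in the working directory; the variable names n, gnm, fmt, aux and aln are illustrative only.

```r
library(QuasR)

# copy the example data shipped with QuasR into the working directory
file.copy(system.file(package = "QuasR", "extdata"), ".", recursive = TRUE)

# build a small project with one auxiliary reference (runs the alignments)
proj <- qAlign("extdata/samples_chip_single.txt", "extdata/hg19sub.fa",
               auxiliaryFile = "extdata/auxiliaries.txt")

n   <- length(proj)               # number of input files
gnm <- genome(proj)               # reference genome as character(1)
fmt <- attr(gnm, "genomeFormat")  # "BSgenome" or "file"

aux <- auxiliaries(proj)          # one row per auxiliary target
aln <- alignments(proj)           # list with elements "genome" and "aux"

# genome alignments: one row per input file
stopifnot(nrow(aln$genome) == n)
# auxiliary alignments: one column per input file
stopifnot(ncol(aln$aux) == n)

print(fmt)
print(colnames(aux))
print(aln$genome$SampleName)
```

Checking the dimensions against length(proj) is a quick way to confirm that every input file has both a genome and an auxiliary alignment before passing the file names on to other tools.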
In the following code snippets, x is a qProject object.
x[i]: Gets a qProject object restricted to the input files
selected by i, where i can be an NA-free logical, numeric, or
character vector.
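A short sketch of subsetting with numeric and logical indices follows. It again uses the example data from the Examples on this page and assumes QuasR is installed; the names first and second are illustrative. Character indexing is left out because this page does not state what the character values are matched against.

```r
library(QuasR)

# copy the example data and align the two ChIP samples
file.copy(system.file(package = "QuasR", "extdata"), ".", recursive = TRUE)
proj <- qAlign("extdata/samples_chip_single.txt", "extdata/hg19sub.fa")

first  <- proj[1]               # numeric index: keep the first input file
second <- proj[c(FALSE, TRUE)]  # logical index: keep the second input file

# each subset is again a qProject, here with a single input file
stopifnot(methods::is(first, "qProject"),
          length(first) == 1L,
          length(second) == 1L)

# accessors work on the subset in the same way as on the full project
print(alignments(first)$genome)
```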
library(QuasR)

# copy example data to current working directory
file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)
#> [1] TRUE
# create alignments
sampleFile <- "extdata/samples_chip_single.txt"
genomeFile <- "extdata/hg19sub.fa"
auxFile <- "extdata/auxiliaries.txt"
proj <- qAlign(sampleFile, genomeFile, auxiliaryFile=auxFile)
#> alignment files missing - need to:
#> create 2 auxiliary alignment(s)
#> Creating an Rbowtie index for /private/var/folders/2s/h6hvv9ps03xgz_krkkstvq_r0000gn/T/RtmpHtA1jk/file5ddc16c1b3df/reference/extdata/NC_001422.1.fa
#> Finished creating index
#> Testing the compute nodes...
#> OK
#> Loading QuasR on the compute nodes...
#> preparing to run on 1 nodes...
#> done
#> Available cores:
#> nodeNames
#> Mac-1740133007481.local
#> 1
#> Performing auxiliary alignments for 2 samples. See progress in the log file:
#> /private/var/folders/2s/h6hvv9ps03xgz_krkkstvq_r0000gn/T/RtmpHtA1jk/file5ddc16c1b3df/reference/QuasR_log_5ddc8051f7f.txt
#> Auxiliary alignments have been created successfully
#>
proj
#> Project: qProject
#> Options : maxHits : 1
#> paired : no
#> splicedAlignment: FALSE
#> bisulfite : no
#> snpFile : none
#> geneAnnotation : none
#> Aligner : Rbowtie v1.47.0 (parameters: -m 1 --best --strata)
#> Genome : /private/var/folders/2s/h6hvv9ps03xgz_krkkstvq.../hg19sub.fa (file)
#>
#> Reads : 2 files, 2 samples (fastq format):
#> 1. chip_1_1.fq.bz2 Sample1 (phred33)
#> 2. chip_2_1.fq.bz2 Sample2 (phred33)
#>
#> Genome alignments: directory: same as reads
#> 1. chip_1_1_5ddc61d5c573.bam
#> 2. chip_2_1_5ddc19963333.bam
#>
#> Aux. alignments: 1 file, directory: same as reads
#> a. /private/var/folders/2s/h6hvv9ps03xgz_krkkstvq_.../NC_001422.1.fa phiX174
#> 1. chip_1_1_5ddc856cef6.bam
#> 2. chip_2_1_5ddc7b2d7cc0.bam
#>
# alignment statistics using a qProject
alignmentStats(proj)
#> seqlength mapped unmapped
#> Sample1:genome 95000 2339 258
#> Sample2:genome 95000 3609 505
#> Sample1:phiX174 5386 251 7
#> Sample2:phiX174 5386 493 12
# alignment statistics using bam files
alignmentStats(alignments(proj)$genome$FileName)
#> seqlength mapped unmapped
#> chip_1_1_5ddc61d5c573.bam 95000 2339 258
#> chip_2_1_5ddc19963333.bam 95000 3609 505
alignmentStats(unlist(alignments(proj)$aux))
#> seqlength mapped unmapped
#> chip_1_1_5ddc856cef6.bam 5386 251 0
#> chip_2_1_5ddc7b2d7cc0.bam 5386 493 0