The qProject class is a container for the meta-data (e.g. sample
names, paths and names of sequence and alignment files) associated
with a high-throughput sequencing experiment analyzed with QuasR.
qProject objects are returned by qAlign and can be conveniently
passed to ‘q’-functions (functions whose names start with the letter
‘q’). The information is stored in the following slots:
reads: a 'data.frame' with sequence read files.
reads_md5subsum: a 'data.frame' with fingerprints for sequence read files.
alignments: a 'data.frame' with alignment files.
samplesFormat: a 'character(1)' specifying the format of input files.
genome: a 'character(1)' specifying the reference genome.
genomeFormat: a 'character(1)' specifying the format of the reference genome.
aux: a 'data.frame' with auxiliary reference sequence files.
auxAlignments: a 'data.frame' with alignment files for auxiliary reference sequence files.
aligner: a 'character(1)' specifying the aligner.
maxHits: a 'numeric(1)' specifying the maximum number of alignments per sequence.
paired: a 'character(1)' specifying the paired-end type; one of "no", "fr", "rf", "ff".
splicedAlignment: a 'logical(1)'; TRUE when performing spliced alignments.
snpFile: a 'character(1)' with the name of a file containing SNP information.
bisulfite: a 'character(1)' defining the bisulfite type; one of "no", "dir", "undir".
alignmentParameter: a 'character(1)' with aligner command line parameters.
projectName: a 'character(1)' with the project name.
alignmentsDir: a 'character(1)' with the directory used to store alignment files.
lib.loc: a 'character(1)' with the library directory to use for installing alignment index packages.
cacheDir: a 'character(1)' with a directory to use for temporary files.
alnModeID: a 'character(1)' used internally to indicate the alignment mode.
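As a quick check, the slot names can be listed with slotNames() from the methods package. This is a minimal sketch assuming QuasR is installed and that its qProject class is visible after attaching the package; for everyday use, the accessor functions described below are generally preferable to reading slots directly with @.

```r
library(QuasR)

# List the slots defined for the qProject class
slotNames("qProject")

# Any qProject instance exposes these slots via @ (e.g. proj@aligner),
# but accessors such as genome() or alignments() are usually the better choice
```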
In the following code snippets, x is a qProject object.
length(x): Gets the number of input files.
genome(x): Gets the reference genome as a 'character(1)'.
The type of genome is stored as an attribute in
attr(genome(x),"genomeFormat"): "BSgenome" indicates that
genome(x) refers to the name of a BSgenome package, "file"
indicates that it contains the path and filename of a genome in
FASTA format.
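Code that consumes genome(x) typically branches on the "genomeFormat" attribute. The base-R sketch below uses a hypothetical helper, describeGenome() (not part of QuasR), and hand-built strings standing in for the return value of genome(x); it only illustrates reading the attribute, not how QuasR itself uses it.

```r
# Hypothetical helper (not part of QuasR): interpret a genome string
# according to its "genomeFormat" attribute.
describeGenome <- function(g) {
  fmt <- attr(g, "genomeFormat")
  if (is.null(fmt)) stop("no 'genomeFormat' attribute")
  switch(fmt,
         BSgenome = paste("BSgenome package:", g),
         file     = paste("FASTA file:", g),
         stop("unknown genomeFormat: ", fmt))
}

# Stand-ins for genome(x): a character(1) carrying the attribute
fastaGenome <- structure("extdata/hg19sub.fa", genomeFormat = "file")
bsGenome    <- structure("BSgenome.Hsapiens.UCSC.hg19", genomeFormat = "BSgenome")

describeGenome(fastaGenome)  # "FASTA file: extdata/hg19sub.fa"
describeGenome(bsGenome)     # "BSgenome package: BSgenome.Hsapiens.UCSC.hg19"
```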
auxiliaries(x): Gets a data.frame with auxiliary
target sequences, with one row per auxiliary target, and columns
"FileName" and "AuxName".
alignments(x): Gets a list with two elements "genome" and
"aux". alignments(x)$genome contains a data.frame
with length(x) rows and the columns "FileName" (containing
the path to bam files with genomic alignments) and
"SampleName". alignments(x)$aux contains a
data.frame with one row per auxiliary target sequence (with
auxiliary names as row names), and length(x) columns.
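To make the shape of alignments(x) concrete, the sketch below builds a plain list with the same layout by hand, using the BAM file names printed in the example output, and pairs each input file with its genomic and auxiliary BAM file. It does not call QuasR, and the column names of the "aux" data.frame are an assumption made for illustration.

```r
# Hand-built stand-in for alignments(x): two input files and one
# auxiliary target ("phiX174"). The aux column names are assumed.
aln <- list(
  genome = data.frame(
    FileName   = c("chip_1_1_6775490635c1.bam", "chip_2_1_677536b3375b.bam"),
    SampleName = c("Sample1", "Sample2"),
    stringsAsFactors = FALSE),
  aux = data.frame(
    Sample1   = "chip_1_1_677555eaeead.bam",
    Sample2   = "chip_2_1_677532dbcbec.bam",
    row.names = "phiX174",
    stringsAsFactors = FALSE)
)

# genome: one row per input file; aux: one column per input file
nInputs <- nrow(aln$genome)
stopifnot(ncol(aln$aux) == nInputs)

# One row per input file, pairing the genomic BAM with the phiX174 BAM
perSample <- data.frame(
  SampleName = aln$genome$SampleName,
  genomeBam  = aln$genome$FileName,
  phiX174Bam = as.character(unlist(aln$aux["phiX174", ])),
  stringsAsFactors = FALSE)
perSample
```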
x[i]: Gets a qProject object containing only the input files
selected by i, where i can be an NA-free logical, numeric, or
character vector.
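The three index types can be illustrated with a small base-R sketch. resolveIndex() is a hypothetical helper, not QuasR's implementation of x[i]; it assumes that a character index selects every input file whose sample name is in i, accepts only positive numeric positions, and requires a logical index to have one element per input file.

```r
# Hypothetical helper (not part of QuasR): convert an index i into the
# integer positions of the selected input files.
resolveIndex <- function(i, sampleNames) {
  if (anyNA(i)) stop("index must be NA-free")
  if (is.logical(i)) {
    # assumption: one logical value per input file
    if (length(i) != length(sampleNames)) stop("logical index has wrong length")
    which(i)
  } else if (is.numeric(i)) {
    # assumption: positive positions only (no negative exclusion)
    pos <- as.integer(i)
    if (any(pos < 1L | pos > length(sampleNames))) stop("index out of range")
    pos
  } else if (is.character(i)) {
    # assumption: match against sample names, selecting all matching files
    unknown <- setdiff(i, sampleNames)
    if (length(unknown) > 0) stop("unknown sample name(s): ", paste(unknown, collapse = ", "))
    which(sampleNames %in% i)
  } else {
    stop("unsupported index type: ", class(i)[1])
  }
}

samples <- c("Sample1", "Sample2")
resolveIndex(c(FALSE, TRUE), samples)  # returns 2
resolveIndex(1, samples)               # returns 1
resolveIndex("Sample2", samples)       # returns 2
```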
# copy example data to current working directory
file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)
#> [1] TRUE
# create alignments
sampleFile <- "extdata/samples_chip_single.txt"
genomeFile <- "extdata/hg19sub.fa"
auxFile <- "extdata/auxiliaries.txt"
proj <- qAlign(sampleFile, genomeFile, auxiliaryFile=auxFile)
#> alignment files missing - need to:
#> create 2 auxiliary alignment(s)
#> Creating an Rbowtie index for /private/var/folders/xc/cl1fyykn2pj4ryhdw6r1mqtc0000gn/T/RtmppUFZk5/file6775775f0b30/reference/extdata/NC_001422.1.fa
#> Finished creating index
#> Testing the compute nodes...
#> OK
#> Loading QuasR on the compute nodes...
#> preparing to run on 1 nodes...
#> done
#> Available cores:
#> nodeNames
#> Mac-1761831615261.local
#> 1
#> Performing auxiliary alignments for 2 samples. See progress in the log file:
#> /private/var/folders/xc/cl1fyykn2pj4ryhdw6r1mqtc0000gn/T/RtmppUFZk5/file6775775f0b30/reference/QuasR_log_677575c8dc1c.txt
#> Auxiliary alignments have been created successfully
#>
proj
#> Project: qProject
#> Options : maxHits : 1
#> paired : no
#> splicedAlignment: FALSE
#> bisulfite : no
#> snpFile : none
#> geneAnnotation : none
#> Aligner : Rbowtie v1.49.0 (parameters: -m 1 --best --strata)
#> Genome : /private/var/folders/xc/cl1fyykn2pj4ryhdw6r1mq.../hg19sub.fa (file)
#>
#> Reads : 2 files, 2 samples (fastq format):
#> 1. chip_1_1.fq.bz2 Sample1 (phred33)
#> 2. chip_2_1.fq.bz2 Sample2 (phred33)
#>
#> Genome alignments: directory: same as reads
#> 1. chip_1_1_6775490635c1.bam
#> 2. chip_2_1_677536b3375b.bam
#>
#> Aux. alignments: 1 file, directory: same as reads
#> a. /private/var/folders/xc/cl1fyykn2pj4ryhdw6r1mqt.../NC_001422.1.fa phiX174
#> 1. chip_1_1_677555eaeead.bam
#> 2. chip_2_1_677532dbcbec.bam
#>
# alignment statistics using a qProject
alignmentStats(proj)
#> seqlength mapped unmapped
#> Sample1:genome 95000 2339 258
#> Sample2:genome 95000 3609 505
#> Sample1:phiX174 5386 251 7
#> Sample2:phiX174 5386 493 12
# alignment statistics using bam files
alignmentStats(alignments(proj)$genome$FileName)
#> seqlength mapped unmapped
#> chip_1_1_6775490635c1.bam 95000 2339 258
#> chip_2_1_677536b3375b.bam 95000 3609 505
alignmentStats(unlist(alignments(proj)$aux))
#> seqlength mapped unmapped
#> chip_1_1_677555eaeead.bam 5386 251 0
#> chip_2_1_677532dbcbec.bam 5386 493 0
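The matrix returned by alignmentStats() can be post-processed with ordinary matrix arithmetic. As a sketch, the hypothetical helper mappingRate() below computes the fraction of mapped reads per row as mapped / (mapped + unmapped). It is run here on a matrix re-typed from the alignmentStats(proj) output above rather than on a live QuasR call.

```r
# Hypothetical helper: fraction of mapped reads per row of an
# alignmentStats()-style matrix with "mapped" and "unmapped" columns.
mappingRate <- function(st) {
  st[, "mapped"] / (st[, "mapped"] + st[, "unmapped"])
}

# Values copied from alignmentStats(proj) in the example above
st <- matrix(c(95000, 2339, 258,
               95000, 3609, 505,
                5386,  251,   7,
                5386,  493,  12),
             ncol = 3, byrow = TRUE,
             dimnames = list(c("Sample1:genome", "Sample2:genome",
                               "Sample1:phiX174", "Sample2:phiX174"),
                             c("seqlength", "mapped", "unmapped")))

round(mappingRate(st), 3)
```

Note that the unmapped counts for the phiX174 alignments differ between alignmentStats(proj) and alignmentStats() applied to the auxiliary BAM files directly, so the resulting rate depends on which input is used.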