QuasR wig file export

Create a fixed-step wig file from the alignments in the genomic bam files of the ‘QuasR’ project.

qExportWig(
  proj,
  file = NULL,
  collapseBySample = TRUE,
  binsize = 100L,
  shift = 0L,
  strand = c("*", "+", "-"),
  scaling = TRUE,
  tracknames = NULL,
  log2p1 = FALSE,
  colors = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#66A61E", "#E6AB02", "#A6761D",
    "#666666"),
  includeSecondary = TRUE,
  mapqMin = 0L,
  mapqMax = 255L,
  absIsizeMin = NULL,
  absIsizeMax = NULL,
  createBigWig = FALSE,
  useRead = c("any", "first", "last"),
  pairedAsSingle = FALSE,
  clObj = NULL
)

Arguments

proj

A qProject object as returned by qAlign.

file

A character vector with the name(s) for the wig or bigWig file(s) to be generated. Either NULL or a vector of the same length as the number of bam files (for collapseBySample=FALSE) or the number of unique sample names (for collapseBySample=TRUE) in proj. If NULL, the wig or bigWig file names are generated from the names of the genomic bam files or unique sample names with an added “.wig.gz” or “.bw” extension.

collapseBySample

If TRUE, genomic bam files with identical sample name will be combined (summed) into a single track.

binsize

A numerical value defining the bin and step size for the wig or bigWig file(s). binsize will be coerced to integer().

shift

Either a vector or a scalar value defining the read shift (e.g. half of fragment length, see ‘Details’). If length(shift)>1, the length must match the number of bam files in ‘proj’, and the i-th sample will be converted to wig or bigWig using the value in shift[i]. shift will be coerced to integer(). For paired-end alignments, shift will be ignored, and a warning will be issued if it is set to a non-zero value (see ‘Details’).

strand

Only count alignments on the given strand. The default (“*”) will count alignments on both strands.

scaling

If TRUE or a numerical value, the output values in the wig or bigWig file(s) will be linearly scaled by the total number of aligned reads per sample to improve comparability (see ‘Details’).

tracknames

A character vector with the names of the tracks to appear in the track header. If NULL, the sample names in proj will be used.

log2p1

If TRUE, the number of alignments x per bin will be transformed using the formula log2(x+1).

colors

A character vector with R color names to be used for the tracks.

includeSecondary

If TRUE (the default), include alignments with the secondary bit (0x0100) set in the FLAG.

mapqMin

Minimal mapping quality of alignments to be included (mapping quality must be greater than or equal to mapqMin). Valid values are between 0 and 255. The default (0) will include all alignments.

mapqMax

Maximal mapping quality of alignments to be included (mapping quality must be less than or equal to mapqMax). Valid values are between 0 and 255. The default (255) will include all alignments.

absIsizeMin

For paired-end experiments, minimal absolute insert size (TLEN field in SAM Spec v1.4) of alignments to be included. Valid values are greater than 0 or NULL (default), which will not apply any minimum insert size filtering.

absIsizeMax

For paired-end experiments, maximal absolute insert size (TLEN field in SAM Spec v1.4) of alignments to be included. Valid values are greater than 0 or NULL (default), which will not apply any maximum insert size filtering.

createBigWig

If TRUE, first a temporary wig file will be created and then converted to BigWig format (file extension “.bw”) using the wigToBigWig function from package rtracklayer.

useRead

For paired-end experiments, selects the read mate whose alignments should be counted, one of:

any (default): count all alignments
first: count only alignments from the first read
last: count only alignments from the last read

For single-read alignments, this argument will be ignored. For paired-end alignments, setting this argument to a value other than the default (any) will cause qExportWig not to use fragment mid-points automatically, but to treat the selected read as if it came from a single-read experiment (see ‘Details’).

pairedAsSingle

If TRUE, treat paired-end data as single-read data, which means that instead of calculating fragment mid-points for each read pair, the 5-prime ends of the reads are used. This is useful, for example, when analyzing paired-end DNase-seq or ATAC-seq data, in which the read starts are informative for chromatin accessibility.

clObj

A cluster object to be used for parallel processing of multiple samples.

Value

(invisible) The file name of the generated wig or bigWig file(s).

Details

qExportWig() uses the genome bam files in proj as input to create wig or bigWig files with the number of alignments (pairs) per window of binsize nucleotides. By default (collapseBySample=TRUE), one file per unique sample will be created. If collapseBySample=FALSE, one file per genomic bam file will be created. See http://genome.ucsc.edu/goldenPath/help/wiggle.html for the definition of the wig format, and http://genome.ucsc.edu/goldenPath/help/bigWig.html for the definition of the bigWig format.

The genome is tiled with sequential windows of length binsize, and alignments in the bam file are assigned to these windows. Single-read alignments are assigned according to their 5'-end coordinate shifted by shift towards the 3'-end (assuming that the 5'-end is the leftmost coordinate for plus-strand alignments, and the rightmost coordinate for minus-strand alignments). Paired-end alignments are assigned according to the base in the middle between the leftmost and rightmost coordinates of the aligned pair of reads, unless pairedAsSingle = TRUE is used. Each pair of reads is counted only once, and alignments that are not properly paired are ignored. If useRead is set to select only the first or last read in a paired-end experiment, the selected reads will be treated like reads from a single-read experiment. Secondary alignments can be excluded by setting includeSecondary=FALSE. In paired-end experiments, absIsizeMin and absIsizeMax can be used to select alignments based on their insert size (TLEN field in SAM Spec v1.4).
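The assignment rule for single reads can be sketched in base R. This is only an illustration of the description above, not the code used by QuasR: bin_counts() and pair_mid() are hypothetical helpers, and the handling of positions shifted off the chromosome and the rounding of the mid-point are assumptions.

```r
# Hypothetical helper (not part of QuasR): count single-read alignments per bin.
# start/end are 1-based, inclusive alignment coordinates on one chromosome.
bin_counts <- function(start, end, strand, chrlen, binsize = 100L, shift = 0L) {
  # 5'-end: leftmost coordinate on "+", rightmost coordinate on "-"
  five_prime <- ifelse(strand == "+", start, end)
  # shift towards the 3'-end: rightwards on "+", leftwards on "-"
  pos <- ifelse(strand == "+", five_prime + shift, five_prime - shift)
  # assumption: positions shifted beyond the chromosome ends are dropped
  pos <- pos[pos >= 1L & pos <= chrlen]
  nbins <- ceiling(chrlen / binsize)
  # positions 1..binsize fall into bin 1, binsize+1..2*binsize into bin 2, ...
  tabulate((pos - 1L) %/% binsize + 1L, nbins = nbins)
}

# Hypothetical helper: mid-point of a read pair (rounding down is an assumption)
pair_mid <- function(leftmost, rightmost) (leftmost + rightmost) %/% 2L

start  <- c(10L, 150L, 180L, 290L)
end    <- c(45L, 185L, 215L, 325L)
strand <- c("+", "+", "-", "-")
counts <- bin_counts(start, end, strand, chrlen = 400L, binsize = 100L, shift = 50L)
print(counts)
print(pair_mid(101L, 300L))
```

With shift = 50L, the four reads are counted at positions 60, 200, 165 and 275, so the two reads on opposite strands around position 180 end up in the same bin.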

For scaling=TRUE, the number of alignments per bin $n$ for sample $i$ is linearly scaled to the mean total number of alignments over all samples in proj according to: $n_s = n / N[i] \cdot \textnormal{mean}(N)$, where $n_s$ is the scaled number of alignments in the bin and $N$ is a vector with the total number of alignments for each sample. Alternatively, if scaling is set to a positive numerical value $s$, this value is used instead of $\textnormal{mean}(N)$, and values are scaled according to: $n_s = n / N[i] \cdot s$.
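As a worked example of the two scaling modes, the following base R sketch applies both formulas to made-up numbers. The totals in N and the bin counts in n are hypothetical and are not taken from the example data.

```r
# Hypothetical total numbers of alignments per sample (N in the formulas above)
N <- c(Sample1 = 2000, Sample2 = 4000)
# Hypothetical raw alignment counts in one bin, one value per sample
n <- c(Sample1 = 10, Sample2 = 10)

# scaling = TRUE: scale to the mean total over all samples, mean(N) = 3000
n_scaled_mean <- n / N * mean(N)

# scaling = 1e6: scale to a fixed value s instead of mean(N)
s <- 1e6
n_scaled_fixed <- n / N * s

print(n_scaled_mean)   # about 15 for Sample1 and 7.5 for Sample2
print(n_scaled_fixed)  # about 5000 for Sample1 and 2500 for Sample2
```

Both samples have the same raw count in this bin, but Sample2 has twice as many alignments in total, so its scaled value is half that of Sample1 in either mode.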

mapqMin and mapqMax allow selecting alignments based on their mapping qualities. Both can take integer values between 0 and 255. Mapping qualities are equal to $-10 \log_{10} \Pr(\textnormal{mapping position is wrong})$, rounded to the nearest integer. A value of 255 indicates that the mapping quality is not available.
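To get a feeling for these values, the relationship can be inverted to recover the error probability from a mapping quality. The base R sketch below uses hypothetical mapping qualities and illustrative thresholds; it mimics the inclusive filter described above and is not the QuasR implementation.

```r
# Hypothetical mapping qualities of five alignments
mapq <- c(0L, 10L, 20L, 30L, 255L)

# Invert MAPQ = -10 * log10(Pr(wrong)); 255 means "not available"
p_wrong <- ifelse(mapq == 255L, NA_real_, 10^(-mapq / 10))
print(p_wrong)  # about 1, 0.1, 0.01, 0.001 and NA

# Inclusive filter, with illustrative (non-default) thresholds
mapqMin <- 10L
mapqMax <- 254L
keep <- mapq >= mapqMin & mapq <= mapqMax
print(keep)
```

Setting mapqMax below 255, as in this sketch, is one way to exclude alignments without an available mapping quality.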

If createBigWig=FALSE and file ends with ‘.gz’, the resulting wig file will be compressed using gzip and is suitable for uploading as a custom track to your favorite genome browser (e.g. UCSC or Ensembl).
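For orientation, a gzip-compressed fixed-step wig file can be written with base R as sketched below. write_fixedstep_wig() is a hypothetical helper, and the track line shown is a minimal assumption based on the UCSC wiggle format description; the header written by qExportWig (for example track names and colors) will differ.

```r
# Hypothetical helper (not part of QuasR): write per-bin values for one
# chromosome as a fixed-step wig file, gzip-compressed if file ends in ".gz".
write_fixedstep_wig <- function(values, chrom, binsize, file, trackname = "Sample1") {
  con <- if (grepl("\\.gz$", file)) gzfile(file, "w") else file(file, "w")
  on.exit(close(con))
  writeLines(c(
    # minimal track line (assumption; real track headers carry more fields)
    sprintf('track type=wiggle_0 name="%s"', trackname),
    # declaration line: first bin starts at base 1, step and span equal binsize
    sprintf("fixedStep chrom=%s start=1 step=%d span=%d", chrom, binsize, binsize),
    # one value per bin, in genomic order
    as.character(values)
  ), con)
  invisible(file)
}

out <- write_fixedstep_wig(c(1, 2, 1, 0), "chr1", 100L,
                           tempfile(fileext = ".wig.gz"))
con <- gzfile(out)
wig_lines <- readLines(con)
close(con)
print(wig_lines)
```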

Author

Anita Lerch, Dimos Gaidatzis and Michael Stadler

Examples

# copy example data to current working directory
file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)
#> [1] TRUE

# create alignments
sampleFile <- "extdata/samples_chip_single.txt"
genomeFile <- "extdata/hg19sub.fa"
proj <- qAlign(sampleFile, genomeFile)
#> alignment files missing - need to:
#>     create 2 genomic alignment(s)
#> Testing the compute nodes...
#> OK
#> Loading QuasR on the compute nodes...
#> preparing to run on 1 nodes...
#> done
#> Available cores:
#> Mac-1751261719888.local: 1
#> Performing genomic alignments for 2 samples. See progress in the log file:
#> /private/var/folders/y6/nj790rtn62lfktb1sh__79hc0000gn/T/RtmpdLZwJS/file59e71a95f9b0/reference/QuasR_log_59e73a0b3688.txt
#> Genomic alignments have been created successfully
#> 

# export wiggle file
qExportWig(proj, binsize=100L, shift=0L, scaling=TRUE)
#> collecting mapping statistics for scaling...
#> done
#> start creating wig files...
#>   Sample1.wig.gz (Sample1)
#>   Sample2.wig.gz (Sample2)
#> done
