Launch an analysis workflow on quantifications obtained with Spectronaut. Note that Spectronaut support in einprot is currently experimental: the interface may change, and results should be interpreted with caution.

runSpectronautAnalysis(
  templateRmd = system.file("extdata/process_basic_template.Rmd", package = "einprot"),
  outputDir = ".",
  outputBaseName = "SpectronautAnalysis",
  reportTitle = "Spectronaut LFQ data processing",
  reportAuthor = "",
  forceOverwrite = FALSE,
  experimentInfo = list(),
  species,
  spectronautFile,
  spectronautFileType,
  outLevel,
  spectronautLogFile,
  aName,
  idCol = function(df) combineIds(df, combineCols = c("PG.Genes", "PG.ProteinGroups")),
  labelCol = function(df) combineIds(df, combineCols = c("PG.Genes", "PG.ProteinGroups")),
  geneIdCol = function(df) getFirstId(df, colName = "PG.Genes"),
  proteinIdCol = "PG.ProteinGroups",
  stringIdCol = function(df) combineIds(df, combineCols = c("PG.Genes",
    "PG.ProteinGroups"), combineWhen = "missing", makeUnique = FALSE),
  extraFeatureCols = NULL,
  iColPattern = ".PG.Quantity$",
  sampleAnnot,
  includeOnlySamples = "",
  excludeSamples = "",
  minScore = 10,
  minPeptides = 2,
  imputeMethod = "MinProb",
  assaysForExport = NULL,
  addHeatmaps = TRUE,
  mergeGroups = list(),
  comparisons = list(),
  ctrlGroup = "",
  allPairwiseComparisons = TRUE,
  singleFit = TRUE,
  subtractBaseline = FALSE,
  baselineGroup = "",
  normMethod = "none",
  spikeFeatures = NULL,
  stattest = "limma",
  minNbrValidValues = 2,
  minlFC = 0,
  samSignificance = TRUE,
  nperm = 250,
  volcanoAdjPvalThr = 0.05,
  volcanoLog2FCThr = 1,
  volcanoMaxFeatures = 25,
  volcanoLabelSign = "both",
  volcanoS0 = 0.1,
  volcanoFeaturesToLabel = "",
  addInteractiveVolcanos = FALSE,
  interactiveDisplayColumns = NULL,
  interactiveGroupColumn = NULL,
  complexFDRThr = 0.1,
  maxNbrComplexesToPlot = Inf,
  seed = 42,
  includeFeatureCollections = c(),
  minSizeToKeepSet = 2,
  customComplexes = list(),
  complexSpecies = "all",
  complexDbPath = NULL,
  stringVersion = "11.5",
  stringDir = NULL,
  linkTableColumns = c(),
  customYml = NULL,
  doRender = TRUE
)

Arguments

templateRmd

Path to the template R Markdown file. Typically does not need to be modified.

outputDir

Path to a directory where all output files will be written. Will be created if it doesn't exist.

outputBaseName

Character string providing the 'base name' of the output files. All output files will start with this prefix.

reportTitle, reportAuthor

Character scalars, giving the title and author for the result report.

forceOverwrite

Logical, whether to force overwrite an existing Rmd file with the same outputBaseName in the outputDir.

experimentInfo

Named list with information about the experiment. Each entry of the list must be a scalar value.
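
As a minimal sketch of what such a list might look like, the following builds one and checks the stated requirement that every entry is a scalar. The entry names and values are hypothetical and only serve as illustration.

```r
# Hypothetical experiment metadata; entry names and values are made up.
experimentInfo <- list(
    "Experiment type" = "DIA, label-free",
    "Instrument" = "Orbitrap (example)",
    "Number of replicates" = 3
)

# Check the documented requirement: a named list of scalar values.
isScalar <- vapply(experimentInfo, function(x) length(x) == 1, logical(1))
stopifnot(!is.null(names(experimentInfo)), all(isScalar))
```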

species

Character scalar providing the species. Must be one of the supported species (see getSupportedSpecies()). Either the common or the scientific name can be used.

spectronautFile

Character string pointing to the Spectronaut Report.tsv file. File paths will be expressed in canonical form (using normalizePath()) before they are processed.

spectronautFileType

Character string indicating what type of input file spectronautFile represents. Either "pg_pivot" or "long_format".

outLevel

Character string indicating the desired output level. Currently only "pg" is supported.

spectronautLogFile

Character string pointing to the Spectronaut setup.txt log file. File paths will be expressed in canonical form (using normalizePath()) before they are processed.

aName

Character scalar indicating the column to use for the main assay.

idCol, labelCol, geneIdCol, proteinIdCol, stringIdCol

Arguments defining the feature identifiers (row names, should be unique), feature labels (for plots, can be anything), gene IDs (single gene symbols, will be matched against complexes and GO terms, can be NULL), protein IDs (UniProt IDs, will be used to create automatic URLs and match to species-specific identifiers, each entry can consist of multiple UniProt IDs separated by semicolons), and STRING IDs (single gene or protein ID, will be used to retrieve STRING networks, can be NULL). Each of these arguments can be either a character vector of column names in the input file (after application of make.names), in which case the corresponding feature ID is generated by simply concatenating the values in these columns, or a function with one input argument (a data.frame, corresponding to the annotation columns of the input file), returning a character vector corresponding to the desired feature IDs.
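
The function form can be sketched in base R as follows. The helper firstGeneOrProtein and the toy annotation table are hypothetical; the sketch only illustrates the contract described above (a data.frame in, a character vector out) and does not reproduce the combineIds() or getFirstId() helpers used in the defaults.

```r
# Toy annotation table; the column names mirror those used in the defaults.
df <- data.frame(
    PG.Genes = c("GENE1;GENE2", "", "GENE3"),
    PG.ProteinGroups = c("P00001;P00002", "P00003", "P00004"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: use the first gene symbol if there is one,
# otherwise fall back to the first protein ID.
firstGeneOrProtein <- function(df) {
    gene <- vapply(strsplit(df$PG.Genes, ";"), function(x) {
        if (length(x) == 0) "" else x[1]
    }, character(1))
    protein <- vapply(strsplit(df$PG.ProteinGroups, ";"), function(x) x[1],
                      character(1))
    ifelse(gene == "", protein, gene)
}

ids <- firstGeneOrProtein(df)
ids
```

A function like this could then be supplied as, for example, labelCol = firstGeneOrProtein. For idCol, the returned values should additionally be unique.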

extraFeatureCols

Named list (or NULL) defining additional, user-specified feature annotation columns to add to the object (in addition to the ones defined by idCol, labelCol, geneIdCol, proteinIdCol and stringIdCol). Similar to these column definitions, each entry of the list must be either a character vector of column names or a function taking a data.frame as input and returning a single character column. These columns will be created after the standard columns (einprotId, einprotGene, einprotProtein, einprotLabel, IDsForSTRING), and thus these columns can be used as well to create the user-specified ones.

iColPattern

Character scalar defining a regular expression to identify sample columns (only used if spectronautFileType is "pg_pivot"). Typically one of ".PG.Quantity$" or ".PG.IBAQ$".
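
To illustrate how such a pattern selects columns, here is a small base R sketch. The column names are hypothetical and much simpler than those in a real Spectronaut report; note that the unescaped dots in the pattern match any character.

```r
# Hypothetical column names from a protein group pivot report.
cols <- c("PG.ProteinGroups", "PG.Genes",
          "S1.PG.Quantity", "S2.PG.Quantity", "S1.PG.IBAQ")

iColPattern <- ".PG.Quantity$"

# Columns matching the pattern are treated as sample columns.
sampleCols <- grep(iColPattern, cols, value = TRUE)

# Removing the pattern leaves the sample names.
sampleNames <- sub(iColPattern, "", sampleCols)
sampleNames
```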

sampleAnnot

A data.frame with at least columns named sample and group, used to explicitly specify the group assignment for each sample. It can also contain a column named batch, in which case this will be used as a covariate in the limma or proDA tests. The values in the sample column should correspond to the names of the columns of interest in the input file, after removing the iColPattern.
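
A minimal sketch of a sample annotation table with the optional batch column, assuming hypothetical sample names S1 to S4 and made-up group and batch labels:

```r
# Hypothetical sample annotation; all sample, group and batch labels are made up.
sampleAnnot <- data.frame(
    sample = c("S1", "S2", "S3", "S4"),
    group = c("ctrl", "ctrl", "treated", "treated"),
    batch = c("b1", "b2", "b1", "b2"),
    stringsAsFactors = FALSE
)

# Basic sanity checks: required columns are present, sample names are unique.
stopifnot(all(c("sample", "group") %in% colnames(sampleAnnot)),
          !anyDuplicated(sampleAnnot$sample))

# Hypothetical input column names; the sample names should match them
# once the iColPattern has been removed.
inputCols <- paste0(sampleAnnot$sample, ".PG.Quantity")
stopifnot(all(sub(".PG.Quantity$", "", inputCols) == sampleAnnot$sample))
```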

includeOnlySamples, excludeSamples

Character vectors defining specific samples to include or exclude from all analyses.

minScore

Numeric, minimum score for a protein to be retained in the analysis. Set to NULL if no score filtering is desired.

minPeptides

Numeric, minimum number of peptides for a protein to be retained in the analysis. Set to NULL if no filtering on the number of peptides is desired.

imputeMethod

Character string defining the imputation method to use. Currently, "impSeqRob", "MinProb", and "MinProbGlobal" are supported. See doImputation for more details about the methods.

assaysForExport

Character vector defining the name(s) of the assays to use for exported abundances and barplots. This could, for example, be set to an assay containing 'absolute' abundances, if available, even if another assay is used for the actual analysis and comparison of groups. If set to NULL or an assay name that does not exist in the SingleCellExperiment object, the 'main' assay will be used.

addHeatmaps

Logical scalar indicating whether to include heatmaps or not. This controls both the heatmap showing the missing value pattern in the data, as well as the summary heatmaps of the quantitative information in the data. For large data sets, excluding the heatmaps can significantly speed up the processing time.

mergeGroups

Named list of character vectors defining sample groups to merge to create new groups that will be used for comparisons. Any specification of comparisons or ctrlGroup should be done in terms of the new (merged) group names.
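
The idea can be sketched as follows, with hypothetical group names. The relabelling loop only illustrates which samples end up under the merged name; it is not meant to reproduce how einprot handles merged groups internally.

```r
# Hypothetical original group labels for four samples.
group <- c("ctrl_A", "ctrl_B", "treated", "treated")

# Merge the two control groups into a new group called "ctrl".
mergeGroups <- list(ctrl = c("ctrl_A", "ctrl_B"))

# Illustration only: relabel each sample with its merged group, if any.
mergedGroup <- group
for (nm in names(mergeGroups)) {
    mergedGroup[group %in% mergeGroups[[nm]]] <- nm
}
mergedGroup
```

With this specification, a control group would be given as ctrlGroup = "ctrl" rather than in terms of "ctrl_A" or "ctrl_B".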

comparisons

List of character vectors defining comparisons to perform. The first element of each vector represents the denominator of the comparison. If not empty, ctrlGroup and allPairwiseComparisons are ignored.
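
A small sketch of such a specification, assuming hypothetical group names and assuming that each comparison is given as a vector of two group names. The labels generated at the end are purely illustrative and are not the names used in the report.

```r
# Each vector lists the denominator (reference) first, then the numerator.
comparisons <- list(
    c("ctrl", "treatedA"),
    c("ctrl", "treatedB")
)

# Illustrative labels spelling out the direction of each comparison.
labels <- vapply(comparisons, function(cmp) {
    paste0(cmp[2], " relative to ", cmp[1])
}, character(1))
labels
```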

ctrlGroup

Character vector defining the sample group(s) to use as control group in comparisons.

allPairwiseComparisons

Logical, should all pairwise comparisons be performed?

singleFit

Logical scalar indicating whether a single model fit should be used (and results for pairwise comparisons extracted via contrasts). If FALSE, the data set will be subset to the relevant samples for each comparison. Only applicable if stattest is "limma" or "proDA".

subtractBaseline

Logical scalar, whether to subtract the background/reference value for each feature in each batch before fitting the model. If TRUE, requires that a 'batch' column is available.

baselineGroup

Character scalar representing the reference group. Only used if subtractBaseline is TRUE, in which case the abundance values for a given sample will be adjusted by subtracting the average value across all samples in the baselineGroup from the same batch as the original sample.
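
The arithmetic described above can be worked through for a single feature. This is a sketch on made-up numbers, with hypothetical sample, group and batch labels, and is not einprot's own implementation.

```r
# Toy log-scale abundances for one feature across six samples.
abund <- c(s1 = 10, s2 = 12, s3 = 11, s4 = 20, s5 = 23, s6 = 21)
group <- c("ref", "A", "B", "ref", "A", "B")
batch <- c("b1", "b1", "b1", "b2", "b2", "b2")
baselineGroup <- "ref"

# Average of the baseline samples within each batch.
isBase <- group == baselineGroup
baseMean <- tapply(abund[isBase], batch[isBase], mean)

# Subtract, from each sample, the baseline average of its own batch.
adjusted <- abund - as.numeric(baseMean[batch])
adjusted
```

Here the baseline averages are 10 in batch b1 and 20 in batch b2, so the adjusted values are 0, 2, 1 for the first batch and 0, 3, 1 for the second.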

normMethod

Character scalar indicating the normalization method to use. Currently, any method from MsCoreUtils::normalizeMethods(), or "none", is a valid value.

spikeFeatures

Character vector indicating the 'spike-in' features to use for estimation of normalization factors. If NULL (default), all features are used.

stattest

Either "ttest", "limma" or "proDA", the testing framework to use. Could also be "none" if no test should be performed.

minNbrValidValues

Numeric, the minimum number of valid values for a protein to be used for statistical testing.

minlFC

Numeric, minimum log fold change to test against (only used if stattest = "limma").

samSignificance

Logical scalar, indicating whether the SAM statistic should be used to determine significance (similar to the approach used by Perseus). Only used if stattest = "ttest". If FALSE, the p-values are adjusted using the Benjamini-Hochberg approach and used to determine significance.

nperm

Numeric, number of permutations to use in the statistical testing (only used if stattest = "ttest").

volcanoAdjPvalThr

Numeric, adjusted p-value threshold to determine which proteins to highlight in the volcano plots.

volcanoLog2FCThr

Numeric, log-fold change threshold to determine which proteins to highlight in the volcano plots.
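
As a rough sketch of how the two volcano thresholds could act together, the following flags features in a made-up result table. It assumes that both thresholds are applied jointly and inclusively, which is an assumption for illustration; the table and its column names are hypothetical, and the sketch does not cover the SAM-based significance used with stattest = "ttest".

```r
# Made-up test results; column names are hypothetical.
res <- data.frame(
    feature = c("f1", "f2", "f3", "f4"),
    logFC = c(2.1, -1.5, 0.4, -3.0),
    adj.P.Val = c(0.01, 0.2, 0.001, 0.04),
    stringsAsFactors = FALSE
)

volcanoAdjPvalThr <- 0.05
volcanoLog2FCThr <- 1

# Assumed rule: small adjusted p-value AND large absolute log-fold change.
highlight <- res$adj.P.Val <= volcanoAdjPvalThr &
    abs(res$logFC) >= volcanoLog2FCThr
res$feature[highlight]
```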

volcanoMaxFeatures

Numeric, maximum number of significant features to label in the volcano plots.

volcanoLabelSign

Character scalar, either 'both', 'pos', or 'neg', indicating whether to label the most significant features regardless of sign, or only those with positive/negative log-fold changes.

volcanoS0

Numeric, S0 value to use to generate the significance curve in the volcano plots (only used if stattest = "ttest").

volcanoFeaturesToLabel

Character vector with features to always label in the volcano plots (regardless of significance).

addInteractiveVolcanos

Logical scalar indicating whether to add interactive volcano plots to the html report. For experiments with many quantified features or many comparisons, setting this to TRUE can make the html report very large and difficult to interact with.

interactiveDisplayColumns

Character vector (or NULL) indicating which columns to include in the tooltip for the interactive volcano plots. The default shows the feature ID.

interactiveGroupColumn

Character scalar (or NULL, default) indicating the column to group points by in the interactive volcano plot. Hovering over a point will highlight all other points with the same value of this column.

complexFDRThr

Numeric, FDR threshold for significance in testing of complexes.

maxNbrComplexesToPlot

Numeric, the maximum number of significant complexes for which to make separate volcano plots. Defaults to Inf, i.e., no limit.

seed

Numeric, random seed to use for any non-deterministic calculations.

includeFeatureCollections

Character vector, a subset of c("complexes", "GO", "pathways").

minSizeToKeepSet

Numeric scalar indicating the smallest number of features that have to overlap with the current data set in order to retain a feature set for testing.
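
The filtering rule can be sketched in base R as follows, using hypothetical feature sets and a hypothetical list of measured features. It illustrates the overlap criterion only and is not einprot's own code.

```r
# Hypothetical feature sets and measured features.
featureSets <- list(
    complexA = c("GENE1", "GENE2", "GENE3"),
    complexB = c("GENE4", "GENE9"),
    complexC = c("GENE7", "GENE8")
)
measured <- c("GENE1", "GENE2", "GENE4", "GENE5")
minSizeToKeepSet <- 2

# Number of features in each set that are also present in the data.
overlap <- vapply(featureSets, function(s) length(intersect(s, measured)),
                  integer(1))

# Keep only the sets with a sufficiently large overlap.
kept <- featureSets[overlap >= minSizeToKeepSet]
names(kept)
```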

customComplexes

List of character vectors providing custom complexes to test for significant differences between groups.

complexSpecies

Either "all" or "current", depending on whether complexes defined for all species, or only those defined for the current species, should be tested for significance.

complexDbPath

Character string providing the path to the complex DB file (generated with makeComplexDB()).

stringVersion

Character scalar giving the version of the STRING database to query.

stringDir

Character scalar (or NULL) providing the path to a folder where the STRING files will be downloaded (or loaded from, if they already exist). If NULL (default), they will be downloaded to a temporary directory.

linkTableColumns

Character vector with regular expressions that will be matched against the column names of the rowData of the generated SingleCellExperiment object and included in the link table at the end of the report.
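
A short sketch of the matching, using a hypothetical set of rowData column names. An anchored pattern selects exactly one column, while an unanchored one matches any column name containing it.

```r
# Hypothetical column names of the rowData.
rowDataCols <- c("einprotId", "einprotGene", "PG.ProteinDescriptions",
                 "PG.Genes", "PG.ProteinGroups")

linkTableColumns <- c("^PG\\.Genes$", "Descriptions")

# Collect all column names matched by at least one of the patterns.
matched <- unique(unlist(
    lapply(linkTableColumns, grep, x = rowDataCols, value = TRUE)
))
matched
```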

customYml

Character string providing the path to a custom YAML file that can be used to overwrite default settings in the report. If set to NULL (default), no alterations are made.

doRender

Logical scalar. If FALSE, the Rmd file will be generated (and any parameters injected), but not rendered.

Value

Invisibly, the path to the compiled html report.
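
Putting the required arguments together, a call might be set up as sketched below. All file paths, sample names and group labels are hypothetical placeholders, the value used for aName is an assumption that has to be adapted to the input file, and the species should be checked against getSupportedSpecies(). The call itself is only attempted if einprot is installed and the input files exist.

```r
# All paths, sample names and group labels below are hypothetical placeholders.
args <- list(
    outputDir = tempdir(),
    outputBaseName = "SpectronautAnalysis",
    species = "mouse",
    spectronautFile = "path/to/Report.tsv",
    spectronautFileType = "pg_pivot",
    outLevel = "pg",
    spectronautLogFile = "path/to/setup.txt",
    aName = "PG.Quantity",  # assumption: adapt to the columns in the input
    sampleAnnot = data.frame(
        sample = c("S1", "S2", "S3", "S4"),
        group = c("ctrl", "ctrl", "treated", "treated")
    ),
    stattest = "limma",
    doRender = FALSE  # generate the Rmd file without rendering it
)

# Only attempt the call if the package and the input files are available.
canRun <- requireNamespace("einprot", quietly = TRUE) &&
    file.exists(args$spectronautFile) &&
    file.exists(args$spectronautLogFile)
if (canRun) {
    out <- do.call(einprot::runSpectronautAnalysis, args)
}
```

Starting with doRender = FALSE makes it possible to inspect the generated Rmd file before committing to a full render.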

Author

Charlotte Soneson