makeComplexDB.Rd
This function generates a comprehensive cross-species database of
protein complexes. It downloads the complex definitions from
http://wodaklab.org/cyc2008/resources/CYC2008_complex.tab (S. cerevisiae),
https://mips.helmholtz-muenchen.de/corum/download/releases/current/allComplexes.txt.zip
(mammals),
https://www.pombase.org/data/annotations/Gene_ontology/GO_complexes/Complex_annotation.tsv
(S. pombe),
http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_complexes_20200809.txt
(human) and
https://ftp.ebi.ac.uk/pub/databases/intact/complex/current/complextab/
(human, mouse, C. elegans and S. pombe),
and then uses the babelgene package to map the complexes to orthologs in
the other species. A pre-generated version of the database is provided
with einprot (see listComplexDBs()).
makeComplexDB(
dbDir,
customComplexTxt = NULL,
Cyc2008Db = NULL,
CorumDb = NULL,
PombaseDb = NULL,
HuMAP2Db = NULL,
CPortal9606Db = NULL,
CPortal10090Db = NULL,
CPortal6239Db = NULL,
CPortal284812Db = NULL
)
Path to database directory, where all raw files will be downloaded and the output will be saved. Will be created if it doesn't exist.
File path to text file with custom complexes (if any). Should be a tab-delimited text file with five columns: "Complex.name", "Gene.names" (semicolon-separated), "Organism", "Source", "PMID".
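The five-column layout described above can be sketched as follows. This is a minimal illustration, not a file shipped with einprot: the complex names, gene symbols, organism labels and source values are placeholders, and the organism naming that makeComplexDB() expects is an assumption that should be checked against the package.

```r
## Hypothetical custom complexes; all values are placeholders
custom <- data.frame(
    Complex.name = c("Example complex A", "Example complex B"),
    Gene.names = c("GENE1;GENE2;GENE3", "GENE4;GENE5"),  # semicolon-separated
    Organism = c("Homo sapiens", "Mus musculus"),        # assumed label format
    Source = c("custom", "custom"),
    PMID = c(NA, NA),                                    # no reference available
    stringsAsFactors = FALSE
)

## Write a tab-delimited text file with a header row
customFile <- tempfile(fileext = ".txt")
write.table(custom, file = customFile, sep = "\t", quote = FALSE,
            row.names = FALSE)

## Read it back to check the column names and the gene separator
chk <- read.delim(customFile, stringsAsFactors = FALSE)
colnames(chk)
strsplit(chk$Gene.names, ";")
```

The resulting path (here customFile) is what would be passed as customComplexTxt.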
data.frames providing
annotations from CYC2008 (S. cerevisiae), CORUM (mammals),
PomBase (S. pombe), HuMAP2 (human) and the Complex Portal (human, mouse,
C. elegans, S. pombe),
respectively. These arguments are provided mainly to allow testing, and
are typically not specified by the end user, except in cases where the
files have already been downloaded and stored locally. If provided,
the data frames must be obtained by simply reading the downloaded text
files, since the function assumes a certain set of columns.
If NULL (the default), the files will be downloaded from the
URLs listed in the description above.
Invisibly, the path to the generated complex database.
## Read small subsets of the raw files provided with einprot to make the
## processing faster. Typically, the files would be downloaded as part of
## the process of generating the complex DB.
cyc2008db <- read.delim(system.file("extdata", "complexes",
                                    "cyc2008_complex_extract.tab",
                                    package = "einprot"))
corumdb <- read.delim(system.file("extdata", "complexes",
                                  "corum_complex_extract.txt",
                                  package = "einprot"))
pombasedb <- read.delim(system.file("extdata", "complexes",
                                    "pombase_complex_extract.tsv",
                                    package = "einprot"))
humap2db <- read.delim(system.file("extdata", "complexes",
                                   "humap2_complex_extract.txt",
                                   package = "einprot"), sep = ",")
cportal9606db <- read.delim(system.file("extdata", "complexes",
                                        "complexportal9606_complex_extract.tsv",
                                        package = "einprot"))
cportal10090db <- read.delim(system.file("extdata", "complexes",
                                         "complexportal10090_complex_extract.tsv",
                                         package = "einprot"))
cportal6239db <- read.delim(system.file("extdata", "complexes",
                                        "complexportal6239_complex_extract.tsv",
                                        package = "einprot"))
cportal284812db <- read.delim(system.file("extdata", "complexes",
                                          "complexportal284812_complex_extract.tsv",
                                          package = "einprot"))
dbdir <- tempdir()
dbs <- makeComplexDB(dbDir = dbdir, Cyc2008Db = cyc2008db,
                     CorumDb = corumdb, PombaseDb = pombasedb,
                     HuMAP2Db = humap2db, CPortal9606Db = cportal9606db,
                     CPortal10090Db = cportal10090db,
                     CPortal6239Db = cportal6239db,
                     CPortal284812Db = cportal284812db)
#> mouse
#> human
#> baker's yeast
#> Caenorhabditis elegans
#> Schizosaccharomyces pombe 972h-
## List of complexes
compl <- readRDS(dbs$complPath)
compl
#> CharacterList of length 25
#> [["S.cer: Ada2p/Gcn5p/Ada3 transcription activator complex"]] ADA2 ... NGG1
#> [["S.cer: TRAPP complex"]] BET3 BET5 GSG1 KRE11 ... TRS20 TRS23 TRS31 TRS33
#> [["S.cer: histone deacetylase complex"]] HDA1 HDA2 HDA3
#> [["human: BCL6-HDAC4 complex"]] BCL6 HDAC4
#> [["human: BCL6-HDAC5 complex"]] BCL6 HDAC5
#> [["mouse: BLOC-2 (biogenesis of lysosome-related organelles complex 2)"]] Hps...
#> [["rat: Bcl2l1-Dnm1l-Mff-Clta complex"]] Dnm1l Clta Bcl2l1 Mff
#> [["S.pombe: nucleotide-excision repair factor 1 complex"]] swi10 rhp14 rad16
#> [["S.pombe: nucleotide-excision repair factor 2 complex"]] rhp41 rhp42
#> [["S.pombe: nucleotide-excision repair factor 3 complex"]] tfb1 ptr8 ... pmh1
#> ...
#> <15 more elements>
## Complexes mapped to all species via orthologs
orth <- readRDS(dbs$orthPath)
orth
#> $mouse
#> CharacterList of length 22
#> [["mouse: BLOC-2 (biogenesis of lysosome-related organelles complex 2)"]] Hps...
#> [["mouse: CPX-10_SMAD2-SMAD3-SMAD4 complex (+1 alt. ID)"]] Smad4 Smad2 Smad3
#> [["mouse: CPX-6_bZIP transcription factor complex, Atf4-Creb1 (+1 alt. ID)"]] ...
#> [["mouse: CPX-7_bZIP transcription factor complex, Atf1-Atf4 (+1 alt. ID)"]] ...
#> [["rat: Bcl2l1-Dnm1l-Mff-Clta complex"]] Bcl2l1 Clta Dnm1l Mff
#> [["human: BCL6-HDAC4 complex"]] Bcl6 Hdac4
#> [["human: BCL6-HDAC5 complex"]] Bcl6 Hdac5
#> [["human: HuMAP2_00000_conf3"]] Fyco1 Trub2
#> [["human: HuMAP2_00001_conf4"]] Anxa6 Ehd1 Pafah1b2 Pafah1b3 Smad1 Tbcb
#> [["human: HuMAP2_00002_conf5"]] Clcc1 Dazap2 Fam168b ... Rbpms Rbpms2 Slc4a7
#> ...
#> <12 more elements>
#>
#> $human
#> CharacterList of length 22
#> [["human: BCL6-HDAC4 complex"]] BCL6 HDAC4
#> [["human: BCL6-HDAC5 complex"]] BCL6 HDAC5
#> [["human: CPX-1_SMAD2-SMAD3-SMAD4 complex (+1 alt. ID)"]] SMAD3 SMAD4 SMAD2
#> [["human: CPX-8_bZIP transcription factor complex, ATF4-CREB1 (+1 alt. ID)"]] ...
#> [["human: CPX-9_bZIP transcription factor complex, ATF1-ATF4 (+1 alt. ID)"]] ...
#> [["human: HuMAP2_00000_conf3"]] FYCO1 TRUB2
#> [["human: HuMAP2_00001_conf4"]] ANXA6 SMAD1 TBCB EHD1 PAFAH1B2 PAFAH1B3
#> [["human: HuMAP2_00002_conf5"]] RBPMS GDE1 PLEKHB1 ... CLCC1 RBMS2 SLC4A7
#> [["mouse: BLOC-2 (biogenesis of lysosome-related organelles complex 2)"]] HPS...
#> [["rat: Bcl2l1-Dnm1l-Mff-Clta complex"]] BCL2L1 CLTA DNM1L MFF
#> ...
#> <12 more elements>
#>
#> $`baker's yeast`
#> CharacterList of length 13
#> [["S.cer: Ada2p/Gcn5p/Ada3 transcription activator complex"]] ADA2 ... NGG1
#> [["S.cer: TRAPP complex"]] BET3 BET5 GSG1 KRE11 ... TRS20 TRS23 TRS31 TRS33
#> [["S.cer: histone deacetylase complex"]] HDA1 HDA2 HDA3
#> [["S.pombe: nucleotide-excision repair factor 1 complex"]] RAD10 RAD1 RAD14
#> [["S.pombe: nucleotide-excision repair factor 2 complex"]] RAD4
#> [["S.pombe: nucleotide-excision repair factor 3 complex"]] RAD3 SSL2 ... TFB3
#> [["S.pombe: CPX-540_Mitochondrial inner membrane pre-sequence translocase complex"]]
#> [["S.pombe: CPX-546_DNA replication factor C complex"]] RFC1 RFC4 RFC5 RFC2 RFC3
#> [["S.pombe: CPX-547_PCNA homotrimer"]] POL30
#> [["human: HuMAP2_00001_conf4"]] ALF1
#> ...
#> <3 more elements>
#>
#> $`Caenorhabditis elegans`
#> CharacterList of length 18
#> [["C.elegans: CPX-365_GARP tethering complex"]] vps-52 vps-51 vps-53 vps-54
#> [["C.elegans: CPX-368_Polycomb Repressive Complex 2"]] mes-2 mes-3 mes-6
#> [["C.elegans: CPX-372_Zyg-9/Tac-1 complex"]] tac-1 zyg-9
#> [["human: CPX-1_SMAD2-SMAD3-SMAD4 complex (+1 alt. ID)"]] sma-2 sma-4
#> [["human: CPX-8_bZIP transcription factor complex, ATF4-CREB1 (+3 alt. IDs)"]] ...
#> [["human: HuMAP2_00000_conf3"]] Y43B11AR.3
#> [["human: HuMAP2_00001_conf4"]] rme-1 sma-2 tbcb-1
#> [["human: HuMAP2_00002_conf5"]] ZC155.4 fox-1 mec-8 abts-1
#> [["rat: Bcl2l1-Dnm1l-Mff-Clta complex"]] clic-1 drp-1 mff-2
#> [["S.cer: Ada2p/Gcn5p/Ada3 transcription activator complex"]] pcaf-1
#> ...
#> <8 more elements>
#>
#> $`Schizosaccharomyces pombe 972h-`
#> CharacterList of length 14
#> [["S.pombe: nucleotide-excision repair factor 1 complex"]] swi10 rhp14 rad16
#> [["S.pombe: nucleotide-excision repair factor 2 complex"]] rhp41 rhp42
#> [["S.pombe: nucleotide-excision repair factor 3 complex"]] tfb1 ptr8 ... pmh1
#> [["S.pombe: CPX-540_Mitochondrial inner membrane pre-sequence translocase complex"]]
#> [["S.pombe: CPX-546_DNA replication factor C complex"]] rfc3 rfc1 rfc4 rfc5 rfc2
#> [["S.pombe: CPX-547_PCNA homotrimer"]] pcn1
#> [["S.cer: Ada2p/Gcn5p/Ada3 transcription activator complex"]] gcn5 ada2
#> [["S.cer: TRAPP complex"]] bet5 trs20 bet3 trs23 trs31 trs33
#> [["S.cer: histone deacetylase complex"]] clr3
#> [["human: HuMAP2_00001_conf4"]] alp11
#> ...
#> <4 more elements>
#>
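Once the complex lists are loaded, a common next step is to look up which complexes contain a given gene. The sketch below uses a plain named list as a stand-in for the CharacterList (entries copied from the output above) so that it runs without einprot. The helper findComplexes() is hypothetical and not part of the package.

```r
## Toy stand-in for the complex list, using entries shown in the output above
complexes <- list(
    "human: BCL6-HDAC4 complex" = c("BCL6", "HDAC4"),
    "human: BCL6-HDAC5 complex" = c("BCL6", "HDAC5"),
    "S.cer: histone deacetylase complex" = c("HDA1", "HDA2", "HDA3")
)

## Hypothetical helper: names of all complexes that contain `gene`
findComplexes <- function(complexList, gene) {
    hits <- vapply(complexList, function(members) gene %in% members,
                   logical(1))
    names(complexList)[hits]
}

findComplexes(complexes, "BCL6")
findComplexes(complexes, "HDAC5")
```

With the objects from the example, the same helper should apply after converting to a plain list, e.g. findComplexes(as.list(compl), "BCL6") or findComplexes(as.list(orth$mouse), "Bcl6"). Matching is exact and case-sensitive, so the gene symbol has to follow the naming convention of the species being queried.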