This repository has been archived by the owner on Sep 11, 2023. It is now read-only.

SWISSpalm_with_R

Overview

A simple script based on RSelenium, meant to streamline SWISSpalm use. It requires Java (available from Oracle), the R package glue (available from CRAN on its own or as part of the tidyverse), and RSelenium (also on CRAN).

Credit for the SWISSpalm database: SwissPalm: Protein Palmitoylation database. Mathieu Blanc*, Fabrice P.A. David*, Laurence Abrami, Daniel Migliozzi, Florence Armand, Jérôme Burgi and F. Gisou van der Goot. F1000Research.

Usage

Step 1:

Install prerequisite packages and Java. R packages can be installed using install.packages():

install.packages("glue")
install.packages("RSelenium")

Java must be installed from the Oracle website.
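Before installing anything, it can help to check what is already present. The following is a minimal sketch, not part of the script: `missing_packages()` and `java_on_path()` are hypothetical helper names, and the Java check assumes that a `java` executable on the PATH is what RSelenium needs.

```r
# Hypothetical helpers (not part of SWISSpalm_with_R.R).

# Return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# TRUE if a `java` executable can be found on the PATH.
java_on_path <- function() {
  nzchar(Sys.which("java")[[1]])
}

to_install <- missing_packages(c("glue", "RSelenium"))
if (length(to_install) > 0) {
  message("Still to install: ", paste(to_install, collapse = ", "))
}
if (!java_on_path()) {
  message("Java was not found on the PATH.")
}
```

Anything reported as missing can then be passed to install.packages().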

Step 2:

Run the SWISSpalm_with_R.R script in your instance of R to load its functions.
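One way to do this is with source(). The sketch below assumes the script sits in the current working directory (adjust `script_path` otherwise). Because R is case-sensitive, it also checks which spelling of the function name has actually been defined; `defined_functions()` is a hypothetical helper, not part of the script.

```r
# Assumed location of the script; change this to wherever you saved it.
script_path <- "SWISSpalm_with_R.R"

# Hypothetical helper: which of the candidate names exist as functions?
defined_functions <- function(candidates, envir = globalenv()) {
  found <- vapply(candidates, exists, logical(1),
                  envir = envir, mode = "function")
  candidates[found]
}

if (file.exists(script_path)) {
  source(script_path)
} else {
  message("Script not found at: ", script_path)
}

# Prints the name(s) that are defined after sourcing, if any.
print(defined_functions(c("getSWISSPalmData", "getSWISSpalmData")))
```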

Step 3:

Now you are ready to use the getSWISSPalmData() command. If you have not run RSelenium before, the first call will take longer, because RSelenium downloads chromedriver and the Selenium Server on first use and again whenever it finds updates to these components. Subsequent calls should take less time. Timing can also be tuned by editing the function's Sys.sleep() calls, but delays that are too short may cause errors, because the webpage must finish loading before any elements are searched for. The default syntax is as follows:

getSWISSPalmData(input.path, 
                 output.directory, 
                 dataset.value = 1, 
                 species.value = 2, 
                 output.type = "download_text")
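The script relies on fixed Sys.sleep() delays. If you edit it, one alternative (not what the script does) is to poll until a condition holds, which avoids guessing a single delay. The sketch below is generic: `wait_until()` is a hypothetical helper, and `condition` would be whatever check you write for "the element I need is on the page".

```r
# Hypothetical polling helper (not part of SWISSpalm_with_R.R).
# Calls `condition()` every `interval` seconds until it returns TRUE
# or `timeout` seconds have passed. Errors inside `condition()` are
# treated as "not ready yet".
wait_until <- function(condition, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    ready <- tryCatch(isTRUE(condition()), error = function(e) FALSE)
    if (ready) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Toy usage: the condition becomes TRUE on the third check.
checks <- 0
ok <- wait_until(function() {
  checks <<- checks + 1
  checks >= 3
}, timeout = 5, interval = 0.1)
```

The function returns FALSE on timeout rather than raising an error, so the caller decides whether to retry or stop.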

What do the parameters mean?

input.path = The file path of the text file you're submitting to SWISSpalm. This is best set in R using file.path(). The input .txt file must follow these formatting guidelines. I would advise not including a header. Each line must contain one identifier of one of the following types:

List of valid identifiers

UniProt AC
UniProt secondary AC
UniProt ID
UniProt gene name
Ensembl protein
Ensembl gene
Refseq protein ID
IPI ID
UniGene ID
PomBase ID
MGI ID
RGD ID
TAIR protein ID
EuPathDb ID

output.directory = The directory for SWISSpalm to send its outputs to. This is best set in R using file.path().
dataset.value = The dataset you want to search. This can be an integer from 1 to 7.

List of values and their meanings

Dataset 1: All proteins
Dataset 2: Proteins predicted to be palmitoylated
Dataset 3: Palmitoylation validated or found in at least 1 palmitoyl-proteome (SwissPalm annotated)
Dataset 4: Palmitoylation validated proteins
Dataset 5: Palmitoylation validated proteins or found in palmitoyl-proteomes using 2 independent methods
Dataset 6: Found in palmitoyl-proteomes using 2 independent methods
Dataset 7: Dataset 6 grouped by gene
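To keep these meanings next to your code, the list above can be stored as a vector and checked before calling the function. This is a sketch only: `dataset_labels` and `describe_dataset()` are hypothetical names, and whether the script itself validates dataset.value is not assumed.

```r
# Labels copied from the list above; the position in the vector is the
# dataset.value.
dataset_labels <- c(
  "All proteins",
  "Proteins predicted to be palmitoylated",
  "Palmitoylation validated or found in at least 1 palmitoyl-proteome (SwissPalm annotated)",
  "Palmitoylation validated proteins",
  "Palmitoylation validated proteins or found in palmitoyl-proteomes using 2 independent methods",
  "Found in palmitoyl-proteomes using 2 independent methods",
  "Dataset 6 grouped by gene"
)

# Hypothetical helper: return the label for a dataset.value, or stop
# if the value is not a single integer from 1 to 7.
describe_dataset <- function(value) {
  if (length(value) != 1 || is.na(value) ||
      !value %in% seq_along(dataset_labels)) {
    stop("dataset.value must be an integer from 1 to ", length(dataset_labels))
  }
  dataset_labels[[value]]
}

describe_dataset(4)
```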

species.value = The species you want to filter your results by. This can be an integer from 1 to 87, though some values are skipped (e.g. 5 doesn't map onto a species)

List of values and their species
1 = Homo sapiens
2 = Mus musculus
3 = Rattus norvegicus
4 = Arabidopsis thaliana
6 = Saccharomyces cerevisiae
7 = Cricetulus griseus
8 = Plasmodium falciparum
9 = Chlorocebus aethiops
10 = Bos taurus
11 = Schizosaccharomyces pombe
12 = Canis familiaris
13 = Drosophila melanogaster
14 = Danio rerio
15 = HIV1 isolate HXB2
16 = Spodoptera frugiperda
17 = Gallus gallus
18 = Human herpesvirus 1
19 = Semliki forest virus
20 = Sindbis virus (the [first recorded palmitoylated biological agent](https://dx.doi.org/10.1073/pnas.76.4.1687))
21 = Oryctolagus cuniculus
22 = Sus scrofa
23 = Toxoplasma gondii
24 = Torpedo californica
25 = Nicotiana benthamiana
26 = Landoltia punctata
27 = Influenza A virus (A/udorn/1972(H3N2))
28 = Giardia intestinalis
29 = Ecc15
30 = Influenza A virus (strain A/Duck/Ukraine/1/1963 H3N8)
31 = Escherichia coli BL21-DE3
32 = Salmonella typhimurium
33 = Medicago truncatula
34 = Influenza C virus (strain C/Johannesburg/1/1966)
35 = Simian immunodeficiency virus
36 = Cryptococcus neoformans
38 = Trypanosoma brucei brucei
39 = Mesocricetus auratus
40 = HIV-1 NY5
41 = HIV-1 BH10
42 = Xenopus laevis
43 = Fr-MuLV
44 = Human adenovirus 5
45 = Leishmania major
46 = RRV (strain T48)
47 = Caenorhabditis elegans
48 = HHV-4
49 = Ki-MuSV
50 = Ha-MuSV
51 = Escherichia coli K12
53 = VACV
54 = Influenza A virus H7N1
55 = VSV
56 = Macaca mulatta
57 = Solanum lycopersicum
58 = Aspergillus fumigatus
60 = Trypanosoma brucei brucei (927/4 GUTat10.1)
61 = Equus caballus
62 = MCF-MuLV
63 = MoMuLV (ts1-92b)
64 = MoMLV
65 = Neosartorya fumigata
66 = Toxoplasma gondii Me49
67 = HCMV
68 = MHV-A59
69 = Medicago falcata
70 = Oryza sativa
71 = RSV-PrC
72 = HHV-8
73 = Mallard duck
74 = AcMNPV
75 = Dictyostelium discoideum
76 = HCV
77 = SARS-CoV
78 = Lithobates catesbeiana
79 = Trichomonas vaginalis
82 = BCTV
83 = HEV-3
84 = HEV-1
85 = Mungbean yellow mosaic virus-Vigna
87 = CHIKV-S27
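Because the codes are easy to mistype, a named lookup can make calls more readable. The sketch below covers only a handful of entries copied from the list above; extend it as needed. `species_codes` and `species_value()` are hypothetical names, not part of the script.

```r
# A partial lookup built from the list above (not exhaustive).
species_codes <- c(
  "Homo sapiens"             = 1,
  "Mus musculus"             = 2,
  "Rattus norvegicus"        = 3,
  "Arabidopsis thaliana"     = 4,
  "Saccharomyces cerevisiae" = 6,
  "Drosophila melanogaster"  = 13,
  "Danio rerio"              = 14
)

# Hypothetical helper: translate a species name into its species.value.
species_value <- function(name) {
  if (!name %in% names(species_codes)) {
    stop("Species not in the lookup: ", name)
  }
  species_codes[[name]]
}

species_value("Mus musculus")
```

The result can then be passed as species.value.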

output.type = The type of output file you desire. Set this to one of three values: "download_text", "download_xlsx" or "download_fasta".
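A typo in output.type is easy to make, so it may be worth checking the value before launching a browser session. The sketch below uses base R's match.arg(); `check_output_type()` is a hypothetical helper, and it is not assumed that the script performs this check itself.

```r
# Hypothetical helper (not part of SWISSpalm_with_R.R).
# Returns the matched output type, or stops with an error listing the
# allowed values. With no argument it returns the first choice.
check_output_type <- function(output.type = c("download_text",
                                              "download_xlsx",
                                              "download_fasta")) {
  match.arg(output.type)
}

check_output_type("download_xlsx")
```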

Step 4:

To use the files you've received, you need to know what each file contains.

  • not_in_database.txt contains each gene ID not found in the SWISSpalm database.
  • not_in_dataset.txt contains each gene ID not found in the dataset you've used.
  • query_results.txt contains palmitoylation data of your genes that SWISSpalm analysed.
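The two "not found" files are a quick way to see which identifiers were dropped. The sketch below reads them if they exist; `read_unmatched_ids()` is a hypothetical helper, and it assumes each file holds one identifier per line.

```r
# Hypothetical helper (not part of SWISSpalm_with_R.R).
# Reads not_in_database.txt and not_in_dataset.txt from the output
# directory. A file that does not exist yields an empty character vector.
read_unmatched_ids <- function(output.directory) {
  files <- c(not_in_database = "not_in_database.txt",
             not_in_dataset  = "not_in_dataset.txt")
  lapply(files, function(f) {
    path <- file.path(output.directory, f)
    if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  })
}

# Usage, assuming the output directory from the example in this README:
unmatched <- read_unmatched_ids(file.path("data", "SWISSpalm_outputs"))
lengths(unmatched)
```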

Example

# Extract a data frame of gene IDs (in MGI ID format) from an existing data frame.
# dplyr::distinct() removes repeated gene IDs.
gene_list <- dplyr::distinct(my_gene_table, mgi_id)
# Set the input file and output directory paths
input <- file.path("data", "SWISSpalm_inputs", "gene_list.txt")
output <- file.path("data", "SWISSpalm_outputs")
# Write this list of genes out to a text file with no header
write.table(gene_list, input,
            append = FALSE, dec = ".", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
getSWISSpalmData(input, output, dataset.value = 1, species.value = 2, output.type = "download_text")
# Import the query results file
results <- read.table(file.path(output, "query_result.txt"),
                      quote = "", fill = TRUE,
                      sep = "\t",
                      header = TRUE)

Issues/Bug Reports/Requests

If you have any problems, drop them on the issues page. This script was written on Windows, so it's possible that there are some incompatibilities with Linux/Mac OS.

Some Errors and their Fixes

If the function encounters an error, Selenium will not reinitialise as its default port will still be in use by the old instance of Selenium. This will trigger the error message Selenium server signals port = [yourport] is already in use. In this case, run the command:

system("taskkill /im java.exe /f", intern = FALSE, ignore.stdout = FALSE)
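Note that taskkill is a Windows command, and that this call ends every running java.exe process, not only Selenium. On Linux or Mac OS a different command is needed. The sketch below picks a command by platform; `selenium_kill_command()` is a hypothetical helper, and the `pkill -f selenium` line is an untested assumption that the Selenium server's command line contains the word "selenium".

```r
# Hypothetical helper (not part of SWISSpalm_with_R.R).
# Returns a shell command intended to stop a leftover Selenium server.
selenium_kill_command <- function(os_type = .Platform$OS.type) {
  if (identical(os_type, "windows")) {
    "taskkill /im java.exe /f"   # the command given in this README
  } else {
    "pkill -f selenium"          # assumption for Linux / Mac OS
  }
}

# Inspect the command first, then run it yourself if it looks right:
selenium_kill_command()
# system(selenium_kill_command(), intern = FALSE, ignore.stdout = FALSE)
```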

If getSWISSpalmData() returns an error such as Undefined error in httr call. httr output: length(url) == 1 is not TRUE, as it did on my laptop, update your Chrome installation. If this does not work, run the command binman::list_versions("chromedriver"). This lists the chromedriver versions that have been downloaded, any of which can be entered into the chromever parameter in the rsDriver() command. Pick the one that matches your installed version of Chrome.
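Choosing a version by eye is error-prone when many are listed. The sketch below picks the newest listed chromedriver version whose major version matches your Chrome. `pick_chromedriver()` is a hypothetical helper, and the version strings are illustrative; in practice the input would come from binman::list_versions("chromedriver"), which returns a list keyed by platform.

```r
# Hypothetical helper (not part of SWISSpalm_with_R.R).
# From a character vector of version strings, return the highest one
# whose major version equals `chrome_major`, or NA if none match.
pick_chromedriver <- function(versions, chrome_major) {
  majors <- vapply(strsplit(versions, ".", fixed = TRUE),
                   function(parts) as.integer(parts[1]), integer(1))
  candidates <- versions[majors == chrome_major]
  if (length(candidates) == 0) return(NA_character_)
  as.character(max(numeric_version(candidates)))
}

# Illustrative input only:
listed <- c("113.0.5672.63", "114.0.5735.16", "114.0.5735.90")
pick_chromedriver(listed, chrome_major = 114)
```

The returned string is what you would enter as chromever in rsDriver().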