README.md
+79 -74
@@ -10,11 +10,15 @@ DigestR is an open-source software developed for the R statistical language envi
 Users can interact with DigestR in two major ways: via point-and-click graphical user interfaces (GUIs) or by entering commands directly in the R console.
 This guide is intended to give an overview of DigestR's functions.

+To generate coincidence maps, DigestR requires:
+- A reference proteome (see function gp())
+- A .csv file exported from Mascot (see the converting file section)
+
 ## How to install digestR package from GitHub
 DigestR was tested with `R v4.3.1`, and `R v4.4.1`.

-We recommend using R '4.3' or later versions. In case of dependencies issues, use the renv package to reproduce the exact environment used.
-See section 'Reproducing R environment'
+In case of dependency issues, use the renv package to reproduce the exact environment used.
+See section 'Reproducing R environment' below.

 ### Prerequisites
@@ -43,55 +47,13 @@ Install the digestR package directly from GitHub
 library(digestR)

-## Reproducing the R Environment
-
-This project uses `renv` to manage package dependencies. To reproduce the exact environment used:
-### Step 1: Install and Load renv
-```sh
-install.packages("renv")
-library(renv)
-```
-### Step 2: Clone the Repository
-You can clone the repository using your system's terminal. Run the following command:
-Change the working directory to where you cloned the repository:
-```sh
-setwd("path/to/DigestR")
-```
-Replace "path/to/DigestR" with the actual path where the repository was cloned.
-
-### Step 4: Initialize renv
-```sh
-renv::init()
-```
-### Step 5: Restore the Project Environment
-```sh
-renv::restore()
-```
-### Step 6: Install the Package
-```sh
-devtools::install()
-```
-### Step 7: Load the Package
-```sh
-library(DigestR)
-```
-
 ## GUI Functions Documentation

 This document provides details of several Graphical User Interface (GUI) functions implemented in the R programming language using the Tcl/Tk toolkit.

 ### Supported file formats

-DigestR supports Mascot (.csv) generated files. These files may be converted to .dcf files using DigestR file conversion functions pm().
+DigestR supports Mascot (.csv) generated files. These files must be converted to .dcf files using DigestR file conversion functions pm().

 ### Converting files.
@@ -107,7 +69,7 @@ By default, Mascot creates a header, this 3 line header is required for the .csv
 An example data file can be found here: https://github.com/LewisResearchGroup/digestR/blob/main/Example%20Files/Data_Example.csv

-### Generating Proteome: gp()
+### Generating a reference proteome: gp()
 The generate_proteome function streamlines the process of accessing and downloading protein data from Ensembl BioMart, facilitating the creation of proteomes for comparison against experimental peptides. To generate a new proteome, users begin by selecting their desired Biomart library, using the dropdown menu – options include "genes" or "ensembl," with "genes" being the default value.

 Following this, users input a search pattern to explore datasets within the BiomaRt database (e.g., "sapiens" or "taurus"). Upon clicking the "Search Datasets" button, the function connects to the BiomaRt servers and retrieves datasets matching the provided pattern. The outcomes are displayed in the "Dataset Results" listbox, showing the dataset names, descriptions, and versions. Double-clicking on a result selects the dataset for further processing.
@@ -120,42 +82,26 @@ Warning: Generating proteomes can take several minutes to several hours dependin
For convenience, some proteomes have already been generated and can be found here:
To create "digestion" maps, peptides identified by Mascot or MaxQuant need to be mapped to their proteomic location. First, the user needs to select a proteome to align peptides against (see Generate Proteome). DigestR will automatically detect and use all proteomes located within the "data/proteomes" subfolder. Users can also import their own proteomes into this subfolder. After proteome selection, users can align Mascot-identified peptides along the selected proteome from one or more files. These alignments generate "coincidence" or "digestion" maps that users can interact with.
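The mapping idea described above can be illustrated with a minimal base R sketch. This is not DigestR's implementation: the protein sequence, the peptides, and the variable names are hypothetical, and exact string matching is assumed. The sketch counts how many peptides cover each residue of one protein, which is the kind of per-position signal a coincidence map displays.

```r
# Hypothetical toy data: one protein sequence and peptides identified in a sample.
protein  <- "MKTAYIAKQRQISFVKSHFSRQ"
peptides <- c("AYIAKQR", "KQRQISF", "SHFSRQ")

# One counter per residue of the protein.
coverage <- integer(nchar(protein))

for (pep in peptides) {
  # Position of the first exact match, or -1 if the peptide is absent.
  start <- as.integer(regexpr(pep, protein, fixed = TRUE))
  if (start > 0) {
    idx <- start:(start + nchar(pep) - 1)
    coverage[idx] <- coverage[idx] + 1L
  }
}

print(coverage)
```

Residues covered by several overlapping peptides receive higher counts, so regions that are cut frequently stand out as peaks.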
The csd() function allows users to plot amino acid distributions at the C-terminus or N-terminus to track changes in cut site representation/specificity between groups. This function allows users to select a file to generate either logo plots of the P4-P4' positions or bar plots at the P1 (N-terminus) or P1' (C-terminus) position. To identify cleavage sites of biological significance, it is possible to normalize the distribution with a specific amino acid sequence. Users can directly import a protein sequence in the appropriate box. The function then calculates the representation frequency of each amino acid within the protein sequence to normalize the experimental amino acid cut-site distributions.
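The normalization described above can be sketched in base R as follows. The inputs and names are hypothetical, and dividing the observed frequency by the background frequency is an assumption about how the normalization works, not a description of the internals of csd().

```r
# Hypothetical inputs: residues observed at P1, and a reference protein sequence.
p1_residues <- c("K", "R", "K", "K", "L", "R")
reference   <- "MKKLRAALKR"

reference_residues <- strsplit(reference, "")[[1]]
amino_acids <- sort(unique(reference_residues))

# Relative frequency of each amino acid in a character vector of residues.
residue_frequency <- function(residues) {
  counts <- table(factor(residues, levels = amino_acids))
  setNames(as.numeric(counts) / length(residues), amino_acids)
}

observed   <- residue_frequency(p1_residues)        # frequency at the cut site
background <- residue_frequency(reference_residues) # frequency in the reference

# Assumed normalization: a ratio above 1 means enrichment at P1.
enrichment <- observed / background
print(round(enrichment, 2))
```

Under this assumption, a residue that is common at P1 only because it is common in the protein scores close to 1, while a genuinely preferred residue scores higher.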
Defects in proteolytic activity may affect the length of digested peptides. Therefore, DigestR was developed to calculate and plot peptide length distributions in amino acids using the pd() command. Users can select a folder or subfolder and process all CSV files in that directory. Files can be selected directly in the loaded files box. If no files are selected, all files will be used to generate the density plots. At least two files need to be imported in order to generate Venn diagrams. Files will be grouped depending on the second string of the filename. Three types of density plots from grouped CSV files can be chosen by the user: Overlay, Ridges, and Colored Ridges.
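As an illustration of a peptide length distribution, the sketch below uses base R only. The peptide sequences and group names are hypothetical, and the plot only approximates what an "Overlay" density plot might look like; it does not reproduce the output of pd().

```r
# Hypothetical peptide sequences for two groups of samples.
peptides <- list(
  control = c("AYIAKQR", "KQRQISF", "SHFSRQ", "LLGK"),
  treated = c("AYIAKQRQISFVK", "SHFSRQAAK", "MKTAYIAK")
)

# Peptide length in amino acids for each group.
lengths_by_group <- lapply(peptides, nchar)

# Kernel density estimate of the length distribution for each group.
densities <- lapply(lengths_by_group, density)

# Overlay both density curves on a single plot.
plot(densities$control, main = "Peptide length distribution",
     xlab = "Length (amino acids)")
lines(densities$treated, lty = 2)
legend("topright", legend = names(densities), lty = 1:2)
```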
DigestR also allows for the creation of Venn diagrams in order to analyze peptide overlaps between groups. The vd function allows users to import files contained in a specific folder and generate a Venn diagram. Files can be selected directly in the loaded files box. If no files are selected, all files will be used to generate the Venn diagram. At least two files need to be imported in order to generate Venn diagrams.
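The counts behind a two-group Venn diagram can be computed with base R set operations. The peptide lists below are hypothetical, and the sketch only computes the region sizes; drawing the diagram is left to vd().

```r
# Hypothetical peptide lists identified in two groups.
group_a <- c("AYIAKQR", "KQRQISF", "SHFSRQ")
group_b <- c("KQRQISF", "SHFSRQ", "LLGK", "MKTAYIAK")

shared <- intersect(group_a, group_b)  # peptides found in both groups
only_a <- setdiff(group_a, group_b)    # peptides unique to group A
only_b <- setdiff(group_b, group_a)    # peptides unique to group B

# Sizes of the three regions of the diagram.
counts <- c(only_a = length(only_a),
            shared = length(shared),
            only_b = length(only_b))
print(counts)
```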
To open a "digestion" map in DigestR, either select "Open/Close Files" from the File menu or use the commands fo() or fs() in the R console. If multiple files have been opened, only the most recently opened spectrum will appear in the main plot window. To switch to another spectrum, double-click on a file name within the GUI. To close one or more files, select the desired files from the table and then press the "Close file" button.
-#### 2. Manipulate dcf files: mf()
+#### 3. Manipulate dcf files: mf()
The mf() function in DigestR allows users to perform various mathematical operations with dcf files, facilitating comprehensive data manipulation and analysis. With mf(), users can add, subtract, multiply, divide, and merge the data contained in multiple dcf files. This functionality allows users to perform mathematical operations tailored to their specific research needs, streamlining data processing and enhancing the overall analytical capabilities of DigestR.
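Conceptually, these operations act position by position on the values of two maps. The sketch below assumes, purely for illustration, that each map can be represented as a numeric vector of per-position counts of equal length. The internal layout of a dcf file and the way mf() handles it are not described here, and the "merge" operation is not covered.

```r
# Hypothetical per-position counts from two digestion maps of equal length.
map_a <- c(0, 2, 5, 3, 0)
map_b <- c(1, 2, 0, 3, 4)

added      <- map_a + map_b
subtracted <- map_a - map_b
multiplied <- map_a * map_b

# Dividing by zero gives Inf or NaN in R, so mark those positions as NA.
divided <- ifelse(map_b == 0, NA, map_a / map_b)

print(rbind(added, subtracted, multiplied, divided))
```

Subtraction is the natural choice for comparing a treated sample with a control, because positions with equal signal cancel out to zero.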
The ct() function allows users to interact with the "digestion" map directly through the graphical interface. Users can display the "digestion" map either at a proteome or protein level. By default, the full proteome view is displayed.
The plot color function allows users to easily manipulate the plot colors. To open the plot color GUI, enter the command co(). Color preferences can be applied to multiple spectra simultaneously by selecting names from the files list. Plot color options for the selected files may be configured individually using the buttons provided on the right side of the GUI. The "Axes" button changes the color of the x and y axes, "BG" changes the background color, and "Peak labels" changes the label color of identified peaks.
DigestR allows multiple "digestion" maps to be displayed concurrently on a single plot through the command ol(). To add or remove loaded files, select the digestion maps to overlay and click the "add" or "remove" buttons. The order of overlaid maps in the main plot window is taken directly from the order of digestion maps appearing in the overlays list box. Individual files can be assigned their own colors. The plot legend will be automatically generated, but it can be suppressed by unchecking the "Display names of the overlay spectrum on the plot" option. Similarly, the path of "digestion" maps can be suppressed by checking the corresponding checkbox.
Users can overlay known protease cut sites onto the "digestion" map(s) using the cs() command. It is important to note that this function requires a CSV file containing the names of the proteases and their respective cleavage sites. An example CSV file can be found here: https://github.com/LewisResearchGroup/digestR/blob/main/tests/Proteasecutsiteslist.csv
DigestR includes various zooming and scrolling commands, accessible through the zoom GUI by selecting "Zoom" from the View menu or using the command zm(). Digestion maps can be navigated using the arrow pad provided in the zoom GUI or by using the five distinct zoom functions called by the buttons provided on the right side of the zoom GUI. Many of these functions are iterative and must be exited by right-clicking in the main plot window.
The gl() function allows users to override the threshold at which proteins are labeled when viewing data on the proteome-wide level. By lowering the default value, more peptides will be labeled.
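The effect of lowering a labeling threshold can be shown with a short base R sketch. The protein names, the counts, both threshold values, and the "greater than or equal" rule are hypothetical; the sketch only illustrates why a lower threshold yields more labels.

```r
# Hypothetical per-protein signal in a proteome-wide view.
protein_counts <- c(ALB = 42, HBB = 17, APOA1 = 8, TTR = 3)

# Assumed rule: a protein is labeled when its signal reaches the threshold.
labeled_at <- function(counts, threshold) names(counts)[counts >= threshold]

labeled_default <- labeled_at(protein_counts, threshold = 15)  # hypothetical default
labeled_lowered <- labeled_at(protein_counts, threshold = 5)

print(labeled_default)
print(labeled_lowered)
```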