YaoYinYing/PPRCODE_Guideline

PPRCODE Cover Image is presented with MolecularNodes Project


Paper-PPRCODE Available at Colab Available at BioLib Platform: docker Build with Ubuntu x86_64

PPRCODE Workflow Docker Image Pulls

PPRCODE

Original Project site: PPR Code Prediction Server - From PPR to RNA

NOTE

The original website is down.

Please switch to:

  1. Colab release
  2. Docker release
  3. Biolib release

Three ways to run PPRCODE

  1. Web server on BioLib; the original web server provided by Yin Lab is down and will no longer be maintained.
  2. Colab Reimplementation
  3. Local run: Docker image or BioLib cloud scripts

Run PPRCODE locally via APIs provided by BioLib

  1. Install the required BioLib package
    pip3 install -U pybiolib
  2. Run PPRCODE via shell commands
    wget -qnc https://raw.githubusercontent.com/YaoYinYing/PPRCODE_Guideline/main/ppr_example.fasta 
    biolib run YaoYinYing/pprcode --fasta ppr_example.fasta

The results will be written to $PWD/biolib_results.

Note: Because of an I/O limitation in BioLib's Docker container wrapper, a custom --save_dir option produces no results.
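
The BioLib run above can also be launched from a Python script. The following is a minimal sketch that wraps the same `biolib run` command with `subprocess`. The helper names `build_biolib_command` and `run_pprcode_biolib` are hypothetical, introduced only for illustration, and are not part of pybiolib or PPRCODE.

```python
import shlex
import shutil
import subprocess
from pathlib import Path

# App identifier copied from the shell command in this README.
APP = "YaoYinYing/pprcode"


def build_biolib_command(fasta):
    """Return the argument list equivalent to `biolib run YaoYinYing/pprcode --fasta <fasta>`."""
    return ["biolib", "run", APP, "--fasta", str(fasta)]


def run_pprcode_biolib(fasta):
    """Run PPRCODE through the BioLib CLI and return the default result directory."""
    if shutil.which("biolib") is None:
        raise RuntimeError("biolib CLI not found; install it with: pip3 install -U pybiolib")
    # --save_dir is deliberately not passed, since this README notes it produces no results.
    subprocess.run(build_biolib_command(fasta), check=True)
    return Path.cwd() / "biolib_results"


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to execute offline.
    print(shlex.join(build_biolib_command("ppr_example.fasta")))
```

Calling `run_pprcode_biolib("ppr_example.fasta")` requires network access and an installed `biolib` command; the results are expected under `biolib_results` in the current directory, as stated above.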

Run PPRCODE locally in docker

  1. Install the Docker daemon by following the instructions on the official getting-started page.

  2. Clone this repo

    git clone https://github.com/YaoYinYing/PPRCODE_Guideline
  3. Get the PPRCODE Docker image.

    Fetch the latest image:

    docker pull yaoyinying/pprcode:latest

    You may also build it from scratch:

    cd PPRCODE_Guideline
    docker build -f docker/Dockerfile -t pprcode . 

    Alternatively, if you wish to run PPRCODE with your own version of docker/run_pprcode.py, you may build a patched image for local use as follows:

    docker build -f docker/Dockerfile_patch -t pprcode .
  4. Create a Conda environment for running this Docker image in a container

    conda create -y -n pprcode python pip
    conda activate pprcode
    cd <repo>/PPRCODE_Guideline
    pip install -r docker/requirements.txt
  5. Run run_docker.py on the example data

    conda activate pprcode
    mkdir test
    
    # fetch an example dataset 
    wget -qnc https://raw.githubusercontent.com/YaoYinYing/PPRCODE_Guideline/main/ppr_example.fasta -P test
    
    # use PS_Scan as default program
    python /repo/PPRCODE_Guideline/docker/run_docker.py --fasta test/ppr_example.fasta --save_dir ./save-ps_scan  --plot_item=bar,score,edge,ppr,rna
    
    # or use pprfinder provided by Small's Lab
    python  /repo/PPRCODE_Guideline/docker/run_docker.py --fasta test/ppr_example.fasta --save_dir ./save-pprfinder --plot_item=bar,score,edge,ppr,rna --program=pprfinder
  6. Advanced options

    python  /repo/PPRCODE_Guideline/docker/run_docker.py --help
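
To compare the two motif finders on the same input, the two `run_docker.py` invocations above can be generated from one place. This is a sketch only: the flags (`--fasta`, `--save_dir`, `--plot_item`, `--program`) are copied from the commands in this README, while the helper names `build_run_docker_command` and `build_comparison_commands` are hypothetical and the script path must point at your own clone.

```python
import shlex
import sys

# Plot items copied from the example commands in this README.
PLOT_ITEMS = ("bar", "score", "edge", "ppr", "rna")


def build_run_docker_command(script, fasta, save_dir, program=None):
    """Build the argument list for one run_docker.py invocation."""
    cmd = [
        sys.executable, str(script),
        "--fasta", str(fasta),
        "--save_dir", str(save_dir),
        "--plot_item=" + ",".join(PLOT_ITEMS),
    ]
    # Omitting --program keeps the default program (PS_Scan).
    if program is not None:
        cmd.append("--program=" + program)
    return cmd


def build_comparison_commands(script, fasta):
    """Return one command per motif finder, each with its own output directory."""
    return {
        "ps_scan": build_run_docker_command(script, fasta, "./save-ps_scan"),
        "pprfinder": build_run_docker_command(script, fasta, "./save-pprfinder", program="pprfinder"),
    }


if __name__ == "__main__":
    # Print the commands for inspection; pass each list to subprocess.run(cmd, check=True) to execute.
    for name, cmd in build_comparison_commands("docker/run_docker.py", "test/ppr_example.fasta").items():
        print(name, shlex.join(cmd))
```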

FAQs

Q: What is PPR and PPR code?

Pentatricopeptide repeat (PPR) proteins constitute a large family whose members serve as single-stranded RNA (ssRNA)-binding proteins; these proteins are particularly abundant in terrestrial plants, as more than 400 members have been identified in Arabidopsis and rice.

PPR proteins are typically characterized by tandem degenerate repeats of a 35-amino acid motif. Within a given repeat, the combinatorial di-residues at the 5th and 35th positions are responsible for specific RNA base recognition. These di-residues are referred to as the PPR code.
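
As an illustration of the definition above, the sketch below reads the di-residue code off a single motif. It assumes the motif string is already demarcated and takes the last residue as the second position, which is the 35th residue for a canonical 35-residue motif; this matches the "Fifth amino acid" and "Last amino acid" columns of the demo result table in this README. The function name `extract_ppr_code` is hypothetical and is not the server's implementation.

```python
def extract_ppr_code(motif):
    """Return the two-letter PPR code: the 5th residue followed by the last residue."""
    motif = motif.strip().upper()
    if len(motif) < 5:
        raise ValueError("motif is too short to contain a fifth residue")
    # Index 4 is the 5th residue; index -1 is the last one (the 35th in a 35-residue motif).
    return motif[4] + motif[-1]


if __name__ == "__main__":
    # Second motif of the PPR10 demo (residues 174-208).
    print(extract_ppr_code("VRAYTTVLHALSRAGRYERALELFAELRRQGVAPT"))
```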

Q: What is PPRCODE prediction server?

The PPRCODE prediction server aims to help the PPR community predict PPR codes and their target RNAs. Once a PPR protein sequence is submitted, the server first identifies the PPR motifs using the PS_Scan algorithm provided by ProSite, then outputs the individual PPR motifs, which are demarcated based on the PPR structure. The PPR code is generally extracted from the 5th and 35th amino acids of each PPR motif, and the best-matched RNA base for each code is reported. Together, these give the potential RNA target of the PPR sequence.

PPRCODE Prediction Server: Interface

Q: How do I submit a sequence to the PPRCODE prediction server?

Go directly to the PPRCODE prediction server submission form on BioLib and do the following:

  1. Paste your FASTA sequence in the upper text area.
  2. Modify the options if needed.
  3. Click the Run button. After submission, the page runs the job automatically; this takes several seconds.

Q: How long does it take to finish a task?

Less than three seconds per sequence.

Q: How many sequences can I submit in one submission?

As many as you want.

Q: What does the prediction result mean?

The result page contains a table like the following:

This is a demo sequence of PPR10 from Zea mays.

| Motif Start | Motif End | Motif Sequence | Fifth amino acid | Last amino acid | PPR Code | RNA base | Motif Length | ProSite Score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 138 | 172 | ASALEMVVRALGREGQHDAVCALLDETPLPPGSRL | E | L | EL | ? | 35 | 5.031 |
| 174 | 208 | VRAYTTVLHALSRAGRYERALELFAELRRQGVAPT | T | T | TT | A>G | 35 | 12.989 |
| 209 | 244 | LVTYNVVLDVYGRMGRSWPRIVALLDEMRAAGVEPD | N | D | ND | U>C>G | 36 | 11.093 |
| 245 | 279 | GFTASTVIAACCRDGLVDEAVAFFEDLKARGHAPC | S | C | SC | ? | 35 | 11.411 |
| 280 | 314 | VVTYNALLQVFGKAGNYTEALRVLGEMEQNGCQPD | N | D | ND | U>C>G | 35 | 12.737 |
| 315 | 349 | AVTYNELAGTYARAGFFEEAARCLDTMASKGLLPN | N | N | NN | C>U | 35 | 11.477 |
| 350 | 384 | AFTYNTVMTAYGNVGKVDEALALFDQMKKTGFVPN | N | N | NN | C>U | 35 | 14.096 |
| 385 | 419 | VNTYNLVLGMLGKKSRFTVMLEMLGEMSRSGCTPN | N | N | NN | C>U | 35 | 10.358 |
| 420 | 454 | RVTWNTMLAVCGKRGMEDYVTRVLEGMRSCGVELS | N | S | NS | C>U>A | 35 | 9.887 |
| 455 | 489 | RDTYNTLIAAYGRCGSRTNAFKMYNEMTSAGFTPC | N | C | NC | U>C>>A | 35 | 11.674 |
| 490 | 524 | ITTYNALLNVLSRQGDWSTAQSIVSKMRTKGFKPN | N | N | NN | C>U | 35 | 11.542 |
| 525 | 560 | EQSYSLLLQCYAKGGNVAGIAAIENEVYGSGAVFPS | S | S | SS | A | 36 | 6.467 |
| 561 | 595 | WVILRTLVIANFKCRRLDGMETAFQEVKARGYNPD | R | D | RD | - | 35 | 6.445 |
| 596 | 630 | LVIFNSMLSIYAKNGMYSKATEVFDSIKRSGLSPD | N | D | ND | U>C>G | 35 | 12.419 |
| 631 | 666 | LITYNSLMDMYAKCSESWEAEKILNQLKCSQTMKPD | N | D | ND | U>C>G | 36 | 8.67 |
| 667 | 701 | VVSYNTVINGFCKQGLVKEAQRVLSEMVADGMAPC | N | C | NC | U>C>>A | 35 | 13.778 |
| 702 | 736 | AVTYHTLVGGYSSLEMFSEAREVIGYMVQHGLKPM | H | M | HM | ? | 35 | 10.348 |
| 737 | 771 | ELTYRRVVESYCRAKRFEEARGFLSEVSETDLDFD | R | D | RD | - | 35 | 8.089 |

You will also get a predicted RNA sequence, with one bracketed entry per motif, like this:

(?) (A>G) (U>C>G) (?) (U>C>G) (C>U) (C>U) (C>U) (C>U>A) (U>C>>A) (C>U) (A) (-) (U>C>G) (U>C>G) (U>C>>A) (?) (-)
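
The predicted sequence is the per-motif "RNA base" column joined in motif order. The sketch below reproduces the line above from the PPR codes in the demo table. The lookup `DEMO_CODE_TO_BASE` is an assumption for illustration: it contains only the code-to-base pairs visible in this demo, not the full table used by PPRCODE, and any code missing from it is shown as `?`, which may differ from the real output. The `-` entry for `RD` is copied as-is from the demo.

```python
# Partial lookup built only from the rows of the PPR10 demo table in this README.
DEMO_CODE_TO_BASE = {
    "TT": "A>G",
    "ND": "U>C>G",
    "NN": "C>U",
    "NS": "C>U>A",
    "NC": "U>C>>A",
    "SS": "A",
    "RD": "-",
}


def predict_bases(codes, table=DEMO_CODE_TO_BASE):
    """Map each PPR code to its base preference; codes not in the table become '?'."""
    return [table.get(code, "?") for code in codes]


def format_prediction(bases):
    """Join per-motif predictions as '(x) (y) ...', one bracketed entry per motif."""
    return " ".join("(" + base + ")" for base in bases)


if __name__ == "__main__":
    # PPR codes of the 18 demo motifs, in motif order.
    demo_codes = ["EL", "TT", "ND", "SC", "ND", "NN", "NN", "NN", "NS",
                  "NC", "NN", "SS", "RD", "ND", "ND", "NC", "HM", "RD"]
    print(format_prediction(predict_bases(demo_codes)))
```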

Q: Why does the prediction result of my sequence look like a mess?

PS_Scan and PPRfinder identify the sequence and motifs of a PPR protein by their similarity to the general P-type PPR. Sequences with low identity to it are predicted poorly or not at all. In that case, manual correction is strongly recommended.

Troubleshooting

If you run into problems with the website or have suggestions, you are welcome to contact us via email.

Contributors:

  • Yinying Yao: Main program development and ongoing maintenance.
  • Zeyuan Guan: Basic framework of the original webserver.
  • Junjie Yan: Writing and data collection.
  • Xiang Wang: Providing useful advice on the design of the original webserver.

Citation

Yan Junjie#, Yao Yinying#, Hong Sixing, Yang Yan, Shen Cuicui, Zhang Qunxia, Zhang Delin, Zou Tingting, Yin Ping*. Delineation of pentatricopeptide repeat codes for target RNA prediction. Nucleic Acids Research. 2019 February 11. doi: 10.1093/nar/gkz075