Problem with pulling proteome from NCBI--solution found! #44

ani-sch · 2024-05-17T20:18:50Z

Hello! Thank you for this great software and your time!

### EDIT/UPDATE 2--solved!
We found a solution to this problem! You can build a local BLAST database that is used specifically for the reciprocal BLAST step. There are various references/guides for doing this throughout the topiary docs and GitHub, but you may have to dig a little.
First, find the proteome files of the species in your seed dataframe and download them. I've been able to find the protein.faa.gz files by searching the NCBI Datasets site: https://ncbi.nlm.nih.gov/datasets/. Then, build your local database using the makeblastdb command (run makeblastdb -help if needed; there is more info online as well). I had the best luck using cat to combine the files first, then using the combined file as the input. Once the database is set up, start the pipeline: run topiary-seed-to-alignment and include the --local_recip_blast_db /path/to/databasename.faa argument. Running topiary-seed-to-alignment --help is helpful for setting this up.
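For anyone who wants to script the steps above, here is a minimal Python sketch of what we did. It is only an illustration under a few assumptions: the file names (`human_protein.faa.gz`, `combined_proteomes.faa`, `seed-dataframe.csv`) are hypothetical placeholders, the seed file is assumed to be passed as the first positional argument to topiary-seed-to-alignment (check `--help` for your version), and the downloaded FASTA files are assumed to end with a newline so they can be concatenated directly.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def combine_proteomes(gz_paths, combined_fasta):
    """Decompress each protein.faa.gz file and append it to one FASTA file.

    This plays the role of the `cat` step. It assumes every input file
    ends with a newline, so records from different files do not run together.
    """
    combined_fasta = Path(combined_fasta)
    with combined_fasta.open("wb") as out:
        for gz_path in gz_paths:
            with gzip.open(gz_path, "rb") as src:
                shutil.copyfileobj(src, out)
    return combined_fasta


def makeblastdb_command(combined_fasta):
    """Build the makeblastdb call for a protein database.

    Without -out, makeblastdb names the database after the input file,
    so the database path is the same as the combined FASTA path.
    """
    return ["makeblastdb", "-in", str(combined_fasta), "-dbtype", "prot"]


def seed_to_alignment_command(seed_file, db_path):
    """Build the topiary call that uses the local reciprocal BLAST database.

    Assumption: the seed dataframe is the first positional argument.
    """
    return [
        "topiary-seed-to-alignment",
        str(seed_file),
        "--local_recip_blast_db",
        str(db_path),
    ]


def run_pipeline(gz_paths, combined_fasta, seed_file):
    """Run all three steps; requires BLAST+ and topiary to be installed."""
    db_path = combine_proteomes(gz_paths, combined_fasta)
    subprocess.run(makeblastdb_command(db_path), check=True)
    subprocess.run(seed_to_alignment_command(seed_file, db_path), check=True)
```

A call would look something like `run_pipeline(["human_protein.faa.gz", "mouse_protein.faa.gz"], "combined_proteomes.faa", "seed-dataframe.csv")`, with your own file names swapped in.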
Here are a few links that contain relevant/helpful info:

- topiary.ncbi.blast.recip API reference: https://topiary-asr.readthedocs.io/en/latest/topiary.ncbi.blast.html#module-topiary.ncbi.blast.recip
- (you may need to copy/paste this one into your browser, sorry): https://github.com/harmslab/topiary/commit/468a6d72bbdb58a1d312f068feb8e02d9facfb34

### EDIT/UPDATE:
The docs say users can specify the source of sequences (using --blast_xml, --ncbi_blast_db, and --local_blast_db). However, I can't tell whether those options only apply to building the sequence dataset (before doing reciprocal BLAST), or whether they can also be used to build a database for the reciprocal BLAST step specifically. If the latter is possible, we think it could solve the problem, since we could build a database with the unretrievable proteomes. We are unsure, though, whether doing so would interfere with building the sequence dataset (pre-reciprocal BLAST) or limit it.

### original post:
I am beginning an ASR project using this software, but I am running into an issue in the seed-to-alignment phase. I have a seed dataset and am able to run the first command in the pipeline. The BLAST query seems successful, but after the "Doing reciprocal BLAST" step I get errors (text file with the error message attached). It seems like the location of the Homo sapiens proteome has changed: the error readout gives a full path to where it expects the file to be, but when I follow that path, the file isn't there.

My main question is: what is the best course of action in this situation? I really can't remove this species from my seed dataset (or set it to false) because it is a crucial species to include for my purposes. I'm assuming there's a way to get the proper proteome and provide it to topiary, but I'm unsure of the best way to do that...

Thank you for your time and help!
topiary-error-may16.txt


cfreye · 2024-05-23T00:21:44Z

I have been encountering the same issue. Any help would be greatly appreciated. Thank you!

ani-sch · 2024-05-27T19:19:49Z

> I have been encountering the same issue. Any help would be greatly appreciated. Thank you!

I'm not sure if you are still having this problem, but we found a solution! I updated my original post outlining what we did. If any additional explanation would be helpful, let me know! :)

cfreye · 2024-05-29T20:14:34Z

@ani-sch this is very helpful, thank you so much for sharing! I was able to resolve this issue on my end as well based on your suggestions.

ani-sch changed the title ~~Problem with pulling proteome from NCBI~~ Problem with pulling proteome from NCBI--solution found! May 27, 2024
