Problem with pulling proteome from NCBI--solution found! #44

Open
ani-sch opened this issue May 17, 2024 · 3 comments

ani-sch commented May 17, 2024

Hello! Thank you for this great software and your time!

### EDIT/UPDATE 2--solved!
We found a solution to this problem! You can build a local BLAST database and use it specifically for the reciprocal BLAST step. There are various references/guides for doing this throughout the topiary docs and GitHub, but you may have to dig a little.
First, find the proteome files for the species in your seed dataframe and download them. I was able to find the protein.faa.gz files by searching the NCBI Datasets site: https://ncbi.nlm.nih.gov/datasets/. Then, build your local database with makeblastdb (run it with the -help argument if needed; there is more info online as well). I had the best luck using cat to combine the files first, then using the combined file as the input. Once the database is set up, start the pipeline: run topiary-seed-to-alignment and include the --local_recip_blast_db /path/to/databasename.faa argument. Running topiary-seed-to-alignment --help is useful for setting this up.
Here are a few links that contain relevant/helpful info:

- topiary.ncbi.blast.recip API reference: https://topiary-asr.readthedocs.io/en/latest/topiary.ncbi.blast.html#module-topiary.ncbi.blast.recip
- (you may need to copy/paste this one into your browser, sorry): https://github.com/harmslab/topiary/commit/468a6d72bbdb58a1d312f068feb8e02d9facfb34
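
For anyone who prefers a script, here is a minimal sketch of the steps above in Python. It is not taken from the topiary docs. The file names (proteomes/, recip_proteomes.faa, seed-dataframe.csv) are hypothetical placeholders, and I am assuming that makeblastdb wants uncompressed FASTA and that topiary-seed-to-alignment takes the seed dataframe as its first positional argument; check both --help outputs before relying on it.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def combine_proteomes(gz_paths, combined_path):
    """Decompress each protein.faa.gz file and append it to one FASTA file.

    This plays the role of the `cat` step, with decompression added.
    """
    combined_path = Path(combined_path)
    with open(combined_path, "wb") as out:
        for gz_path in gz_paths:
            with gzip.open(gz_path, "rb") as src:
                shutil.copyfileobj(src, out)
    return combined_path


def makeblastdb_command(combined_path):
    # -in and -dbtype are standard makeblastdb options. With no -out given,
    # the database is named after the input file (e.g. databasename.faa).
    return ["makeblastdb", "-in", str(combined_path), "-dbtype", "prot"]


def seed_to_alignment_command(seed_file, combined_path):
    # Assumption: the seed dataframe is the first positional argument.
    # --local_recip_blast_db is the argument described in the post above.
    return [
        "topiary-seed-to-alignment",
        str(seed_file),
        "--local_recip_blast_db",
        str(combined_path),
    ]


if __name__ == "__main__":
    # Hypothetical layout: downloaded proteomes live in ./proteomes
    proteomes = sorted(Path("proteomes").glob("*.faa.gz"))
    combined = combine_proteomes(proteomes, "recip_proteomes.faa")
    subprocess.run(makeblastdb_command(combined), check=True)
    subprocess.run(
        seed_to_alignment_command("seed-dataframe.csv", combined), check=True
    )
```

The commands are built as lists so you can print and inspect them before anything runs; the two subprocess calls only work if BLAST+ and topiary are installed and on your PATH.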

### EDIT/UPDATE:
In the docs, it says users can specify sources of sequences (using --blast_xml, --ncbi_blast_db, and --local_blast_db). However, I can't tell whether those options only apply to building the sequence dataset (before doing reciprocal BLAST), or whether they can also be used to build a database for the reciprocal BLAST step specifically. If the latter is possible, we think that could solve the problem, since we could build a database with the unretrievable proteomes. But we are unsure whether it would cause a problem with building, or limit, the sequence dataset (pre-reciprocal BLAST).

### original post:
I am beginning an ASR project using this software, but am running into an issue in the seed-to-alignment phase. I have a seed dataset and am able to run the first command in the pipeline. The BLAST query seems successful, but after the "Doing reciprocal BLAST" step I get errors (text file with the error message attached). It seems like the location of the Homo sapiens proteome has changed: the error readout gives the full path to where it expects the file to be, but when I follow that path, the file isn't there.

My main question is: what is the best course of action in this situation? I really can't remove this species from my seed dataset (or set it to false) because it is a crucial species to include for my purposes. I'm assuming there's a way to get the proper proteome and supply it to topiary, but I'm unsure of the best way to do that...

Thank you for your time and help!
topiary-error-may16.txt


cfreye commented May 23, 2024

I have been encountering the same issue. Any help would be greatly appreciated. Thank you!

ani-sch changed the title from "Problem with pulling proteome from NCBI" to "Problem with pulling proteome from NCBI--solution found!" on May 27, 2024

ani-sch commented May 27, 2024

> I have been encountering the same issue. Any help would be greatly appreciated. Thank you!

I'm not sure if you are still having this problem, but we found a solution! I updated my original post outlining what we did. If any additional explanation would be helpful, let me know! :)


cfreye commented May 29, 2024

@ani-sch this is very helpful, thank you so much for sharing! I was able to resolve this issue on my end as well based on your suggestions.
