Hello! Thank you for this great software and your time!
### EDIT/UPDATE 2--solved!
We found a solution to this problem! You can build a local BLAST database that is used specifically for the reciprocal BLAST step. There are various references and guides for doing this throughout the topiary docs and GitHub, but you may have to dig a little.
First, download the proteome files for the species in your seed dataframe. I've been able to find the `protein.faa.gz` files by searching the NCBI Datasets site: https://ncbi.nlm.nih.gov/datasets/. Then build your local database with the `makeblastdb` command (run `makeblastdb -help` if you need the options; there is more info online as well). I had the best luck using `cat` to combine the proteome files first, then using the combined file as the input. Once the database is set up, start the pipeline: run `topiary-seed-to-alignment` and include the `--local_recip_blast_db /path/to/databasename.faa` argument. Running `topiary-seed-to-alignment --help` is helpful for setting this up.
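For anyone who wants the steps above in one place, here is a rough Python sketch of what we did. It is not an official topiary workflow. The file names are hypothetical, the helper functions are my own, and I am assuming the seed file is passed as the first positional argument (check `topiary-seed-to-alignment --help`). It also assumes the downloads are still gzipped, so it decompresses them while combining instead of using plain `cat`.

```python
import gzip
from pathlib import Path


def combine_proteomes(gz_files, combined_path):
    """Decompress each protein.faa.gz file and append it to one FASTA file."""
    combined_path = Path(combined_path)
    with open(combined_path, "wb") as out:
        for gz_file in gz_files:
            with gzip.open(gz_file, "rb") as src:
                data = src.read()
            out.write(data)
            # Make sure the next file's first header starts on its own line.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return combined_path


def makeblastdb_command(fasta_path):
    """Command that builds a protein BLAST database named after the FASTA file."""
    return ["makeblastdb", "-in", str(fasta_path), "-dbtype", "prot"]


def seed_to_alignment_command(seed_file, db_path):
    """Command that runs the pipeline against the local reciprocal BLAST database.

    Assumption: the seed file is the first positional argument.
    """
    return [
        "topiary-seed-to-alignment",
        str(seed_file),
        "--local_recip_blast_db",
        str(db_path),
    ]


# Example use (hypothetical file names; requires BLAST+ and topiary installed):
#
#   import subprocess
#   db = combine_proteomes(["human.faa.gz", "mouse.faa.gz"], "recip_db.faa")
#   subprocess.run(makeblastdb_command(db), check=True)
#   subprocess.run(seed_to_alignment_command("seed.csv", db), check=True)
```

Because no `-out` is given to `makeblastdb`, the database takes the name of the input FASTA file, which is why the same path is passed to `--local_recip_blast_db`.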
Here are a few links that contain relevant/helpful info:
### EDIT/UPDATE:
The docs say users can specify sources of sequences (using `--blast_xml`, `--ncbi_blast_db`, and `--local_blast_db`). However, I can't tell whether those options only apply to building the sequence dataset (before doing reciprocal BLAST), or whether they can also be used to build a database for the reciprocal BLAST step specifically. If the latter is possible, we think that could solve the problem, as we could build a database with the unretrievable proteomes. We are unsure, though, whether doing so would interfere with or limit building the sequence dataset (pre-reciprocal BLAST).
### original post:
I am beginning an ASR project using this software, but am running into an issue in the seed-to-alignment phase. I have a seed dataset and am able to run the first command in the pipeline. The BLAST query seems successful, but after the "Doing reciprocal BLAST" step I get errors (text file with the error message attached). It seems the location of the Homo sapiens proteome has changed: the error output gives a full path to where it expects the file to be, but the file can't be found at that path.
My main question is: what is the best course of action in this situation? I really can't remove this species from my seed dataset (or set it to false) because it is a crucial species to include for my purposes. I'm assuming there's a way to get the proper proteome and provide it to topiary, but I'm unsure of the best way to do that...
I have been encountering the same issue. Any help would be greatly appreciated. Thank you!
I'm not sure if you are still having this problem, but we found a solution! I updated my original post outlining what we did. If any additional explanation would be helpful, let me know! :)
Thank you for your time and help!
topiary-error-may16.txt