
Error while running seed-to-alignment for proteins without known paralogs #42

Open
lbleicher opened this issue Jun 7, 2023 · 0 comments


@lbleicher

We are trying to run Topiary on sequences that seem to have no paralogs in most clades but may be named differently depending on the species. Our database, which has Opisthokonts as its scope, contains one sequence from yeast and one from human. Even though they are named differently (RQC1_YEAST and TCF25_HUMAN), we believe they are orthologs, because virtually no model species has more than one sequence containing the same domain (PF04910 in Pfam).
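Our single-copy argument boils down to counting domain-containing sequences per species. A minimal Python sketch of that check, assuming a hypothetical list of (species, accession) pairs for sequences annotated with PF04910; "Species X" and its entries are made-up placeholders, not real annotations:

```python
from collections import Counter


def species_with_multiple_copies(hits):
    """Return {species: count} for species with more than one domain-containing sequence.

    `hits` is an iterable of (species, accession) pairs. Any species returned
    here would argue against treating the family as single-copy.
    """
    counts = Counter(species for species, _ in hits)
    return {species: n for species, n in counts.items() if n > 1}


# Hypothetical example data; only the first two accessions come from our seed.
hits = [
    ("Homo sapiens", "Q9BQ70"),
    ("Saccharomyces cerevisiae", "Q05468"),
    ("Species X", "seqA"),
    ("Species X", "seqB"),
]
print(species_with_multiple_copies(hits))  # {'Species X': 2}
```

An empty result would be consistent with treating the human and yeast entries as orthologs.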
How should we prepare the input seed in this case? We tried two approaches: two entries, each with its own aliases, and two entries that both carry the combined aliases of both sequences. After the reciprocal BLAST step writes 4675 sequences to 02_recip-blast-dataframe.csv, the shrunk dataframe is reduced to just one sequence, and seed-to-alignment then stops at the "Aligning sequences" step with the following error:

muscle 5.1.linux64 [] 7.6Gb RAM, 4 cores
Built May 16 2023 07:53:40
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Input: 1 seqs, avg length 676, max 676

double free or corruption (out)
Traceback (most recent call last):
  File "/home/amandacpa/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/interface.py", line 32, in wrapper
    value = func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/amandacpa/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/pipeline/seed_to_alignment.py", line 478, in seed_to_alignment
    df = topiary.muscle.align(df)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amandacpa/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/muscle/muscle.py", line 96, in align
    _run_muscle(input_fasta,output_fasta,super5,silent,muscle_cmd_args,muscle_binary)
  File "/home/amandacpa/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/muscle/muscle.py", line 216, in _run_muscle
    raise subprocess.CalledProcessError(return_code, cmd)
subprocess.CalledProcessError: Command '['muscle', '-align', 'topiary-tmp_dULdoeuPiV_align-in.fasta', '-output', 'topiary-tmp_dULdoeuPiV_align-out.fasta']' died with <Signals.SIGABRT: 6>.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/amandacpa/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/wrap.py", line 185, in wrap_function
    ret = fcn(**fcn_args.dict)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amandacpa/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/interface.py", line 38, in wrapper
    raise WrappedFunctionException(err) from e
topiary._private.interface.WrappedFunctionException:

Caught exception in function 'seed_to_alignment'. Returning to starting
directory and cleaning up. Check error stack for cause of
this error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/amandacpa/miniconda3/envs/topiary/bin/topiary-seed-to-alignment", line 26, in <module>
    main()
  File "/home/amandacpa/miniconda3/envs/topiary/bin/topiary-seed-to-alignment", line 21, in main
    wrap_function(seed_to_alignment,
  File "/home/amandacpa/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/wrap.py", line 189, in wrap_function
    raise RuntimeError(err) from e
RuntimeError:

Function seed_to_alignment raised an error.
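The log shows MUSCLE 5.1 receiving a single sequence ("Input: 1 seqs") and then aborting with SIGABRT. A minimal sketch of a pre-flight check that would stop with a readable message instead; `count_fasta_records` and `check_alignable` are hypothetical helpers for illustration, not part of Topiary or MUSCLE:

```python
def count_fasta_records(fasta_text):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def check_alignable(fasta_text, min_seqs=2):
    """Raise ValueError if there are too few sequences to align.

    Hypothetical guard: in the log above, MUSCLE was handed one sequence
    and crashed instead of returning a trivial alignment.
    """
    n = count_fasta_records(fasta_text)
    if n < min_seqs:
        raise ValueError(
            f"Need at least {min_seqs} sequences to align, got {n}; "
            "check why the dataframe was reduced before this step."
        )
    return n


# Usage: a one-record FASTA (truncated placeholder sequence) is rejected.
single = ">seq1\nMSRRALRRLRGEQ\n"
try:
    check_alignable(single)
except ValueError as err:
    print(err)
```

Run on the aligner's input FASTA, a check like this points at the real problem (the shrunk dataframe holding one sequence) rather than a native crash.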

==================

This is the latest seed file we used, which caused the error above:

species,name,aliases,sequence,accession
Homo sapiens,RQC1,TCF25;TCF-25;Nuclear localized protein 1;KIAA1049;NULP1;FKSG26;RQC1;YDR333C,MSRRALRRLRGEQRGQEPLGPGALHFDLRDDDDAEEEGPKRELGVRRPGGAGKEGVRVNNRFELINIDDLEDDPVVNGERSGCALTDAVAPGNKGRGQRGNTESKTDGDDTETVPSEQSHASGKLRKKKKKQKNKKSSTGEASENGLEDIDRILERIEDSTGLNRPGPAPLSSRKHVLYVEHRHLNPDTELKRYFGARAILGEQRPRQRQRVYPKCTWLTTPKSTWPRYSKPGLSMRLLESKKGLSFFAFEHSEEYQQAQHKFLVAVESMEPNNIVVLLQTSPYHVDSLLQLSDACRFQEDQEMARDLVERALYSMECAFHPLFSLTSGACRLDYRRPENRSFYLALYKQMSFLEKRGCPRTALEYCKLILSLEPDEDPLCMLLLIDHLALRARNYEYLIRLFQEWEAHRNLSQLPNFAFSVPLAYFLLSQQTDLPECEQSSARQKASLLIQQALTMFPGVLLPLLESCSVRPDASVSSHRFFGPNAEISQPPALSQLVNLYLGRSHFLWKEPATMSWLEENVHEVLQAVDAGDPAVEACENRRKVLYQRAPRNIHRHVILSEIKEAVAALPPDVTTQSVMGFDPLPPSDTIYSYVRPERLSPISHGNTIALFFRSLLPNYTMEGERPEEGVAGGLNRNQGLNRLMLAVRDMMANFHLNDLEAPHEDDAEGEGEWD,Q9BQ70
Saccharomyces cerevisiae,RQC1,TCF25;TCF-25;Nuclear localized protein 1;KIAA1049;NULP1;FKSG26;RQC1;YDR333C,MSSRALRRLQDDNALLESLLSNSNANKMTSGKSTAGNIQKRENIFSMMNNVRDSDNSTDEGQMSEQDEEAAAAGERDTQSNGQPKRITLASKSSRRKKNKKAKRKQKNHTAEAAKDKGSDDDDDDEEFDKIIQQFKKTDILKYGKTKNDDTNEEGFFTASEPEEASSQPWKSFLSLESDPGFTKFPISCLRHSCKFFQNDFKKLDPHTEFKLLFDDISPESLEDIDSMTSTPVSPQQLKQIQRLKRLIRNWGGKDHRLAPNGPGMHPQHLKFTKIRDDWIPTQRGELSMKLLSSDDLLDWQLWERPLDWKDVIQNDVSQWQKFISFYKFEPLNSDLSKKSMMDFYLSVIVHPDHEALINLISSKFPYHVPGLLQVALIFIRQGDRSNTNGLLQRALFVFDRALKANIIFDSLNCQLPYIYFFNRQFYLAIFRYIQSLAQRGVIGTASEWTKVLWSLSPLEDPLGCRYFLDHYFLLNNDYQYIIELSNSPLMNCYKQWNTLGFSLAVVLSFLRINEMSSARNALLKAFKHHPLQLSELFKEKLLGDHALTKDLSIDGHSAENLELKAYMARFPLLWNRNEEVTFLHDEMSSILQDYHRGNVTIDSNDGQDHNNINNLQSPFFIAGIPINLLRFAILSEESSVMAAIPSFIWSDNEVYEFDVLPPMPTSKESIEVVENIKTFINEKDLAVLQAERMQDEDLLNQIRQISLQQYIHENEESNENEG,Q05468
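The seed above uses our second approach (both entries carry the combined aliases). A minimal sketch of generating such a file with Python's standard `csv` module, so each record stays on one line; it assumes the TCF25-style aliases belong to the human entry and RQC1/YDR333C to the yeast entry, and the sequences are truncated placeholders to be replaced with the full ones:

```python
import csv
import io

# Column order matches the seed file shown above.
FIELDS = ["species", "name", "aliases", "sequence", "accession"]


def merge_aliases(*alias_lists):
    """Combine alias lists, dropping duplicates while keeping first-seen order."""
    merged = []
    for aliases in alias_lists:
        for alias in aliases:
            if alias not in merged:
                merged.append(alias)
    return merged


def write_seed_csv(rows):
    """Return seed rows as CSV text, with each row's alias list joined by ';'."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        out = dict(row)
        out["aliases"] = ";".join(row["aliases"])
        writer.writerow(out)
    return buf.getvalue()


# Assumed split of aliases between the two proteins.
human_aliases = ["TCF25", "TCF-25", "Nuclear localized protein 1",
                 "KIAA1049", "NULP1", "FKSG26"]
yeast_aliases = ["RQC1", "YDR333C"]
shared = merge_aliases(human_aliases, yeast_aliases)

# Truncated placeholder sequences: paste the full sequences here.
rows = [
    {"species": "Homo sapiens", "name": "RQC1", "aliases": shared,
     "sequence": "MSRRALRRLRGEQ", "accession": "Q9BQ70"},
    {"species": "Saccharomyces cerevisiae", "name": "RQC1", "aliases": shared,
     "sequence": "MSSRALRRLQDDN", "accession": "Q05468"},
]
print(write_seed_csv(rows))
```

For the first approach, pass `human_aliases` and `yeast_aliases` to their own rows instead of `shared`.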
