Job stuck Sirius 6.05 #200

Open
Do-rossi opened this issue Sep 11, 2024 · 20 comments
Labels: probable bug (Marks bug reports where the reported issue has not yet been confirmed/reproduced)

@Do-rossi

Hi,

I have an issue using SIRIUS 6.0.5 on macOS with negative-mode MS/MS data. The main job stops at around 70% when I run my data through SIRIUS, CANOPUS and database search. Some of the individual jobs are stuck at 70% or 20%. I tried several times to cancel everything and then compute using only SIRIUS, but that did not help. Including or excluding high-mass compounds made no difference, and I get the same result with or without ZODIAC. My log also says “Invalidate existing Results and Recompute” and then starts a computation that never ends.

Screenshot 2024-09-11 at 16 25 41

"sept. 09, 2024 10:02:07 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <740>[BackgroundRunJob-740] Invalidate existing Results and Recompute!
sept. 09, 2024 10:02:07 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <740>[BackgroundRunJob-740] Start computation..."

I also noticed that even if I select adducts such as [M+HCOOH−H]−, only [M−H]− is found in my data.
(I know that [M+HCOOH−H]− adducts are present.)

Maybe I'm doing something wrong. If you have any advice, please let me know.

@Do-rossi added the "probable bug" label Sep 11, 2024
@MartinHoffmannJena
Collaborator

Hi,

Are you able to share the data that produces this state (together with the set of parameters used to start the computation)?

@MartinHoffmannJena self-assigned this Sep 11, 2024
@Do-rossi
Author

Hi,

This is the MGF file exported from MZmine.
Sirius.mgf.zip
I used the GUI and kept the default parameters, except that I also selected the HCOOH adduct among the fallback adducts and added a custom database to the search.

config spectra-search --FormulaSearchSettings.applyFormulaConstraintsToBottomUp=false --IsotopeSettings.filter=true --UseHeuristic.useOnlyHeuristicAboveMz=650 --FormulaSearchDB=, --Timeout.secondsPerTree=0 --FormulaSettings.enforced=H,C,N,O,P --Timeout.secondsPerInstance=0 --AlgorithmProfile=qtof --SpectralMatchingMassDeviation.allowedPeakDeviation=10.0ppm --AdductSettings.enforced=, --AdductSettings.prioritizeInputFileAdducts=true --UseHeuristic.useHeuristicAboveMz=300 --IsotopeMs2Settings=IGNORE --MS2MassDeviation.allowedMassDeviation=10.0ppm --SpectralMatchingMassDeviation.allowedPrecursorDeviation=10.0ppm --FormulaSearchSettings.performDeNovoBelowMz=400.0 --FormulaSearchSettings.applyFormulaConstraintsToDatabaseCandidates=false --EnforceElGordoFormula=true --NumberOfCandidatesPerIonization=1 --FormulaSettings.detectable=B,S,Cl,Se,Br --NumberOfCandidates=10 --AdductSettings.fallback=[[M-H]-,[M+CH2O2-H]-] --FormulaSearchSettings.performBottomUpAboveMz=0 --FormulaResultThreshold=true --ExpansiveSearchConfidenceMode.confidenceScoreSimilarityMode=APPROXIMATE --StructureSearchDB=ester_lib,METACYC,BloodExposome,CHEBI,COCONUT,FooDB,GNPS,HMDB,HSDB,KEGG,KNAPSACK,LOTUS,LIPIDMAPS,MACONDA,MESH,MiMeDB,NORMAN,PLANTCYC,PUBCHEMANNOTATIONBIO,PUBCHEMANNOTATIONDRUG,PUBCHEMANNOTATIONFOOD,PUBCHEMANNOTATIONSAFETYANDTOXIC,SUPERNATURAL,TeroMol,YMDB --RecomputeResults=false formulas fingerprints classes structures

Thank you for your help
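
When two runs need to be compared, it can help to break a command copied from the GUI into its individual parameters. The following is a minimal Python sketch, assuming every option has the `--Name=value` form seen in the command above. `parse_sirius_options` and `diff_options` are hypothetical helper names, not part of SIRIUS, and the second command in the example is a made-up variant used only for illustration.

```python
def parse_sirius_options(command: str) -> dict[str, str]:
    """Collect --Name=value tokens from a command copied from the GUI."""
    options = {}
    for token in command.split():
        if token.startswith("--") and "=" in token:
            name, value = token[2:].split("=", 1)
            options[name] = value
    return options


def diff_options(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return the options whose values differ, as name -> (value_a, value_b)."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


# Shortened excerpt of the command above; run_b is a hypothetical variant.
run_a = parse_sirius_options(
    "config --AlgorithmProfile=qtof "
    "--AdductSettings.fallback=[[M-H]-,[M+CH2O2-H]-] formulas"
)
run_b = parse_sirius_options(
    "config --AlgorithmProfile=qtof --AdductSettings.fallback=[[M-H]-] formulas"
)
print(diff_options(run_a, run_b))
```

Only options that differ between the two runs are reported, which narrows down which setting to test first.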

@MartinHoffmannJena
Collaborator

Hi,

I'm not able to reproduce this: I don't get any stuck jobs, and I do see [M+HCOOH−H] results:

image

Can you run it again and check the CPU/Memory load on your computer while you run it?

@Do-rossi
Author

Hi,

I recomputed all tasks and now it's stuck at 73%. My CPU and memory usage looked like this when I started the computation:
Screenshot 2024-09-18 at 09 43 24
Screenshot 2024-09-18 at 09 43 51

Now that it's stuck, it looks like this:
Screenshot 2024-09-18 at 12 29 21
Screenshot 2024-09-18 at 12 29 37

Here is also part of my log:

INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
[the same WebJobWatcherJJob-1995 message is logged 13 more times, each stamped sept. 18, 2024 11:14:10 AM]
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <97>[BackgroundRunJob-97] DONE!
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <381946>[CanopusSubToolJob-381946 | 3964 (618855364793789458)] DONE!
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <97>[BackgroundRunJob-97] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <381947>[FingerblastSubToolJob-381947 | 3964 (618855364793789458)] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <97>[BackgroundRunJob-97] Start computation...
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <381947>[FingerblastSubToolJob-381947 | 3964 (618855364793789458)] Start computation...
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <97>[BackgroundRunJob-97] DONE!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <381970>[CanopusSubToolJob-381970 | 3954 (618855364516965359)] DONE!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <97>[BackgroundRunJob-97] DONE!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <382190>[CanopusSubToolJob-382190 | 3929 (618855364076563369)] DONE!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <97>[BackgroundRunJob-97] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <381971>[FingerblastSubToolJob-381971 | 3954 (618855364516965359)] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <97>[BackgroundRunJob-97] Start computation...
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <381971>[FingerblastSubToolJob-381971 | 3954 (618855364516965359)] Start computation...
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <97>[BackgroundRunJob-97] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <382191>[FingerblastSubToolJob-382191 | 3929 (618855364076563369)] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <97>[BackgroundRunJob-97] Start computation...
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9
INFOS: <382191>[FingerblastSubToolJob-382191 | 3929 (618855364076563369)] Start computation...

Sorry for bothering you...

Have a great day !
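
In a long log like this one, a quick way to see which jobs never finished is to count "Start computation..." and "DONE!" entries per job label. The following is a minimal Python sketch, assuming the line layout shown in the excerpt above (`INFOS: <id>[label] message`). `unfinished_jobs` is a hypothetical helper, not a SIRIUS tool, and the result is only meaningful for a complete log, since a job that is simply still running is reported as well.

```python
import re
from collections import Counter

# Matches e.g.
# "INFOS: <381947>[FingerblastSubToolJob-381947 | 3964 (...)] Start computation..."
# "INFOS" is the level name in this French-locale log; "INFO" is accepted too.
LINE = re.compile(r"^INFOS?: <\d+>\[([^\]]+)\] (.*)$")


def unfinished_jobs(log_text: str) -> list[str]:
    """Return job labels with more 'Start computation...' than 'DONE!' entries."""
    open_runs = Counter()
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # timestamp lines and anything else are skipped
        label, message = match.groups()
        if message.startswith("Start computation"):
            open_runs[label] += 1
        elif message.startswith("DONE"):
            open_runs[label] -= 1
    return [label for label, count in open_runs.items() if count > 0]


# A few lines in the same layout as the excerpt above.
sample = """\
INFOS: <381946>[CanopusSubToolJob-381946 | 3964 (618855364793789458)] DONE!
INFOS: <381947>[FingerblastSubToolJob-381947 | 3964 (618855364793789458)] Start computation...
INFOS: <97>[BackgroundRunJob-97] Start computation...
INFOS: <97>[BackgroundRunJob-97] DONE!
"""
print(unfinished_jobs(sample))
```

In this sample only the Fingerblast job for feature 3964 is reported, because it logged a start but no matching "DONE!".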

@zglong1

zglong1 commented Sep 18, 2024

I'm also getting this error/situation, where the job seems to stop running at ~70 or ~85%, even when trying repeatedly to end it after no progress and restarting it. My log looks the same, but I've attached the mgf file I'm using, exported from MZMine.
SIRIUS_MGF.zip

@MartinHoffmannJena
Collaborator

MartinHoffmannJena commented Sep 18, 2024

Could you please double-click on a running "stuck" structure job and copy the log content here? It should look like this:

image

@zglong1

zglong1 commented Sep 18, 2024

Aye, I will. It's currently running from start to finish, so if and when it stalls, I'll post the log. The whole job is only at 11% right now, though, and I'm running it locally rather than on our computing cluster, so it may take a while. (The cluster had updates a few days ago, and SIRIUS now seems to stall indefinitely there once the GUI splash window opens. We'll see whether that is a SIRIUS issue or a cluster issue; I've already reached out to the cluster's tech people about it.)

@zglong1

zglong1 commented Sep 19, 2024

Stuck at 63% this time, on the structure job. Same mgf file being processed as the one I linked earlier.
Stuck

It's in the picture, but here's the content of the topmost feature's log; it looks the same for all the other features (based on a random sampling).

Here's the full log:
Full log.txt

@zglong1

zglong1 commented Sep 19, 2024

For reference, I'm sitting at 5-10% CPU usage, 53% memory usage, and 3-6% GPU usage while it's stalled. This is on a Windows 10 PC with an Intel Core i7 4 GHz 4-core processor, 64 GB of 3.2 GHz RAM, and more than 100 GB of space left on my SSD.

@MartinHoffmannJena
Collaborator

MartinHoffmannJena commented Sep 19, 2024

Thank you. This is kind of tricky, since I cannot reproduce it at all (I ran the same file multiple times with different settings and it never stalled). From your screenshot, it seems that one of the stalling features has the ID "772". Could you please reload the data and compute only that feature?

It'd be interesting to see if it stalls on that feature specifically or only if multiple features are computed at the same time.

Additionally, do you have that issue with every dataset, or just specific ones?

@Do-rossi
Author

Hi,

For me it happens with every dataset. As you suggested, I tried computing only one feature that had been stuck at around 20% (3793), and it worked well.

Screenshot 2024-09-19 at 10 06 23

@zglong1

zglong1 commented Sep 19, 2024

Ugh, meanwhile mine does get stuck if I do the individual feature, but I know this isn't always the case, because this has happened to me across multiple datasets (well, different MGF files from similar-ish datasets).

To be clear, when I cancel the stalled big job, and try to continue with just feature 772, I'm processing only 772, and selecting only compound class and structure search modules for it (no Novelist, at least for this test), and doing so again leads it to stall at 20% on the structure portion, 87% overall. I did not have recompute on.

Turning on recompute when doing the single job doesn't help, it stalls at 20% on the fingerprint step.

Recalculating the entire feature from scratch, starting at formulas, also doesn't work. It stalls at 2% on the spectra-search step with the attached settings.

Settings

@MartinHoffmannJena
Collaborator

MartinHoffmannJena commented Sep 19, 2024

What happens if you input the mgf into a new SIRIUS project and then

a) only compute feature 772 with default parameters (formula, predict, structure)?
b) only compute feature 772 with the parameters you showed above (formula, predict, structure)?

(Please create a new project for a and b respectively)

@zglong1

zglong1 commented Sep 19, 2024

I restarted SIRIUS before doing A and B, just to keep things consistent.

A) With completely default settings, it finished in about 10 seconds.

B) With the settings I posted originally, it also finished; took a bit longer, maybe 15 seconds?

But, in both cases, it worked fine. So strange.

@MartinHoffmannJena
Collaborator

MartinHoffmannJena commented Sep 19, 2024

Okay, so it probably only happens under load. Now I'd like to understand whether this is a GUI issue or a workflow issue. Could you run the .mgf in its entirety again, but this time using the CLI with:

a) default GUI parameters
b) your parameters

(restart SIRIUS and create a new project in between)

You can use the "show command" button to get a CLI command that corresponds to your GUI parameters.

@zglong1

zglong1 commented Sep 19, 2024

Scary, I have a deep-seated fear of command line, but I guess now's the time to get over it and learn how to use it. Should make my using SIRIUS on our computing cluster more productive without having to run it with the GUI.

I'll try to get to doing this sometime today and get back to you!

@MartinHoffmannJena
Collaborator

MartinHoffmannJena commented Sep 19, 2024

EDIT: Please hold off on doing this until 6.0.6 is released (today or tomorrow)

Let me know if you need help. What you need to do is the following:

  1. Open the SIRIUS GUI, load your .mgf and set your parameters as you usually would.

  2. Instead of clicking "compute", click "show command", then copy the command and paste it into a text editor.

  3. Open a command prompt and type:

sirius --input C:\Users\Username\Documents\mgfName.mgf --project C:\Users\Username\Documents\siriusProjectName.sirius

and then paste the copied command after that. The whole thing should look like this (a bit different if you chose different parameters, obviously):

image

@zglong1

zglong1 commented Sep 19, 2024

Ah, saw your edit. I'll go again once 6.0.6 is released, but right now I'm on my laptop and was curious if it will work on this machine vs my home PC. This one's running Windows 11, Intel i9 11900H at 2.5 GHz processor (8 cores), and 32 GB of RAM.

I did a fresh install of SIRIUS 6.0.5 and used the same MGF file and my normal settings (as above, with a 60 second timeout per compound). It finished completely in 3hr 17 minutes. It didn't get stuck at any step.

@zglong1

zglong1 commented Oct 7, 2024

So it's been a bit, and I got a new computer, but here's an update using 6.0.6:

With my old computer, despite my initial success, I ended up running into the same stalling issues when using the GUI.

With the new computer (Windows 11 Pro, AMD Ryzen 9 9950X 16-core 4.3 GHz processor, 192 GB DDR5 RAM), running in the GUI, I still run into stalling issues.

HOWEVER, with an n of 1, using the command line version, the job completed without any issues. I'm currently running a second job to check consistency.

@Do-rossi
Author

Hi, I tried to run SIRIUS from the command line, but I couldn't get it to work. zsh reports "zsh: no matches found: --AdductSettings.fallback=[[M-H]-,[M+CHO2+H]-,[M+CH2O2-H]-]", even though I added sirius to the PATH and sirius --help works correctly.

When I try to paste the command as you described, it doesn't work:
Screenshot 2024-10-21 at 13 04 25

Maybe I'm doing something wrong...
Thank you for your help !
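
The "no matches found" message comes from zsh itself rather than from SIRIUS: zsh treats unquoted square brackets as a filename pattern and aborts when no file matches it. Quoting the argument should avoid this. The following is a small Python sketch that prints a command line with the necessary quoting, ready to paste into zsh. The file paths are placeholders and the option list is shortened to a single parameter taken from the command posted earlier in this thread, so it is an illustration rather than a complete command.

```python
import shlex

# Placeholder paths and a shortened option list; replace with the real values.
argv = [
    "sirius",
    "--input", "/path/to/mgfName.mgf",
    "--project", "/path/to/siriusProjectName.sirius",
    "config",
    "--AdductSettings.fallback=[[M-H]-,[M+CH2O2-H]-]",
    "formulas",
]

# shlex.join quotes every argument that contains shell-special characters,
# such as the square brackets in the adduct list.
command_line = shlex.join(argv)
print(command_line)
```

Typing single quotes by hand around the whole `--AdductSettings.fallback=...` argument has the same effect.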
