Error while running segway train : "can't tie one track in multiple groups" and "Set of training windows is empty" #148

Open
Cheryn-A opened this issue Oct 29, 2020 · 3 comments

@Cheryn-A

Hello, I am having trouble running Segway. Here is my situation.
I want to use Segway with 6 ChIP-seq tracks.
I started from raw data in FASTQ format for each signal and its associated input (s1.fastq, s1_input.fastq, ..., s6.fastq, s6_input.fastq).

In https://www.biorxiv.org/content/10.1101/080382v1.full you mention that "The preferred input for Segway is the “fold change over control” bigWig signal file, because it is already processed and normalized." So here is the full pipeline I followed:

1-Pre-processing each fastq file:

  • Trimming with cutadapt-2.1 (Output: s1_trimmed.fastq, s1_input_trimmed.fastq, ..., s6_trimmed.fastq, s6_input_trimmed.fastq)
  • Mapping with bowtie-1.3.0, keeping only uniquely mapped reads (Output: s1_trimmed.bam, s1_input_trimmed.bam, ..., s6_trimmed.bam, s6_input_trimmed.bam)
  • Sorting the BAM files with samtools index (samtools-1.9) (Output: s1_trimmed_sort.bam, s1_input_trimmed_sort.bam, ..., s6_trimmed_sort.bam, s6_input_trimmed_sort.bam)
  • Removing PCR duplicates with picard-2.18.2 (Output: s1_trimmed_sort_rmDup.bam, s1_input_trimmed_sort_rmDup.bam, ..., s6_trimmed_sort_rmDup.bam, s6_input_trimmed_sort_rmDup.bam)

2-Generating bw files with deepTools-3.0.2:

bamCompare --bamfile1 s1_trimmed_sort_rmDup.bam --bamfile2 s1_input_trimmed_sort_rmDup.bam --binSize 10 --normalizeUsing RPKM --effectiveGenomeSize 3099734149 --smoothLength 1 --operation ratio --scaleFactorsMethod None --scaleFactors 1:1 -o s1_norm.bw

(Output: s1_norm.bw ... s6_norm.bw)

3-Converting bigwig into bedGraph

bigWigToBedGraph s1_norm.bw s1_norm.bedGraph

(Output: s1_norm.bedGraph ... s6_norm.bedGraph)

4-Generating genomedata files with genomedata-1.4.4:

genomedata-load-seq s1_norm.genomedata GRCh38.primary_assembly.genome.fa
genomedata-open-data s1_norm.genomedata -- tracknames s1_norm;
genomedata-load-data s1_norm.genomedata s1_norm < s1_norm.bedGraph;
genomedata-close-data s1_norm.genomedata

(Output: s1_norm.genomedata ... s6_norm.genomedata)

5-Running segway

mkdir s_traindir
segway train --resolution 10 --num-instance 10 --minibatch-fraction 0.01 --num-labels 18 s1.genomedata s2.genomedata s3.genomedata s4.genomedata s5.genomedata s6.genomedata s_traindir

This gave me the following error:

Traceback (most recent call last):
  File "/home/cheryn/anaconda3/bin/segway", line 10, in <module>
    sys.exit(main())
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 4265, in main
    return runner()
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 3841, in __call__
    self.run(*args, **kwargs)
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 3817, in run
    self.run_train()
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 3267, in run_train
    self.init_train()
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 3172, in init_train
    self.init_shared()
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 3161, in init_shared
    self.save_gmtk_input()
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 2247, in save_gmtk_input
    self.set_tracknames()
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 1734, in set_tracknames
    self.add_track_group([trackname])  # Adds to self.tracks
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 949, in add_track_group
    raise ValueError("can't tie one track in multiple groups")
ValueError: can't tie one track in multiple groups

So I tried to follow the example in https://segway.readthedocs.io/en/latest/quick.html#acquiring-data and ran it with fewer parameters on a single genomedata file:

segway train s1.genomedata s_traindir

And I got this error:

Traceback (most recent call last):
  File "/home/cheryn/anaconda3/bin/segway", line 10, in <module>
    sys.exit(main())
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 4265, in main
    return runner()
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 3841, in __call__
    self.run(*args, **kwargs)
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 3817, in run
    self.run_train()
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 3267, in run_train
    self.init_train()
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 3172, in init_train
    self.init_shared()
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 3161, in init_shared
    self.save_gmtk_input()
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/run.py", line 2256, in save_gmtk_input
    observations.locate_windows(genomes)
  File "/home/cheryn/anaconda3/lib/python3.7/site-packages/segway/observations.py", line 949, in locate_windows
    raise ValueError("Set of training windows is empty")
ValueError: Set of training windows is empty

When I run it with the test.genomedata provided at https://segway.readthedocs.io/en/latest/quick.html#acquiring-data, it works fine.

So I am a bit confused by the error messages and I do not know how to solve this.

Could you please help me?
Did I do something wrong while generating my genomedata archives?
Am I using the segway command line correctly?

Thank you in advance for your answers.

PS: I am working on Ubuntu 18.04.5

@EricR86
Member

EricR86 commented Feb 2, 2021

@Cheryn-A hello, and sorry for the very late reply. I have notifications set up for this repository and, to my surprise, I did not get one for this issue.

It looks like you are having issues with Genomedata archive creation. Notably, at some point in your steps, data seems to be missing: "Set of training windows is empty" typically means there is no data to work with. I believe the error "can't tie one track in multiple groups" might indicate that you are accidentally using the same trackname in each of your Genomedata archives.

With genomedata-open-data, the option should be --trackname rather than -- tracknames, with no space between the hyphens and the option name, and the positional arguments should come after the options (though I don't really know if that makes a difference):
genomedata-open-data --trackname s1_norm s1_norm.genomedata

The rest of your commands look correct. I would also ensure that you choose a different trackname across your archives so Segway can uniquely determine your datasets.
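
As a quick way to check the point above about unique tracknames, here is a minimal Python sketch. It assumes you have already collected the tracknames of each archive yourself (by whatever means you prefer); the function name find_shared_tracknames and the example mapping are hypothetical and are not part of Segway or Genomedata.

```python
from collections import defaultdict


def find_shared_tracknames(archive_tracks):
    """Return {trackname: [archives]} for names used by more than one archive.

    archive_tracks maps an archive path to the list of tracknames it holds.
    """
    owners = defaultdict(list)
    for archive, tracknames in archive_tracks.items():
        # De-duplicate within one archive so only cross-archive clashes count.
        for name in sorted(set(tracknames)):
            owners[name].append(archive)
    return {name: paths for name, paths in owners.items() if len(paths) > 1}


# Hypothetical example: two archives were both opened with the name "s1_norm".
example = {
    "s1_norm.genomedata": ["s1_norm"],
    "s2_norm.genomedata": ["s1_norm"],
    "s3_norm.genomedata": ["s3_norm"],
}
for name, paths in find_shared_tracknames(example).items():
    print(f"track {name!r} appears in: {', '.join(paths)}")
```

An empty result means no trackname is shared between archives, which is the situation the advice above aims for.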

Most of the commands also come with a verbosity option that I would highly recommend using when debugging these issues. For example, it would help verify how much data is being loaded.
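
Alongside the verbose output, it may help to look at the bedGraph file itself before loading it. The sketch below is only an illustration of that idea, not something Segway or Genomedata provides: summarize_bedgraph is a hypothetical helper that counts, per chromosome, how many intervals carry a finite value, and how many carry NaN or infinite values (which a ratio track can plausibly contain where the control has no coverage).

```python
import math
from collections import Counter


def summarize_bedgraph(lines):
    """Count usable data intervals per chromosome in bedGraph-formatted lines.

    Returns (usable, skipped): a Counter of chromosome -> number of intervals
    with a finite value, and the number of intervals whose value is NaN or
    infinite.
    """
    usable = Counter()
    skipped = 0
    for line in lines:
        fields = line.split()
        # Ignore blank lines, comments, header lines, and malformed rows.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        if fields[0] in ("track", "browser"):
            continue
        chrom, _start, _end, value = fields[:4]
        if math.isfinite(float(value)):
            usable[chrom] += 1
        else:
            skipped += 1
    return usable, skipped


# Small made-up sample, in the layout bigWigToBedGraph writes (chrom, start, end, value).
sample = [
    "track type=bedGraph",
    "chr1\t0\t10\t1.5",
    "chr1\t10\t20\tnan",
    "chr2\t0\t10\t0.8",
]
usable, skipped = summarize_bedgraph(sample)
print(dict(usable), skipped)
```

For a real file, pass an open file handle instead of the sample list, for example `with open("s1_norm.bedGraph") as handle: summarize_bedgraph(handle)`. An empty counter would suggest the problem lies upstream of Genomedata. Note that this only inspects the bedGraph, not the archive built from it.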

For general troubleshooting of Segway issues, I would highly recommend emailing the mailing list, segway-l@listserv.utoronto.ca, to potentially reach a larger audience.

@varsha090597

Hello,
I am facing the same issue: "ValueError: Set of training windows is empty". I took into account what you mentioned about genomedata file creation in your response, but it still does not help. Any help with this would be appreciated. Thanks.

@EricR86
Member

EricR86 commented Jun 24, 2021

@varsha090597 could you please post your segway train command?

Although it is hard to tell, the most likely explanation is that there is actually no data in your genomedata archives. Perhaps try creating them again with the --verbose option to verify that data is being loaded.
