Empty files as output #11

Open
mokrobial opened this issue Sep 29, 2021 · 19 comments

@mokrobial

Apologies if I have missed a setup step. I installed successfully with conda and was able to run the test data fq without issue. If I run it on my own fastq, however, the output is empty. I've tried several different files and the result is always the same: 0 vertices. Do reads require some kind of pre-processing first?

Log:
Building deBruijnGraph...
Building deBruijnGraph took 0.485469 seconds.
deBruijnGraph has 0 vertices
Building unitig graph from deBruijn graph...
Getting connected components
Getting CCs took 1.4e-05 seconds
Calculating coverage distribution
Calculating coverage distribution took 3.4e-05 seconds
Unitig graph successfully build in 0.000138 seconds.
Unitig graph has 0 vertices
Assembling...
Cleaning graph
Assembly complete
Assembly took 0.00033 seconds
The complete assembly process took 0.485904 seconds.

@AlphaSquad
Collaborator

The test data works fine, but your own data does not? Odd.
Could you provide your read file or a snippet of it? Also, if you set k, what value did you use, and how long are your reads?
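Why read length and k matter here: in a de Bruijn graph assembler, each read of length L contributes L - k + 1 k-mers, and a read shorter than k contributes none. An input whose reads are all shorter than k (or an input that is empty or unreadable) would therefore be expected to yield a graph with 0 vertices. The sketch below is a generic Python check of that arithmetic, not part of Haploflow; the function names are hypothetical, and it assumes plain or .gz FASTQ with exactly four lines per record.

```python
import gzip
from collections import Counter


def open_fastq(path):
    # Open plain or gzip-compressed FASTQ as text; compression is guessed from the extension.
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_lengths(lines):
    # Assumes strict 4-line FASTQ records: the sequence is the 2nd line of each record.
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield len(line.strip())


def kmers_in_read(read_len, k):
    # A read of length L yields L - k + 1 k-mers; reads shorter than k yield none.
    return max(read_len - k + 1, 0)


def summarize(lines, k):
    # Tally read lengths, how many reads are at least k long, and the total k-mer count.
    counts = Counter(read_lengths(lines))
    return {
        "reads": sum(counts.values()),
        "reads_ge_k": sum(n for length, n in counts.items() if length >= k),
        "kmers": sum(n * kmers_in_read(length, k) for length, n in counts.items()),
        "min_len": min(counts, default=0),
        "max_len": max(counts, default=0),
    }
```

For example, `with open_fastq("reads.fq") as fh: print(summarize(fh, 41))` reports how many reads are at least 41 bp long. With 2x150 reads, each read yields 150 - 41 + 1 = 110 k-mers, so read length alone should not empty the graph in that case.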

@mokrobial
Author

I didn't set --k initially. I just tried setting it to 39 and still got empty folders. Read length is 2x150.

GitHub won't let me include a zip file, so I've put one here:
https://drive.google.com/drive/folders/1SFkD2dDKU1GLpdtEcPoqvLcvGoEYY-fY?usp=sharing

Thanks much!

@AlphaSquad
Collaborator

Hi, sorry that it took so long. I have tested the read files you provided and found that for most of the files, all contigs were shorter than 500 bp. Haploflow does not report contigs shorter than 500 bp by default, so no contigs were reported.
This might happen either because there are too many strains in the sample (Haploflow then cannot distinguish them by their coverage and avoids misassemblies by breaking contigs apart) or because there is no clear signal in the data, e.g. no genome is covered more than about 4x or there are too many errors.
Haploflow reports all contigs if the filter option is set to 0, but that probably does not make much sense.
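To tell whether a run hit this length filter rather than assembling nothing at all, one option is to rerun with the filter set to 0 and inspect the contig lengths. The hypothetical helper below reads any FASTA of contigs and reports how many would survive a 500 bp cutoff. The name of Haploflow's contig output file, and whether a contig of exactly 500 bp is kept, are assumptions here, not confirmed behavior.

```python
def parse_fasta(lines):
    # Yield (header, sequence) pairs from FASTA-formatted lines; multi-line sequences are joined.
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_report(lines, min_len=500):
    # Count contigs and how many reach min_len (assumed inclusive cutoff).
    lengths = sorted((len(seq) for _, seq in parse_fasta(lines)), reverse=True)
    kept = [n for n in lengths if n >= min_len]
    return {
        "contigs": len(lengths),
        "kept": len(kept),
        "longest": lengths[0] if lengths else 0,
    }
```

If `kept` is 0 but `contigs` is not, the assembly did produce sequence and the empty output comes from the length filter, which matches the diagnosis above.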

@Ruchank1

Hi, I am getting the same issue (empty folders, 0 vertices) with the test data file as well. Could you please help me with that?
Thank you.

@AlphaSquad
Collaborator

Could you post the command you used and the output you received?

@Ruchank1

Sure.
The command: haploflow --read-file .../forward.fastq --out test --log test/log
The output was empty subfolders in a folder named test, and the log file looked like this:
Building deBruijnGraph...
Building deBruijnGraph took 0.00039 seconds.
deBruijnGraph has 0 vertices
Building unitig graph from deBruijn graph...
Getting connected components
Getting CCs took 2.7e-05 seconds
Calculating coverage distribution
Calculating coverage distribution took 6.1e-05 seconds
Unitig graph successfully build in 0.000286 seconds.
Unitig graph has 0 vertices
Assembling...
Cleaning graph
Assembly complete
Assembly took 0.000669 seconds
The complete assembly process took 0.001141 seconds.

The number of vertices is 0.

@AlphaSquad
Collaborator

Haploflow should probably use a meaningful default value for k, but it seems like this is not working right now. Please try running Haploflow again with an explicit value for k, e.g. --k 41

@Ruchank1

I tried running the command with the k value set, but it still shows 0 vertices.

@AlphaSquad
Collaborator

Could you post your forward.fastq? I am asking because the toy data set is named HIV_3_toy.fq.

@Ruchank1

Hi, I actually tried with the HIV_3_toy.fq dataset as well and got the same output, so I can't really figure out what is happening.

@AlphaSquad
Collaborator

It is odd. The only explanation I have is that Haploflow tries to read a non-existing file. Could you maybe try absolute paths for all files?

@Ruchank1

Yes, I tried giving absolute paths as well. I am still getting empty files as output. I installed Haploflow using conda, is there a possibility that I missed out on some step?

@Ruchank1

Hi, I tried it on a Linux machine as well, but it still gives 0 vertices as output. I cannot really locate the problem.

@AlphaSquad AlphaSquad self-assigned this Dec 1, 2021
@AlphaSquad
Collaborator

Hm, okay. Haploflow was only tested on UNIX systems, but it is strange that it does not work on a Linux machine either. Unfortunately, I am not really sure what to do here, since I cannot reproduce this problem.
I will, however, add a check for missing files, but it may take a while until this change is done and available on conda (and if no file is actually missing, it will not solve your problem either).
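A missing-file check of this kind could look roughly like the pre-flight test below, which you can also run yourself before calling haploflow. It is a Python sketch, not Haploflow's actual implementation, and `preflight` is a hypothetical name. Resolving the path with `os.path.abspath` mirrors the suggestion above to use absolute paths: a relative path is interpreted relative to the current working directory, which may not be where you expect.

```python
import os


def preflight(path):
    # Return a short problem description for an input read file, or None if it looks usable.
    full = os.path.abspath(path)  # resolve relative to the current working directory
    if not os.path.exists(full):
        return f"missing: {full}"
    if not os.path.isfile(full):
        return f"not a regular file: {full}"
    if not os.access(full, os.R_OK):
        return f"not readable: {full}"
    if os.path.getsize(full) == 0:
        return f"empty: {full}"
    return None
```

Running `print(preflight(".../forward.fastq"))` with the exact path passed to --read-file prints None if the file exists, is readable, and is non-empty; any other result points at the input rather than the assembler.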

@reesea22

I have been getting empty files as output for my data as well. When I attempt to run the toy dataset through haploflow I get the following error:
$ haploflow --read-file Haploflow/HIV_3_toy.fq.gz --out test --log test/log
terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
Aborted (core dumped)

@AlphaSquad
Collaborator

Are you also using the conda install? If so, can you try decompressing the read file first?
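The gzip suggestion can be checked without relying on the file extension, because gzip data always begins with the magic bytes 0x1f 0x8b. The sketch below (hypothetical helper names, Python standard library only) detects a compressed read file and decompresses it to a plain FASTQ that can then be passed to --read-file. Whether a given Haploflow build accepts .gz input is not settled in this thread, so treat this purely as a workaround.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    # Inspect the leading bytes instead of trusting the ".gz" extension.
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def gunzip_to(src, dst):
    # Stream-decompress src into dst without loading the whole file into memory.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, `gunzip_to("Haploflow/HIV_3_toy.fq.gz", "HIV_3_toy.fq")` produces an uncompressed copy of the toy data to retry the command with. On the command line, `gunzip -k` does the same job.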

@adelizamae

Hi, I also don't have output files. I'm not sure what I'm doing wrong. :(

I ran:
haploflow --read-file sample.fastq --k 41 --out test/ --log test/log/

But there is no output file except Cov.tsv.
[Screenshot: haploflow-no-output]

@AlphaSquad
Collaborator

Hi, I am sorry that Haploflow is not working out of the box for you. Unfortunately, I need a bit more information to give you any feedback, since the command looks OK: Are you using the conda version, or did you build Haploflow yourself? What do the log/Cov.tsv files say? How big is your sample.fastq, and how long are the reads?

@adelizamae

Hi, I tried both the conda version and my own build.
It turns out there are no contigs longer than 500 bp, which is why there is no output in my case.

I have SARS-CoV-2 long read sequences (produced by using ONT) and I would like to know what parameters I can use to do de novo assembly.

I uploaded my sample fastq to this Google Drive folder:
https://drive.google.com/drive/folders/1__4TscNV_LJyRbgzjGN-s3ehcB52S5zI?usp=sharing
I'm just starting to learn bioinfo, your help is greatly appreciated!
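Related to the coverage point made earlier in this thread (no genome covered more than about 4x), a quick sanity check is the expected mean depth, C = total sequenced bases / genome length. The sketch below computes it from a FASTQ. The function names are hypothetical, and it assumes every read comes from the target genome: host or off-target reads, and coverage split across several strains, make the real per-genome depth lower.

```python
def total_bases(lines):
    # Sum sequence lengths; assumes 4-line FASTQ records with the sequence on the 2nd line.
    return sum(len(line.strip()) for i, line in enumerate(lines) if i % 4 == 1)


def mean_coverage(bases, genome_length):
    # Expected mean depth: total sequenced bases divided by genome length.
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return bases / genome_length
```

The SARS-CoV-2 reference genome is 29,903 bp, so for example `mean_coverage(150_000, 29_903)` comes out at roughly 5x. That is close to the low-coverage range mentioned above, where assemblies tend to fragment into short contigs.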

5 participants