Failed to mmap memory dataSize=0 File=./NS_eyo9ogxk/seq.db_h. Error 22. #21

Open
ghost opened this issue May 7, 2021 · 1 comment

ghost commented May 7, 2021

Hello, I am new to these tools, so please excuse me if I don't explain this well. I am trying to build a pangenome of Borrelia spp. so that I can map my tick microbiome reads against it and quantify Borrelia in my samples.

I have 11 GFF files representing 11 Borrelia species, which I downloaded from NCBI. I also have the matching FASTA files, but I believe the NCBI GFF files alone will suffice as input. Is this correct?
My files are:
GCF_000165595.2_ASM16559v2_genomic.fna.gz GCF_000165595.2_ASM16559v2_genomic.gff.gz
GCF_000181575.2_ASM18157v2_genomic.fna.gz GCF_000181575.2_ASM18157v2_genomic.gff.gz
GCF_000181895.2_ASM18189v2_genomic.fna.gz GCF_000181895.2_ASM18189v2_genomic.gff.gz
GCF_000512145.1_ASM51214v2_genomic.fna.gz GCF_000512145.1_ASM51214v2_genomic.gff.gz
GCF_000956315.1_ASM95631v1_genomic.fna.gz GCF_000956315.1_ASM95631v1_genomic.gff.gz
GCF_001936255.1_ASM193625v1_genomic.fna.gz GCF_001936255.1_ASM193625v1_genomic.gff.gz
GCF_001936295.1_ASM193629v1_genomic.fna.gz GCF_001936295.1_ASM193629v1_genomic.gff.gz
GCF_002741785.1_ASM274178v1_genomic.fna.gz GCF_002741785.1_ASM274178v1_genomic.gff.gz
GCF_003606285.1_ASM360628v1_genomic.fna.gz GCF_003606285.1_ASM360628v1_genomic.gff.gz
GCF_003814405.1_ASM381440v1_genomic.fna.gz GCF_003814405.1_ASM381440v1_genomic.gff.gz
GCF_014525745.1_ASM1452574v1_genomic.fna.gz GCF_014525745.1_ASM1452574v1_genomic.gff.gz

Currently, I get a series of errors when I input the following:

Current Behavior

2021-05-07 12:25:21.570015 COMMAND: /home/sean/.local/bin/PEPPAN -p borrelia_files/BORR -t 4 --clust_identity 0.5 --clust_match_prop 0.6 --match_identity 0.4 borrelia_files/GCF_000165595.2_ASM16559v2_genomic.gff.gz borrelia_files/GCF_000181575.2_ASM18157v2_genomic.gff.gz borrelia_files/GCF_000181895.2_ASM18189v2_genomic.gff.gz borrelia_files/GCF_000512145.1_ASM51214v2_genomic.gff.gz borrelia_files/GCF_000956315.1_ASM95631v1_genomic.gff.gz borrelia_files/GCF_001936255.1_ASM193625v1_genomic.gff.gz borrelia_files/GCF_001936295.1_ASM193629v1_genomic.gff.gz borrelia_files/GCF_002741785.1_ASM274178v1_genomic.gff.gz borrelia_files/GCF_003606285.1_ASM360628v1_genomic.gff.gz borrelia_files/GCF_003814405.1_ASM381440v1_genomic.gff.gz borrelia_files/GCF_014525745.1_ASM1452574v1_genomic.gff.gz
2021-05-07 12:25:22.032943 Run MMSeqs linclust to get exemplar sequences. Params: 0.5 identities and 0.8 align ratio
Failed to mmap memory dataSize=0 File=./NS_eyo9ogxk/seq.db_h. Error 22.
Traceback (most recent call last):
  File "/home/sean/.local/bin/PEPPAN", line 8, in <module>
    sys.exit(ortho())
  File "/home/sean/.local/lib/python3.8/site-packages/PEPPAN/PEPPAN.py", line 1884, in ortho
    params['clust'] = iterClust(params['prefix'], params['genes'], groups, dict(identity=params['clust_identity'], coverage=params['clust_match_prop'], n_thread=params['n_thread'], translate=False))
  File "/home/sean/.local/lib/python3.8/site-packages/PEPPAN/PEPPAN.py", line 1784, in iterClust
    g, clust = getClust(prefix, g, params)
  File "/home/sean/.local/lib/python3.8/site-packages/PEPPAN/modules/clust.py", line 67, in getClust
    with open(tabFile) as fin :
FileNotFoundError: [Errno 2] No such file or directory: './NS_eyo9ogxk/clust.tab'
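
The `dataSize=0` in the MMseqs2 message suggests that the sequence database it was given was empty, and the later `FileNotFoundError` looks like a consequence of that first failure. One possible cause, offered here as an assumption rather than a confirmed diagnosis, is that the GFF inputs contain annotations but no sequences. The sketch below is a quick check for that: it counts CDS features and looks for the GFF3 `##FASTA` directive in each file. The function names `open_text` and `summarize_gff` are hypothetical helpers, not part of PEPPAN.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def summarize_gff(path):
    """Return (number of CDS features, whether a ##FASTA section exists).

    Hypothetical helper: it only inspects the file, it does not
    reproduce how PEPPAN itself parses its inputs.
    """
    n_cds = 0
    has_fasta = False
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                has_fasta = True
                break  # everything after this directive is sequence data
            if line.startswith("#"):
                continue  # other directives and comments
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3 and fields[2] == "CDS":
                n_cds += 1
    return n_cds, has_fasta
```

Calling `summarize_gff` on each `borrelia_files/*.gff.gz` file and seeing `has_fasta` come back `False` everywhere would be consistent with the empty-database reading, although it would not prove it.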

Steps to Reproduce (for bugs)

PEPPAN -p borrelia_files/BORR -t 4 --clust_identity 0.5 --clust_match_prop 0.6 --match_identity 0.4 borrelia_files/*.gff.gz

This does generate some output files with my desired prefix:
BORR.encode.csv, BORR.genes and BORR.old_prediction.npz

Context

I have been searching online for clues, which is why I changed the values of --clust_identity, --clust_match_prop and --match_identity. I also lowered -t to use fewer threads, in case this was a memory issue.

Environment

Details of my environment:
To install, I ran the following:
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install mmseqs2
conda install blast
conda install diamond
conda install rapidnj
conda install fasttree

command -v mmseqs blastn rapidnj diamond fasttree

/home/sean/miniconda3/envs/peppaninstall/bin/mmseqs
/usr/bin/blastn
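
The `command -v` output above lists only two paths, which may mean that `rapidnj`, `diamond` and `fasttree` were not on `PATH` in that shell, or simply that the output was trimmed when pasting. A minimal sketch of the same check in Python follows. The tool names are taken from the command above; `find_tools` and `missing_tools` are hypothetical helper names, and whether this particular run needs all five tools is not confirmed here.

```python
import shutil

# Executable names copied from the `command -v` line in the report.
REQUIRED_TOOLS = ["mmseqs", "blastn", "rapidnj", "diamond", "fasttree"]


def find_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the names that could not be resolved on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]
```

Running `missing_tools(REQUIRED_TOOLS)` inside the activated conda environment lists whichever executables still need installing or adding to `PATH`.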

pip3 install peppan

I ran the test data and it all worked great.
I hope this makes sense!

Naclist commented Sep 4, 2021

GFF files from NCBI are not enough, without pretreatment, for PEPPAN to build a pangenome. Read the Quickstart and you will see that a FASTA file should be supplied as well. Alternatively, you can run Prokka on your FASTA files to generate GFF files that include the sequences.
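
As an illustration of what "GFF files with the sequences" can look like, here is a minimal sketch that appends each genome's FASTA to its NCBI annotation under the GFF3 `##FASTA` directive, which is the layout Prokka writes. It assumes the NCBI naming shown in the question (`*_genomic.gff.gz` next to `*_genomic.fna.gz`) and that the FASTA record IDs match the first column of the GFF. `open_text`, `embed_fasta` and `pair_inputs` are hypothetical helpers, and whether PEPPAN accepts files produced this way should be checked against the Quickstart.

```python
import gzip
from pathlib import Path


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode)
    return open(path, mode)


def embed_fasta(gff_path, fasta_path, out_path):
    """Write the annotations, then '##FASTA', then the sequences."""
    with open_text(out_path, "wt") as out:
        with open_text(gff_path) as gff:
            for line in gff:
                if line.startswith("##FASTA"):
                    break  # drop any sequences already embedded in the GFF
                out.write(line if line.endswith("\n") else line + "\n")
        out.write("##FASTA\n")
        with open_text(fasta_path) as fasta:
            for line in fasta:
                out.write(line if line.endswith("\n") else line + "\n")
    return out_path


def pair_inputs(directory):
    """Yield (gff, fasta) pairs that share an NCBI-style file prefix."""
    directory = Path(directory)
    for gff in sorted(directory.glob("*_genomic.gff.gz")):
        fasta = gff.with_name(gff.name.replace(".gff.gz", ".fna.gz"))
        if fasta.exists():
            yield gff, fasta
```

For example, looping over `pair_inputs("borrelia_files")` and calling `embed_fasta(gff, fasta, out)` with an output name of your choosing would produce one combined file per genome. Running Prokka on the FASTA files, as suggested above, remains the route the maintainer actually recommends.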
