GFF files from NCBI are not enough on their own, without pretreatment, for PEPPAN to build a pangenome, because they do not include the sequences. If you read the Quickstart, you will see that a FASTA file has to be supplied as well. Alternatively, you can run Prokka on your FASTA files to generate GFF files that contain the sequences.
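As a rough illustration of supplying the FASTA files alongside the GFFs, the sketch below pairs each `*_genomic.gff.gz` with its matching `*_genomic.fna.gz` and prints a candidate command line. It assumes PEPPAN accepts each input as a comma-delimited `<GFF>,<FASTA>` string, which is how I understand its help text; please confirm with `PEPPAN --help` before relying on it. The function name `pair_gff_with_fasta` is hypothetical, and whether gzipped FASTA files are accepted is not confirmed here.

```python
def pair_gff_with_fasta(filenames,
                        gff_suffix="_genomic.gff.gz",
                        fasta_suffix="_genomic.fna.gz"):
    """Return one '<GFF>,<FASTA>' string per assembly that has both files.

    Assumption: PEPPAN takes annotation and sequence files given as
    '<GFF>,<FASTA>'. Check this against `PEPPAN --help`.
    """
    names = set(filenames)
    pairs = []
    for gff in sorted(name for name in names if name.endswith(gff_suffix)):
        # Derive the expected FASTA name from the shared NCBI prefix.
        fasta = gff[: -len(gff_suffix)] + fasta_suffix
        if fasta in names:
            pairs.append(f"{gff},{fasta}")
    return pairs


# Two of the assemblies from the question, used only as example input.
example_files = [
    "borrelia_files/GCF_000165595.2_ASM16559v2_genomic.gff.gz",
    "borrelia_files/GCF_000165595.2_ASM16559v2_genomic.fna.gz",
    "borrelia_files/GCF_000181575.2_ASM18157v2_genomic.gff.gz",
    "borrelia_files/GCF_000181575.2_ASM18157v2_genomic.fna.gz",
]

command = ["PEPPAN", "-p", "borrelia_files/BORR", "-t", "4"]
command += pair_gff_with_fasta(example_files)
print(" ".join(command))
```

A GFF with no matching FASTA is silently skipped, so it is worth comparing the number of pairs with the number of GFF files before running anything.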
Hello, I am new to using these tools, so please excuse me if I don't explain this well. I am trying to build a pangenome of Borrelia spp. so that I can map my tick microbiome reads against it and quantify Borrelia in my samples.
I have 11 GFF files, one for each of 11 Borrelia species, which I downloaded from NCBI. I have the FASTA files too, but I believe the GFF files from NCBI will suffice as input on their own. Is this correct?
My files are:
GCF_000165595.2_ASM16559v2_genomic.fna.gz GCF_000165595.2_ASM16559v2_genomic.gff.gz
GCF_000181575.2_ASM18157v2_genomic.fna.gz GCF_000181575.2_ASM18157v2_genomic.gff.gz
GCF_000181895.2_ASM18189v2_genomic.fna.gz GCF_000181895.2_ASM18189v2_genomic.gff.gz
GCF_000512145.1_ASM51214v2_genomic.fna.gz GCF_000512145.1_ASM51214v2_genomic.gff.gz
GCF_000956315.1_ASM95631v1_genomic.fna.gz GCF_000956315.1_ASM95631v1_genomic.gff.gz
GCF_001936255.1_ASM193625v1_genomic.fna.gz GCF_001936255.1_ASM193625v1_genomic.gff.gz
GCF_001936295.1_ASM193629v1_genomic.fna.gz GCF_001936295.1_ASM193629v1_genomic.gff.gz
GCF_002741785.1_ASM274178v1_genomic.fna.gz GCF_002741785.1_ASM274178v1_genomic.gff.gz
GCF_003606285.1_ASM360628v1_genomic.fna.gz GCF_003606285.1_ASM360628v1_genomic.gff.gz
GCF_003814405.1_ASM381440v1_genomic.fna.gz GCF_003814405.1_ASM381440v1_genomic.gff.gz
GCF_014525745.1_ASM1452574v1_genomic.fna.gz GCF_014525745.1_ASM1452574v1_genomic.gff.gz
Currently, I get a series of errors when I input the following:
Current Behavior
2021-05-07 12:25:21.570015 COMMAND: /home/sean/.local/bin/PEPPAN -p borrelia_files/BORR -t 4 --clust_identity 0.5 --clust_match_prop 0.6 --match_identity 0.4 borrelia_files/GCF_000165595.2_ASM16559v2_genomic.gff.gz borrelia_files/GCF_000181575.2_ASM18157v2_genomic.gff.gz borrelia_files/GCF_000181895.2_ASM18189v2_genomic.gff.gz borrelia_files/GCF_000512145.1_ASM51214v2_genomic.gff.gz borrelia_files/GCF_000956315.1_ASM95631v1_genomic.gff.gz borrelia_files/GCF_001936255.1_ASM193625v1_genomic.gff.gz borrelia_files/GCF_001936295.1_ASM193629v1_genomic.gff.gz borrelia_files/GCF_002741785.1_ASM274178v1_genomic.gff.gz borrelia_files/GCF_003606285.1_ASM360628v1_genomic.gff.gz borrelia_files/GCF_003814405.1_ASM381440v1_genomic.gff.gz borrelia_files/GCF_014525745.1_ASM1452574v1_genomic.gff.gz
2021-05-07 12:25:22.032943 Run MMSeqs linclust to get exemplar sequences. Params: 0.5 identities and 0.8 align ratio
Failed to mmap memory dataSize=0 File=./NS_eyo9ogxk/seq.db_h. Error 22.
Traceback (most recent call last):
  File "/home/sean/.local/bin/PEPPAN", line 8, in <module>
    sys.exit(ortho())
  File "/home/sean/.local/lib/python3.8/site-packages/PEPPAN/PEPPAN.py", line 1884, in ortho
    params['clust'] = iterClust(params['prefix'], params['genes'], groups, dict(identity=params['clust_identity'], coverage=params['clust_match_prop'], n_thread=params['n_thread'], translate=False))
  File "/home/sean/.local/lib/python3.8/site-packages/PEPPAN/PEPPAN.py", line 1784, in iterClust
    g, clust = getClust(prefix, g, params)
  File "/home/sean/.local/lib/python3.8/site-packages/PEPPAN/modules/clust.py", line 67, in getClust
    with open(tabFile) as fin :
FileNotFoundError: [Errno 2] No such file or directory: './NS_eyo9ogxk/clust.tab'
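The `dataSize=0` in the mmap error suggests that MMseqs2 was handed an empty sequence database. One plausible cause, though not confirmed from the log alone, is that the GFF inputs contain annotations but no sequences. Below is a minimal sketch for checking whether a GFF file, gzipped or not, carries an embedded `##FASTA` section. The helper names are hypothetical and are not part of PEPPAN.

```python
import gzip
import sys


def has_fasta_section(lines):
    """Return True if any line starts the embedded-sequence section of a GFF3 file."""
    return any(line.startswith("##FASTA") for line in lines)


def gff_has_embedded_fasta(path):
    """Open a plain or gzipped GFF file and look for a '##FASTA' directive."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return has_fasta_section(handle)


if __name__ == "__main__":
    # Usage: python check_gff.py borrelia_files/*.gff.gz
    for gff_path in sys.argv[1:]:
        status = "has sequences" if gff_has_embedded_fasta(gff_path) else "annotations only"
        print(f"{gff_path}: {status}")
```

Files reported as "annotations only" would need their sequences supplied separately before gene sequences can be extracted from them.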
Steps to Reproduce (for bugs)
PEPPAN -p borrelia_files/BORR -t 4 --clust_identity 0.5 --clust_match_prop 0.6 --match_identity 0.4 borrelia_files/*.gff.gz
The run does generate some output files with my chosen prefix:
BORR.encode.csv, BORR.genes and BORR.old_prediction.npz
Context
I have been searching online for clues, which is why I changed the values of --clust_identity, --clust_match_prop and --match_identity. I also lowered -t to use fewer threads, in case it was a memory issue.
Environment
Details of my environment:
To install, I did the following:
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install mmseqs2
conda install blast
conda install diamond
conda install rapidnj
conda install fasttree
command -v mmseqs blastn rapidnj diamond fasttree
/home/sean/miniconda3/envs/peppaninstall/bin/mmseqs
/usr/bin/blastn
pip3 install peppan
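As pasted, the `command -v` output shows paths for only `mmseqs` and `blastn`, and `blastn` resolves to `/usr/bin` rather than the conda environment. That may be an artifact of copying the output, but it is easy to check. The following sketch reports where each executable resolves on the current `PATH`; the tool names are taken from the `command -v` line above, and the function name `locate_tools` is hypothetical.

```python
import shutil

# Executable names copied from the `command -v` line in the post.
TOOLS = ["mmseqs", "blastn", "rapidnj", "diamond", "fasttree"]


def locate_tools(tools=TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in locate_tools().items():
    print(f"{tool:10s} {path or 'NOT FOUND on PATH'}")
```

Running this inside the activated conda environment shows which tools are missing and which come from outside the environment.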
I ran the test data and it all worked great.
I hope this makes sense!