CSV summary import, square brackets in taxon names #2

Open
mdeleeuw opened this issue Sep 7, 2016 · 0 comments


mdeleeuw commented Sep 7, 2016

I'm trying to import a mock community composition into MEGAN CE in CSV taxonomy summary format. Taxon names that contain square brackets, such as "[Ruminococcus] gnavus", are imported at the wrong taxonomic level. I traced the problem to line 72 of src/megan/classification/util/MultiWords.java. Because of the "&& (Character.isLetter(line.charAt(i)))" condition there, the code starting at line 285 of src/megan/classification/IdParser.java tests for "Ruminococcus] gnavus", among other strings, but never for "[Ruminococcus] gnavus". I also had to comment out lines 255-257 of IdParser.java to prevent the counts from being attributed to the Order level. The CSV content below reflects the 48 strains of the mock-1 community from

Bokulich, N. A., Rideout, J. R., Mercurio, W. G., Wolfe, B., F, C., Maurice, et al. (2016). mockrobiota: a public resource for microbiome bioinformatics benchmarking. PeerJ, 1–16. http://doi.org/10.7287/peerj.preprints.2065v1
https://github.com/caporaso-lab/mockrobiota

and can be used to reproduce the issue.

Bifidobacterium pseudocatenulatum,1000
Bifidobacterium bifidum,1000
Collinsella intestinalis,1000
Alistipes indistinctus,1000
Bacteroides ovatus,1000
Bacteroides uniformis,1000
Bacteroides cellulosilyticus,1000
Bacteroides thetaiotaomicron VPI-5482,1000
Bacteroides thetaiotaomicron,1000
Bacteroides thetaiotaomicron,1000
Bacteroides vulgatus,1000
Bacteroides xylanisolvens,1000
Bacteroides intestinalis,1000
Bacteroides eggerthii,1000
Bacteroides dorei,1000
Bacteroides finegoldii,1000
Parabacteroides johnsonii,1000
Anaerococcus hydrogenalis,1000
Anaerotruncus colihominis,1000
Blautia luti,1000
Blautia hansenii,1000
Tyzzerella nexilis,1000
Clostridium sp. A2-232,1000
[Clostridium] leptum,1000
[Clostridium] saccharolyticum,1000
[Clostridium] asparagiforme,1000
Hungatella hathewayi,1000
Clostridium sporogenes,1000
Coprococcus comes,1000
Dorea formicigenerans,1000
Dorea longicatena,1000
[Eubacterium] eligens,1000
Eubacterium ventriosum,1000
Holdemanella biformis,1000
Faecalibacterium prausnitzii M21/2,1000
Roseburia intestinalis,1000
[Ruminococcus] gnavus,1000
Ruminococcus lactaris,1000
[Ruminococcus] torques,1000
Streptococcus infantarius,1000
Subdoligranulum variabile,1000
Edwardsiella tarda,1000
Enterobacter cancerogenus,1000
Escherichia coli K12,1000
Escherichia fergusonii,1000
Proteus penneri,1000
Providencia alcalifaciens,1000
Akkermansia muciniphila,1000
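
For anyone trying to reproduce this without building MEGAN, here is a minimal sketch of the effect described above. It is not MEGAN's actual code: the class `BracketNameDemo`, the method `candidates`, and the `allowBracketStart` flag are hypothetical names made up for illustration. The sketch assumes that candidate names are generated as substrings starting at a word boundary, and that a start position is accepted only if `Character.isLetter` returns true for the character at that position, as the condition on line 72 of MultiWords.java suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a simplified stand-in for the candidate-name enumeration,
// NOT the real MultiWords/IdParser implementation.
class BracketNameDemo {

    // Returns every suffix of 'line' that begins at a word boundary.
    // With allowBracketStart == false, a start position must be a letter
    // (mirrors the Character.isLetter condition mentioned in the issue).
    // With allowBracketStart == true, '[' is accepted as a start as well
    // (a hypothetical relaxation, not a tested patch).
    static List<String> candidates(String line, boolean allowBracketStart) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean atBoundary = (i == 0) || !Character.isLetterOrDigit(line.charAt(i - 1));
            boolean validStart = Character.isLetter(c) || (allowBracketStart && c == '[');
            if (atBoundary && validStart) {
                result.add(line.substring(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String name = "[Ruminococcus] gnavus";
        System.out.println("letter-only start: " + candidates(name, false));
        System.out.println("bracket allowed:   " + candidates(name, true));
    }
}
```

With the letter-only check, the full bracketed name is never among the candidates, so a lookup can only match the shorter strings. Accepting '[' as a start character is one possible direction for a fix; whether it has side effects elsewhere in the parser would need to be checked against the real code.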
