Exclude nodes from the taxonomy #98

Closed
fgvieira opened this issue May 30, 2024 · 4 comments

@fgvieira

Is it possible to remove a node (or a list of nodes) and (optionally) all those downstream? I am trying to remove all branches of the taxonomy that are unclassified (e.g. 12908).

Something like (when #93 is done):

echo unclassified | taxonkit name2taxid | taxonkit list | taxonkit filter --exclude | taxonkit create-taxdump

@shenwei356
Owner

How about directly editing the nodes.dmp file?

  1. get a list of ids with taxonkit list or other ways.
  2. csvtk grep -Ht -f 1 -v -P list.txt nodes.dmp > nodes.new
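
To see what step 2 does, here is a minimal sketch on a toy file, using awk as a stand-in for csvtk so that it runs without extra tools. The three-column rows are a shortened imitation of nodes.dmp, and taxid 999001 is a made-up child of 12908 used only for illustration.

```shell
# Toy stand-in for nodes.dmp: taxid, parent taxid, rank (real files have more columns).
workdir=$(mktemp -d)
printf '1\t|\t1\t|\tno rank\t|\n'          >  "$workdir/nodes.dmp"
printf '2\t|\t1\t|\tsuperkingdom\t|\n'     >> "$workdir/nodes.dmp"
printf '12908\t|\t1\t|\tno rank\t|\n'      >> "$workdir/nodes.dmp"
printf '999001\t|\t12908\t|\tno rank\t|\n' >> "$workdir/nodes.dmp"   # hypothetical child

# IDs to drop, one per line (the kind of list step 1 is meant to produce).
printf '12908\n999001\n' > "$workdir/list.txt"

# Keep only the rows whose first tab-separated field is NOT in list.txt.
awk -F'\t' 'NR == FNR { drop[$1]; next } !($1 in drop)' \
    "$workdir/list.txt" "$workdir/nodes.dmp" > "$workdir/nodes.new"

cat "$workdir/nodes.new"
```

The list has to contain the descendants as well as the node itself; otherwise rows such as 999001 would stay behind with a parent that no longer exists.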

@shenwei356
Owner

How's it going?

@fgvieira
Author

Would removing from just the nodes.dmp file work? Don't you need to also edit the names.dmp file?

If not, then I guess I could just do:

echo unclassified | taxonkit name2taxid --fuzzy --fuzzy-top-n 9999999 | cut -f 2 | taxonkit list --indent '' > list.txt
csvtk grep -Ht -f 1 -v -P list.txt nodes.dmp > nodes.new
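
If names.dmp does turn out to need the same treatment (an open question at this point, not something confirmed here), the first column of names.dmp is also the taxid, so the same filter could be applied to both files. A rough sketch on toy data, with awk standing in for csvtk; the helper name drop_listed and taxid 999001 are made up for illustration:

```shell
# Toy files; 999001 is a hypothetical child of 12908, and the rows are shortened.
workdir=$(mktemp -d)
printf '1\t|\t1\t|\tno rank\t|\n12908\t|\t1\t|\tno rank\t|\n999001\t|\t12908\t|\tno rank\t|\n' \
    > "$workdir/nodes.dmp"
printf '1\t|\troot\t|\t\t|\tscientific name\t|\n'                     >  "$workdir/names.dmp"
printf '12908\t|\tunclassified sequences\t|\t\t|\tscientific name\t|\n' >> "$workdir/names.dmp"
printf '999001\t|\thypothetical child\t|\t\t|\tscientific name\t|\n'   >> "$workdir/names.dmp"
printf '12908\n999001\n' > "$workdir/list.txt"

# Hypothetical helper: print the rows of $2 whose first field is not listed in $1.
drop_listed() {
    awk -F'\t' 'NR == FNR { drop[$1]; next } !($1 in drop)' "$1" "$2"
}

drop_listed "$workdir/list.txt" "$workdir/nodes.dmp" > "$workdir/nodes.new"
drop_listed "$workdir/list.txt" "$workdir/names.dmp" > "$workdir/names.new"

# Sanity check: count names whose taxid has no matching row in nodes.new.
orphans=$(awk -F'\t' 'NR == FNR { keep[$1]; next } !($1 in keep)' \
    "$workdir/nodes.new" "$workdir/names.new" | wc -l | tr -d ' ')
echo "names without a node: $orphans"
```

Filtering both files with the same list keeps them consistent with each other, whether or not the extra names would have caused a problem.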

@shenwei356
Owner

Try it.
