Functionality to Identify and Assign New Aliases #11
Proposed Changes
This pull request adds functionality I wrote for identifying and assigning the next available alias code to arbitrary lineages while working on my automated lineage designation pipeline. It may be useful in your own designation workflows and for other users, though I would understand if you feel it is outside the scope of this tool, or if you are concerned that these methods could confuse users who are not interested in designating lineages.
In terms of implementation, it converts the Pango aliases into base-26 numbers, finds the maximum, and increments it by 1 to get the next available alias. Banned letters (I, O, and X) are handled by incrementing past them when the alias string is returned. Recombinant lineages (prefixed with X) are tracked as a separate group, and the same functions apply to them when the appropriate parameter is set.
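The approach described above can be illustrated with a minimal sketch. This is not the code in this pull request: the names `alias_to_int`, `int_to_alias`, and `next_alias` are hypothetical, the encoding is assumed to be bijective base-26 (A=1, ..., Z=26, AA=27), and banned letters are skipped by simply incrementing the number until the resulting string contains none of them. Recombinant (X-prefixed) aliases are left out for brevity.

```python
import string

ALPHABET = string.ascii_uppercase
BANNED = set("IOX")  # letters that may not appear in a non-recombinant alias


def alias_to_int(alias: str) -> int:
    """Convert an alias to a bijective base-26 number (A=1, Z=26, AA=27)."""
    n = 0
    for ch in alias:
        n = n * 26 + (ALPHABET.index(ch) + 1)
    return n


def int_to_alias(n: int) -> str:
    """Inverse of alias_to_int for n >= 1."""
    chars = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def next_alias(existing: list[str]) -> str:
    """Return the first alias after the current maximum with no banned letter."""
    n = max(alias_to_int(a) for a in existing) + 1
    # Assumption: skip any candidate containing a banned letter.
    while any(ch in BANNED for ch in int_to_alias(n)):
        n += 1
    return int_to_alias(n)


print(next_alias(["A", "B", "AY", "AZ"]))  # BA
print(next_alias(["AH"]))                  # AJ (AI is skipped)
```

Taking the maximum rather than the first unused value means gaps left in the existing alias list are never reused, which matches the "find the maximum and increment" behavior described above.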
It includes two new methods and a small number of hidden helper functions:
Additionally, it adds a new parameter to compress() which, when True, automatically assigns a new alias string when a fourth suffix level has no accepted alias. The default matches the current behavior (an error is raised for unhandled fourth suffix levels).
Note that I did not write code to automatically export an updated alias_key.json. Some information from alias_key.json is lost on loading, because the multiple parent lineages of recombinants are not stored, so a JSON rebuilt from the attributes of the Aliasor() object would be incomplete. This could be the subject of a future update.
I have followed the guidelines posted here and here in developing and testing this code. Please let me know if I missed any rules, if there are unhandled cases I am not covering, or if you notice any other problems with these changes.
Testing
I've updated the tests with