
utils.py modularization #2465

Open
meren wants to merge 35 commits into master from utils-py-modularization


@meren (Member) commented Aug 4, 2025

Anvi'o has a problem with its anvio/utils.py. Over the years it has grown to house more than 150 functions and classes, and at the time of this PR it was over 5,000 lines of code. Having everything in one file is a bad idea, not only for performance but also for the maintainability of the codebase.

This PR splits anvio/utils.py into multiple modules under the directory anvio/utils/, which contains the following modules:

  • __init__.py
  • algorithms.py
  • anviohelp.py
  • commandline.py
  • database.py
  • debug.py
  • fasta.py
  • files.py
  • hmm.py
  • misc.py
  • network.py
  • phylogenetics.py
  • sequences.py
  • statistics.py
  • system.py
  • validation.py
  • visualization.py

as defined by the Python file utils_migration_map.py, which also records which functions now live in which module. This file temporarily sits in the root of the repository; we will remove it once everything is in master and all major branches are synchronized with this change.
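To make the idea of a migration map concrete, here is a minimal sketch of one shape it could take: a dictionary from each new module to the functions it now holds, plus a reverse lookup that answers "where does this function live now?". This is an assumption, not the actual layout of utils_migration_map.py. Apart from run_functional_enrichment_stats, which this PR places in utils/statistics.py, the function names below are hypothetical placeholders.

```python
# A minimal sketch of a utils migration map.
# ASSUMPTION: the real utils_migration_map.py may be structured differently.
# Only the placement of run_functional_enrichment_stats comes from the PR;
# the other function names are hypothetical.
MIGRATION_MAP = {
    "statistics": ["run_functional_enrichment_stats"],
    "fasta": ["example_fasta_helper"],      # hypothetical
    "system": ["example_system_helper"],    # hypothetical
}


def build_reverse_lookup(migration_map):
    """Return {function_name: module_name}, refusing duplicate placements."""
    lookup = {}
    for module_name, function_names in migration_map.items():
        for function_name in function_names:
            if function_name in lookup:
                raise ValueError(
                    f"{function_name} is mapped to both "
                    f"{lookup[function_name]} and {module_name}"
                )
            lookup[function_name] = module_name
    return lookup


def new_home(function_name, lookup):
    """Return the dotted path of the module a function moved to."""
    return f"anvio.utils.{lookup[function_name]}"


if __name__ == "__main__":
    lookup = build_reverse_lookup(MIGRATION_MAP)
    print(new_home("run_functional_enrichment_stats", lookup))
```

Refusing duplicate placements up front is a cheap way to catch a function that was accidentally assigned to two modules while the map was being written.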

Many small changes in this PR were made manually to set the stage. However, commit 3f5bca5, which contains the most comprehensive set of changes (replacing all utils imports with their module-specific equivalents), was produced by the program refactor_utils_changes.py.
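The kind of automated rewrite described above can be sketched as follows. This does not reproduce refactor_utils_changes.py; it is one plausible approach under stated assumptions: call sites are assumed to look like `utils.NAME(...)`, each NAME known to the (hypothetical) lookup table is replaced with the bare name, and a matching `from anvio.utils.MODULE import NAME` line is collected for the top of the file.

```python
import re

# Hypothetical subset of a migration map: function name -> new module.
LOOKUP = {
    "run_functional_enrichment_stats": "statistics",
    "example_fasta_helper": "fasta",  # hypothetical
}

# Matches `utils.<name>` but not `anvio.utils.<name>` or `myutils.<name>`.
UTILS_REF = re.compile(r"(?<![\w.])utils\.([A-Za-z_]\w*)")


def rewrite_utils_references(source, lookup):
    """Replace `utils.NAME` with `NAME` and return (new_source, import_lines).

    Only names present in `lookup` are rewritten; unknown names are left
    untouched so that a human can review them.
    """
    needed = {}  # module name -> set of function names to import

    def _replace(match):
        name = match.group(1)
        module = lookup.get(name)
        if module is None:
            return match.group(0)  # leave unknown references alone
        needed.setdefault(module, set()).add(name)
        return name

    new_source = UTILS_REF.sub(_replace, source)
    import_lines = [
        f"from anvio.utils.{module} import {', '.join(sorted(names))}"
        for module, names in sorted(needed.items())
    ]
    return new_source, import_lines


if __name__ == "__main__":
    code = "x = utils.run_functional_enrichment_stats(a)\ny = utils.unknown_thing()\n"
    new_code, import_lines = rewrite_utils_references(code, LOOKUP)
    print("\n".join(import_lines))
    print(new_code)
```

A regex-based pass like this also touches matching text inside strings and comments, and it leaves the old utils import line in place; a tokenizer- or ast-based pass would be safer on a real codebase.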

Once this PR is merged into master, we will have to go through our major branches, merge master into each of them, and resolve conflicts. Once all conflicts are resolved, we run the following command to fix the remaining utils imports:

cd ~/github/anvio
python refactor_utils_changes.py anvio

This was a lot of effort, but I think it was really necessary :/

I suggest we wait for v9 and merge this branch only after the new release is out.

meren added 4 commits August 5, 2025 11:52
this went through all Python files and updated import statements.

this is a temporary file and I will remove it once our branches are up-to-date if we merge this branch to master
@meren (Member, Author) commented Sep 4, 2025

Note to self: it will be EXTREMELY IMPORTANT to carefully carry over, by hand, every change made to utils.py after 2025-08-04 into the new utils modules.

This is necessary due to the following issue: after moving the functions in utils.py to their new resting places within the modules of the utils-py-modularization branch, we lost the ability to track changes to utils.py in master through any of git's mechanisms for merging changes or for identifying and reporting conflicts.

This means the task must be done right before merging this branch into master, by going through the changes line by line and carrying the updated functionality over into the matching module functions.
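Going through changes line by line is easier with a per-function diff. The sketch below assumes the function is defined at the top level and keeps the same name in both places; it pulls one function's source out of two module texts with the standard library's ast module and diffs them with difflib. The helper names and the default file labels are hypothetical. The read_from_git helper simply wraps `git show <ref>:<path>` to fetch a file as it exists on a given branch.

```python
import ast
import difflib
import subprocess


def read_from_git(ref, path):
    """Return the contents of `path` as of `ref` (e.g. 'master')."""
    result = subprocess.run(["git", "show", f"{ref}:{path}"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def function_source(module_source, function_name):
    """Return the source of a top-level function (without decorators), or None."""
    tree = ast.parse(module_source)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == function_name:
            return ast.get_source_segment(module_source, node)
    return None


def diff_function(old_module_source, new_module_source, function_name,
                  old_label="master:anvio/utils.py",
                  new_label="branch:anvio/utils/statistics.py"):
    """Return unified-diff lines for one function; empty list if identical."""
    old = function_source(old_module_source, function_name) or ""
    new = function_source(new_module_source, function_name) or ""
    return list(difflib.unified_diff(old.splitlines(), new.splitlines(),
                                     fromfile=old_label, tofile=new_label,
                                     lineterm=""))


if __name__ == "__main__":
    old = "def f(x):\n    return x + 1\n"
    new = "def f(x):\n    return x\n"
    print("\n".join(diff_function(old, new, "f")))
```

In practice one could pass read_from_git("master", "anvio/utils.py") as the old text and the local anvio/utils/statistics.py as the new one, then review each non-empty diff before porting the change.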

For instance, the changes Iva made in master to the utils.py function run_functional_enrichment_stats on 2025-08-08 (i.e., after 2025-08-04) are not in the same function, which now lives in utils/statistics.py:


This kind of stuff :)

@ivagljiva (Contributor) commented

Sorry for making your life harder @meren :)

@meren (Member, Author) commented Sep 4, 2025

Oh no, not at all. Better code > Meren's life.
