Skip to content

Bring documentation up to date, refactor Python package#99

Merged
jdaymude merged 19 commits intomainfrom
docs-update
Jul 23, 2025
Merged

Bring documentation up to date, refactor Python package#99
jdaymude merged 19 commits intomainfrom
docs-update

Conversation

@jdaymude
Copy link
Contributor

Resolves #66.

  • Rewrites READMEs to be context aware: README.md is specifically for the GitHub repository, README-crate.md is for crates.io, and python/README.md is for PyPI.
  • Documents every public-facing component in the crate, including documentation examples and doctests where appropriate (especially important since a lot of this got left behind after Modularize top-down algorithm components #89).
  • Moves all Python package code and tests from assembly_theory/ to python/ to avoid ImportErrors and general confusion.
  • Aligns the Python interface with the Rust interface (no name conflicts or different naming schemes).
  • Rewrites the Python test suite for better test coverage and error/exception testing.
  • Strips out all custom wrappers for the Python package, consolidating all Python interface maintenance to src/python.rs (functions, modules, signatures, errors, documentation, etc.). This will save us a lot of redundant maintenance. In particular, the docstrings for the #[pyfunction] functions in src/python.rs now double as Python documentation with Python usage examples, eliminating the need to duplicate that documentation in the PyPI README or a future Sphinx/ReadTheDocs solution. However, it comes at the cost of removing RDKit preprocessing (not a big deal, since RDKit can always be used separately at minimal user burden) and timeout interrupts (again, won't be a big deal when our assembly index calculations get fast enough).
  • Fixes inconsistencies between the Rust and Python manifests, making pyproject.toml dynamically load fields from Cargo.toml when possible.

@jdaymude jdaymude requested a review from colemathis July 23, 2025 20:25
@jdaymude jdaymude requested a review from colemathis July 23, 2025 21:23
@colemathis
Copy link
Collaborator

I think this is all great, particularly considering how many more options we have on the algorithm now. I like that we can use the docstrings from the .rs file directly. I tried using python/assembly_theory originally instead of having assembly_theory at the top level but I could not get Maturin to cooperate. So I'm glad to see that's working. All the other improvements seem really helpful for reducing the maintenance effort required to keep things sync'ed up.

My only major concern is the loss of functionality that was in timer.py if performance gets good enough this matters less and less but I wouldn't know how to implement this in an obvious way on the Rust side. @jdaymude you've been through the whole library in detail now. Any thoughts here? It would be ideal to enable the calculation to be exited gracefully after a specified amount of time. Ideally we could also return the lowest index found so far in the process.

@jdaymude
Copy link
Contributor Author

jdaymude commented Jul 23, 2025

@colemathis It should be straightforward(-ish) to add a new --timeout CLI argument to the whole program (not just Python) that kills search after that timeout is reached and outputs the best_index value found at that time. To get a best estimate, it becomes necessary to change index_search directly and not just do something that wraps Python or lives only in src/python.rs. I'm not super familiar with killing threads in Rust, but assuming that's possible to trigger with some kind of asynchronous timer, we can do it.

What is your preference about making this implementation issue blocking for this PR vs. opening this as a new issue? I realize it's technically a regression, but it's also going to take some work to implement.

EDIT: After reading a bit about implementing asynchronous timeouts with tokio::time::timeout, I think we should merge this PR as is and then shift timeout implementation to a new issue (see #100). It's not entirely clear to me at the moment how this is going to interact with parallelism, rayon, making things async-compatible, and so forth, and I'd like these documentation and Python alignment changes in main before figuring all this timeout stuff out.

Copy link
Collaborator

@colemathis colemathis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, will open a new issue about graceful exit/timeout.

@jdaymude jdaymude merged commit 9339716 into main Jul 23, 2025
11 checks passed
@jdaymude jdaymude deleted the docs-update branch July 23, 2025 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Clean up Python virtual environments

3 participants