@jandom
Collaborator

@jandom jandom commented Jan 5, 2026

Summary

Looks like I introduced a bug during a refactor (what else is new!) in this PR #77: I added some extra structure with the BaseModel, but the underlying code expects a dictionary. It'd be cleaner to make everything consume the BaseModel, but in the absence of tests, I just use the BaseModel for user-input validation and then dump to a dict.
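
For readers unfamiliar with the pattern, the workaround described above (validate at the user-facing boundary, then hand a plain dict to the existing code) might look roughly like the sketch below. It uses a stdlib dataclass as a stand-in for the BaseModel so that it runs without extra dependencies; `PreparseConfig`, `legacy_preparse`, `run`, and the field names are hypothetical and not taken from the PR. If the BaseModel is pydantic's, the dump step would be `model_dump()` (pydantic v2) instead of `asdict`.

```python
from dataclasses import asdict, dataclass


@dataclass
class PreparseConfig:
    """Hypothetical stand-in for the BaseModel that validates user input."""

    alignments_directory: str
    max_seq_count: int = 1024

    def __post_init__(self) -> None:
        # Validate once, at the user-facing boundary.
        if not self.alignments_directory:
            raise ValueError("alignments_directory must not be empty")
        if self.max_seq_count <= 0:
            raise ValueError("max_seq_count must be positive")


def legacy_preparse(options: dict) -> str:
    """Hypothetical downstream code that still expects a plain dict."""
    return f"{options['alignments_directory']}:{options['max_seq_count']}"


def run(alignments_directory: str, max_seq_count: int = 1024) -> str:
    config = PreparseConfig(alignments_directory, max_seq_count)
    # Dump to a dict so the untouched downstream code keeps working.
    return legacy_preparse(asdict(config))
```

The design trade-off is the one named in the summary: the downstream code stays untouched, at the cost of losing the typed model past the entry point.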

Changes

  • Fixed the bug
  • Added a smoke test to confirm files are being written out (and are not empty!)

Related Issues

Testing

Other Notes

@jandom jandom requested a review from vinay-swamy January 5, 2026 14:55
@jandom jandom self-assigned this Jan 5, 2026
@jandom jandom added the safe-to-test Internal only label used to indicate PRs that are ready for automated CI testing. label Jan 5, 2026
@jandom jandom marked this pull request as ready for review January 19, 2026 14:28
@jandom jandom requested a review from jnwei January 19, 2026 14:56
@jandom jandom requested a review from vinay-swamy January 20, 2026 10:45
Contributor

@jnwei jnwei left a comment

LGTM, some small nits regarding documentation on tests.


# Check that npz files were created for both chains
npz_files = list(tmp_path.glob("*.npz"))
assert len(npz_files) == 6, (
Contributor

Could we add a little more context for why 6 is the expected number of npz files? Would it make sense to check for the list of filenames instead?
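
One way the filename-based check suggested here could look is sketched below. The helper `written_npz_names` and the names in `EXPECTED` are hypothetical; the real expected names depend on the test data, which this thread does not list. The loop that writes files is only a stand-in for running the script.

```python
import tempfile
from pathlib import Path


def written_npz_names(output_dir: Path) -> set[str]:
    """Return the names of the non-empty .npz files in output_dir."""
    return {p.name for p in output_dir.glob("*.npz") if p.stat().st_size > 0}


# Hypothetical expected outputs; the real names depend on the test data.
EXPECTED = {"chain_A.npz", "chain_B.npz"}

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    # Stand-in for running the script: write one small file per expected name.
    for name in EXPECTED:
        (tmp_path / name).write_bytes(b"\x00")
    found = written_npz_names(tmp_path)
    # Comparing sets reports which file is missing or unexpected,
    # which a bare count cannot do.
    assert found == EXPECTED, (
        f"missing={EXPECTED - found}, unexpected={found - EXPECTED}"
    )
```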

Collaborator Author

Yeah, I reorganized the tests a bit. By default the script processes the whole directory, and that's not ideal: we were mixing loose alignment files with the 2 PDB entries that were the actual inputs.

return CliRunner()

def test_preparse_databases(self, cli_runner, tmp_path):
    """Test preparsing alignments with a single database (uniref90_hits)."""
Contributor

Could we add the expected input file type (e.g. .sto)?

Collaborator Author

It actually could be both a3m/sto – I'm hesitant to write documentation for the script in the test :P What's the intent here?

Contributor

When reviewing, it isn't obvious to me what the inputs to this script are, since, as you mention later, we have a combination of inputs and outputs in the testdata/alignments directory.

It would be helpful to leave a signpost, either in the test or in the test_dir, of the expected inputs to help future readers / maintainers.

Another option could be to rearrange the test directory to have the inputs / outputs separated.

Collaborator Author

Ah, makes sense now – so this is how this directory is organized

[Image: screenshot of the directory layout]

The two directories make sense. As for the loose files directly under alignments/, I have no idea what they are. They're not outputs; in fact, they don't seem to be referenced by anything. I'm going to do an exploratory 'rm'...

Added a docstring explaining what the script does to these inputs.

Contributor

@vinay-swamy vinay-swamy left a comment

LGTM

@jandom jandom added safe-to-test and removed safe-to-test labels Jan 20, 2026
@jandom jandom added safe-to-test and removed safe-to-test labels Jan 20, 2026
@jandom jandom added safe-to-test and removed safe-to-test labels Jan 21, 2026
@jandom jandom requested a review from jnwei January 21, 2026 09:11
@jandom jandom added safe-to-test and removed safe-to-test labels Jan 21, 2026
@jnwei jnwei added safe-to-test and removed safe-to-test labels Jan 22, 2026
@jnwei
Contributor

jnwei commented Jan 22, 2026

This should be good to merge after fixing a small spacing typo for directory paths.

@jandom
Collaborator Author

jandom commented Jan 22, 2026

@jnwei thanks for the fix, let's merge this bad boy

@jandom jandom merged commit 5c464e5 into main Jan 22, 2026
5 checks passed
@jandom jandom deleted the jandom/2026-01/fix/preparse_alignments_of3 branch January 22, 2026 07:49
Labels

safe-to-test Internal only label used to indicate PRs that are ready for automated CI testing.

3 participants