
Issue 473: change how output directory is generated for extract_xri script #507

Merged
merged 4 commits into master on Sep 19, 2024


@rminsil (Collaborator) commented Sep 5, 2024

This PR addresses one point raised in Damien's code review feedback in this comment:

#491 (review)

I would make the output directory an optional command-line argument. By default, you should extract files to SIL_NLP_ENV.mt_corpora_dir. This should point to the MT/corpora folder in the S3 bucket, which is where we would store this kind of corpus.

I wasn't sure what directory structure to use within the corpora dir, so I kept the existing logic of generating a unique folder name from the CLI inputs plus a timestamp.
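A minimal sketch of the behaviour described above, assuming an argparse-based CLI. The argument names mirror the cli_input attributes quoted in the review (source_iso, target_iso, dataset_descriptor, output), but MT_CORPORA_DIR, resolve_output_dir and parse_args are hypothetical stand-ins, and any other arguments the real script takes are omitted.

```python
import argparse
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical stand-in for SIL_NLP_ENV.mt_corpora_dir; in silnlp the real
# value comes from the environment configuration.
MT_CORPORA_DIR = Path("MT/corpora")


def resolve_output_dir(
    source_iso: str,
    target_iso: str,
    dataset_descriptor: str,
    output: Optional[str],
    corpora_dir: Path = MT_CORPORA_DIR,
) -> Path:
    """Use the explicit output dir if given, else a unique folder under the corpora dir."""
    if output is not None:
        return Path(output)
    # Unique folder name built from the CLI inputs plus a timestamp.
    timestamp = time.strftime("%Y%m%d-%H%M%S")
    unique_dir = f"{source_iso}-{target_iso}-{dataset_descriptor}-{timestamp}"
    return corpora_dir / unique_dir


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical CLI: only the arguments relevant to the output location.
    parser = argparse.ArgumentParser(description="Extract an XRI corpus (sketch)")
    parser.add_argument("source_iso")
    parser.add_argument("target_iso")
    parser.add_argument("dataset_descriptor")
    parser.add_argument(
        "-o",
        "--output",
        default=None,
        help="Output directory (defaults to a unique folder under the corpora dir)",
    )
    return parser.parse_args(argv)
```

Making --output optional with a default of None keeps the "no argument given" case explicit, so the default location is decided in one place rather than baked into the parser.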

Note that this PR is built on top of #506. When that one is merged I'll rebase this one and have it target master.

Once this PR is merged, all code review comments from #491 are addressed and balance is returned to the force.



@rminsil rminsil requested a review from ddaspit September 5, 2024 09:33
@rminsil rminsil linked an issue Sep 5, 2024 that may be closed by this pull request
@ddaspit (Collaborator) left a comment


Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @rminsil)


silnlp/common/extract_xri.py line 116 at r1 (raw file):

    if cli_input.output is None:
        unique_dir = f"{cli_input.source_iso}-{cli_input.target_iso}-{cli_input.dataset_descriptor}-{time.strftime('%Y%m%d-%H%M%S')}"
        output_dir = Path(os.path.join(SIL_NLP_ENV.mt_corpora_dir, unique_dir))

SIL_NLP_ENV.mt_corpora_dir is already a Path, so you should be able to do:

output_dir = SIL_NLP_ENV.mt_corpora_dir / unique_dir
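For reference, a small standalone sketch of the difference: pathlib.Path overloads the / operator to join path segments and returns a Path, so the os.path.join round trip is unnecessary. The corpora path and folder name below are made-up examples, not the real SIL_NLP_ENV value.

```python
import os
from pathlib import Path

corpora_dir = Path("/mnt/s3/MT/corpora")  # hypothetical location
unique_dir = "swh-en-demo-20240912-064100"  # example folder name

# Original form: join as strings, then wrap the result back in a Path.
old_style = Path(os.path.join(corpora_dir, unique_dir))

# Suggested form: Path's "/" operator joins segments and already returns a Path.
new_style = corpora_dir / unique_dir
```

Both produce the same path; the operator form just avoids converting to a string and back.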

Base automatically changed from issue-473-relocate-xri-script to master September 12, 2024 05:56
rminsil pushed a commit that referenced this pull request Sep 12, 2024
rminsil pushed a commit that referenced this pull request Sep 12, 2024
@rminsil (Collaborator, Author) commented Sep 12, 2024

SIL_NLP_ENV.mt_corpora_dir is already a Path, so you should be able to do:

output_dir = SIL_NLP_ENV.mt_corpora_dir / unique_dir

Thanks for the tip @ddaspit, there are some newfangled things in Python since I last used it.

I've updated my PR.

@rminsil rminsil force-pushed the issue-473-change-output-location branch from 232cddd to 599eb41 Compare September 12, 2024 06:41
@ddaspit (Collaborator) left a comment


:lgtm:

Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @rminsil)

@rminsil rminsil merged commit d269bd4 into master Sep 19, 2024
1 check passed
rminsil pushed a commit that referenced this pull request Sep 19, 2024
rminsil pushed a commit that referenced this pull request Sep 19, 2024
@rminsil rminsil deleted the issue-473-change-output-location branch September 19, 2024 05:05
Development

Successfully merging this pull request may close these issues.

Create initial xri_etl script
2 participants