fix: be more graceful when MS tool moves files underneath us #314

Merged
merged 1 commit into from
May 20, 2024
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -9,4 +9,4 @@ repos:
 # supported by your project here, or alternatively use
 # pre-commit's default_language_version, see
 # https://pre-commit.com/#top_level-default_language_version
-language_version: python3.11
+language_version: python3.12
13 changes: 12 additions & 1 deletion cumulus_etl/deid/mstool.py
@@ -80,6 +80,17 @@ def _compare_file_sizes(target: dict[str, int], current: dict[str, int]) -> floa
     return total_current / total_expected


+def _get_file_size_safe(path: str) -> int:
+    try:
+        return os.path.getsize(path)
+    except FileNotFoundError:

Dtphelan1 commented on May 20, 2024

(Comment written from an outsider's perspective, so please forgive any naive questions.)
Is FileNotFoundError the right exception to catch here? The Python 3.12 docs I was looking at for os.path.getsize say it raises OSError.

Contributor Author

Ah yeah, as we mentioned on Slack, FileNotFoundError is a built-in subclass of OSError, so it is what os.path.getsize raises when the file is missing.
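
For anyone who wants to check this locally, here is a minimal sketch. The file name `already-moved.ndjson` is hypothetical; the only assumption is that it does not exist inside a freshly created temporary directory.

```python
import os
import tempfile

# FileNotFoundError is a built-in subclass of OSError, so a broader
# `except OSError` clause would catch it too.
is_subclass = issubclass(FileNotFoundError, OSError)

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name; nothing has been created at this path.
    missing = os.path.join(tmpdir, "already-moved.ndjson")
    try:
        os.path.getsize(missing)
    except OSError as exc:
        # Keep a reference so we can inspect the concrete type afterwards.
        caught = exc

print(is_subclass, type(caught).__name__)
```

Catching the narrower FileNotFoundError keeps other OSError cases (such as permission problems) visible instead of silently treating them as a size of 0.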

+        # The MS Tool moves temporary files around as it completes each file,
+        # so we guard against an unlucky race condition of a file being moved
+        # before we can query its size. (Total size will be wrong for a moment,
+        # but it will correct itself in a second.)
+        return 0


 def _count_file_sizes(pattern: str) -> dict[str, int]:
     """Returns all files that match the given pattern and their sizes"""
-    return {os.path.basename(filename): os.path.getsize(filename) for filename in glob.glob(pattern)}
+    return {os.path.basename(filename): _get_file_size_safe(filename) for filename in glob.glob(pattern)}
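
The race this change guards against can be reproduced in isolation. The sketch below is illustrative only: `get_file_size_safe` is a stand-in that mirrors the helper in the diff, the file names are hypothetical, and `os.remove` is used to simulate the MS tool moving a file between the glob and the size query.

```python
import glob
import os
import tempfile


def get_file_size_safe(path: str) -> int:
    """Return the size of `path`, or 0 if the file has vanished."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return 0


with tempfile.TemporaryDirectory() as tmpdir:
    kept = os.path.join(tmpdir, "kept.ndjson")
    moved = os.path.join(tmpdir, "moved.ndjson")
    for path in (kept, moved):
        with open(path, "w", encoding="utf8") as f:
            f.write("12345")  # 5 bytes

    # Snapshot the matching files first, as the glob call would.
    matches = glob.glob(os.path.join(tmpdir, "*.ndjson"))

    # Simulate the external tool moving one file away before we
    # get a chance to ask for its size.
    os.remove(moved)

    # With a bare os.path.getsize this comprehension would raise;
    # with the safe helper the missing file just counts as 0 bytes.
    sizes = {os.path.basename(p): get_file_size_safe(p) for p in matches}

print(sizes)
```

The missing file temporarily contributes 0 to the total, which matches the code comment's note that the reported size is briefly wrong and corrects itself on the next poll.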