Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle missing rules #255

Merged
merged 20 commits into from
Oct 28, 2024
Merged

Handle missing rules #255

merged 20 commits into from
Oct 28, 2024

Conversation

marySalvi
Copy link
Collaborator

@marySalvi marySalvi commented Sep 3, 2024

Closes #236
Closes #237

Backend Changes

  • Created a failed directory and failed manifest file at start of redaction
  • Each failed image is copied to the failed directory and logged to the manifest
  • At the end of the redaction the command needed to re-redact the failed images is logged to the manifest
  • Adds an index option to run command that indicates the starting number for renaming during redaction
  • Originally if an image was skipped, the image counter would increase and thus, that number would be skipped in naming. => Now there is an additional counter that increases on successes.

Screenshot from 2024-10-07 12-27-21

Frontend Changes

  • Adds a confirmation Modal when an image set has images with missing rules
  • Continue forces redaction as described above
  • Cancel closes the modal and allows the user to add a ruleset.
  • A new image plan is generated when a rulest is added/removed

image

@marySalvi marySalvi force-pushed the handle-missing-rules branch 2 times, most recently from 7643f1c to 3c408af Compare September 30, 2024 16:35
@marySalvi marySalvi force-pushed the handle-missing-rules branch 3 times, most recently from 9492d66 to afe6817 Compare October 4, 2024 20:54
@marySalvi marySalvi marked this pull request as ready for review October 4, 2024 20:57
@marySalvi marySalvi force-pushed the handle-missing-rules branch from a9d2d65 to 12a7906 Compare October 7, 2024 16:34
@marySalvi marySalvi requested a review from manthey October 7, 2024 16:39
@manthey
Copy link
Contributor

manthey commented Oct 7, 2024

I was trying to test this. I created a file with an unknown tag (50736). With that, I get a 500 error when the GUI tries to get the plan. Running from the command line, the error is:

  File "imagedephi/redact/svs.py", line 232, in report_plan
    rule = self.metadata_redaction_steps[tag.value]
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 50736
[PYI-1288002:ERROR] Failed to execute script '__main__' due to unhandled exception!

Further, if I try to run the run on the command line, my output directory is on a different device than my input directory, so I get

  File "imagedephi/redact/redact.py", line 161, in redact_images
    failed_file.hardlink_to(image_file)
  File "pathlib.py", line 1208, in hardlink_to
OSError: [Errno 18] Invalid cross-device link:

If we can't hardlink the file (because the output is on a different device), we should copy it.

@marySalvi marySalvi force-pushed the handle-missing-rules branch 2 times, most recently from 37bc866 to a573b12 Compare October 7, 2024 18:49
@marySalvi
Copy link
Collaborator Author

I was trying to test this. I created a file with an unknown tag (50736). With that, I get a 500 error when the GUI tries to get the plan. Running from the command line, the error is:

  File "imagedephi/redact/svs.py", line 232, in report_plan
    rule = self.metadata_redaction_steps[tag.value]
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 50736
[PYI-1288002:ERROR] Failed to execute script '__main__' due to unhandled exception!

fixed in e861b92

Further, if I try to run the run on the command line, my output directory is on a different device than my input directory, so I get

  File "imagedephi/redact/redact.py", line 161, in redact_images
    failed_file.hardlink_to(image_file)
  File "pathlib.py", line 1208, in hardlink_to
OSError: [Errno 18] Invalid cross-device link:

If we can't hardlink the file (because the output is on a different device), we should copy it.

changed in a573b12

failed_file = failed_dir / image_file.name
# Using copy2 preserves metadata
# https://docs.python.org/3/library/shutil.html#shutil.copy2
copy2(image_file, failed_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try to hardlink this and only copy it if we can't?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yes that would probably be best

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in 5854456

@manthey
Copy link
Contributor

manthey commented Oct 7, 2024

I think it would be better to not create the failed directory or write the failed manifest if there are no failed images.

@manthey
Copy link
Contributor

manthey commented Oct 7, 2024

Should the command to rerun the failed images include the output folder? More generally, if we ran the command from the command line, should the rerun command have all of the options with just the source directory different and the index parameter added?

@marySalvi
Copy link
Collaborator Author

Should the command to rerun the failed images include the output folder? More generally, if we ran the command from the command line, should the rerun command have all of the options with just the source directory different and the index parameter added?

added in 3f60785

@marySalvi
Copy link
Collaborator Author

I think it would be better to not create the failed directory or write the failed manifest if there are no failed images.

changed in 12a8027

@marySalvi marySalvi requested a review from manthey October 10, 2024 18:59
@marySalvi marySalvi force-pushed the handle-missing-rules branch from 275ebcf to be3bb47 Compare October 10, 2024 19:06
)
manifest.write("failed_images_count: " + str(failed_img_counter) + "\n")
index += 1
yaml_command = f"""command: >
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This renders oddly and isn't quite a valid command. I get

command: imagedephi run /tmp/a/Failed_2024-10-10_15-24-43\ --output-dir /tmp/a/Redacted_2024-10-10_15-24-43\
  --index 25\

in the yaml file. The first \ makes the input directory contain the following space. Probably we should jam this all on one line rather than try to break it up (it would make it easier to copy to repeart).

Further, this creates a subfolder in the original redaction folder. Either we need a flag to not put redactions in a subfolder or we shouldn't include the final segment of the path in the output directory.

This will also miss other command line options (--profile, --override-rules, --skip-rename, --recursive all probably matter).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can pull the --profile and --skip-rename from the original command. I think on this first pass, --recursive would be irrelevant because we are copying the failed images over into the top level of the 'failed directory'.

We wouldn't know what --override-rules would be used. We could include it in the command as --override-rules [path-to-rules] or some such thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the command specified --recursive and there are two files that we can't redact that have the same name but are in different subdirectories, what happens? They would collide in the Failed folder. Rerunning on the failed images won't reconstruct the recursive folder structure if we were using --skip-rename and --recursive.

I think we want to copy whatever --override-rules file was specified. Specifically, I could imagine that I run my set on a bunch of files and it failed on two of them. I then tweak my rules file and run it again with the new-and-improved rule file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I'll add this and the recursive behavior too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the yaml file. The first \ makes the input directory contain the following space. Probably we should jam this all on one line rather than try to break it up (it would make it easier to copy to repeart).

Further, this creates a subfolder in the original redaction folder. Either we need a flag to not put redactions in a subfolder or we shouldn't include the final segment of the path in the output directory.

fixed in 140a339

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also miss other command line options (--profile, --override-rules, --skip-rename, --recursive all probably matter).

added in 7634ad8

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the command specified --recursive and there are two files that we can't redact that have the same name but are in different subdirectories, what happens? They would collide in the Failed folder. Rerunning on the failed images won't reconstruct the recursive folder structure if we were using --skip-rename and --recursive.

fixed in 9eb13dc

@marySalvi marySalvi requested a review from manthey October 13, 2024 20:49
@manthey
Copy link
Contributor

manthey commented Oct 15, 2024

I ran with

imagedephi -R imagedephi/modify_dates_rules.yaml run /mnt/data3/ImageDePHI/NestedMissing --skip-rename --recursive -o /tmp/a

The NestedMissing folder has two subfolders, with a totla of 26 images split between the two subfolders, two of which have unknown tags.

I got this error:

Traceback (most recent call last):
  File "imagedephi/__main__.py", line 4, in <module>
    main.imagedephi()
  File "click/core.py", line 1130, in __call__
  File "click/core.py", line 1055, in main
  File "imagedephi/utils/cli.py", line 48, in invoke
    return super().invoke(ctx)
           ^^^^^^^^^^^^^^^^^^^
  File "click/core.py", line 1657, in invoke
  File "click/core.py", line 1404, in invoke
  File "click/core.py", line 760, in invoke
  File "click/decorators.py", line 38, in new_func
  File "imagedephi/main.py", line 151, in run
    redact_images(
  File "imagedephi/redact/redact.py", line 248, in redact_images
    command = yaml.safe_load(yaml_command)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "yaml/__init__.py", line 125, in safe_load
  File "yaml/__init__.py", line 81, in load
  File "yaml/constructor.py", line 49, in get_single_data
  File "yaml/composer.py", line 36, in get_single_node
  File "yaml/composer.py", line 55, in compose_document
  File "yaml/composer.py", line 84, in compose_node
  File "yaml/composer.py", line 127, in compose_mapping_node
  File "yaml/parser.py", line 98, in check_event
  File "yaml/parser.py", line 428, in parse_block_mapping_key
  File "yaml/scanner.py", line 116, in check_token
  File "yaml/scanner.py", line 223, in fetch_more_tokens
  File "yaml/scanner.py", line 577, in fetch_value
yaml.scanner.ScannerError: mapping values are not allowed here
  in "<unicode string>", line 1, column 513:
     ... les(associated_images={'default': DeleteRule(key_name='default', ...
                                         ^
[PYI-3676156:ERROR] Failed to execute script '__main__' due to unhandled exception!

@marySalvi
Copy link
Collaborator Author

marySalvi commented Oct 16, 2024

I ran with

imagedephi -R imagedephi/modify_dates_rules.yaml run /mnt/data3/ImageDePHI/NestedMissing --skip-rename --recursive -o /tmp/a
The NestedMissing folder has two subfolders, with a totla of 26 images split between the two subfolders, two of which have unknown tags.

I got this error: ...

Currently the override_rules option parses the rules file and passes the dict to redact_images etc. What we want for the command is the path not the rules themselves.

I changed this behavior in #269 and that should resolve this issue. I've rebased off of that branch so this should be testable now too.

@marySalvi marySalvi force-pushed the handle-missing-rules branch from 7634ad8 to 57cb2f0 Compare October 16, 2024 19:25
@marySalvi marySalvi changed the base branch from main to harden-global-options October 16, 2024 19:26
Base automatically changed from harden-global-options to main October 21, 2024 19:39
@marySalvi marySalvi force-pushed the handle-missing-rules branch from 57cb2f0 to 981ff19 Compare October 23, 2024 16:52
@marySalvi marySalvi force-pushed the handle-missing-rules branch from 981ff19 to bdaedf7 Compare October 23, 2024 18:16
@marySalvi marySalvi merged commit 32f4077 into main Oct 28, 2024
4 checks passed
@marySalvi marySalvi deleted the handle-missing-rules branch October 28, 2024 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add missing rules workflow Add override flag for incomplete rulesets
2 participants