-
Notifications
You must be signed in to change notification settings - Fork 184
cmd-diff: extract files instead of using FUSE #4360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
In the case of `cosa diff --gc` there will be no active differs so let's just move the gc code earlier and conditionalize on active_differs being non-empty for the rest of it.
This approach use more disk space but disk access for the diff will be faster. Files will also survive under after git diff returns in case manual inspection is desired. Co-authored-by: Jean-Baptiste Trystram <jbtrystram@redhat.com>
The diffs contain a bunch of files with sha256 checksums in them, which are unique on every build. Let's attempt to normalize the output directories so the diffs will me less noisy and more meaningful. Also fixup some permissions on some files because they can't be diffed otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the cmd-diff command to extract files from metal images using guestfs and tar instead of a FUSE mount. This is a great simplification that removes complexity and potential fragility associated with multiprocessing and FUSE. The introduction of filename normalization for hashes is also a valuable improvement for generating more meaningful diffs. The overall changes are positive, but I have a couple of suggestions to enhance the robustness of the new implementation.
| # Some files like /etc/shadow and sudo have no read permissions so let's | ||
| # open it up so the difftool can access it. | ||
| runcmd(['find', diff_dir, '-type', 'f', '-perm', '000', | ||
| '-exec', 'chmod', '--verbose', '444', '{}', ';']) | ||
| runcmd(['find', diff_dir, '-type', 'f', '-perm', '111', | ||
| '-exec', 'chmod', '--verbose', '555', '{}', ';']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These find commands to change file permissions are very specific and only handle modes 000 and 111. Other unreadable file modes (e.g., 200 for write-only) won't be fixed. Additionally, using -exec ... ';' is inefficient as it runs chmod for every file. A more robust and efficient approach is to use a single find command to add read permission for the owner to all files, and use -exec ... '+' to process multiple files at once.
# Some files may not have read permissions. Add read permission for the
# owner to all files to ensure that the difftool can access them.
runcmd(['find', diff_dir, '-type', 'f', '-exec', 'chmod', 'u+r', '{}', '+'])|
Thank you for picking that up Dusty ! I'll give it a try today but the change LGTM |
And also try to normalize the output to make the diffs more meaningful.
See individual commit messages.
This obsoletes #4333 (thanks @jbtrystram)