Prevent user from using illegal characters in output filename #834
Hi, and thanks for your work on this :-) After a quick check of the Python docs, I found that by default,
Also, the CI is currently failing because of our linting rules. It asks to apply the following diff (replacing double quotes with single ones):
--- dangerzone/errors.py 2024-06-12 02:21:28.668833+00:00
+++ dangerzone/errors.py 2024-06-12 02:21:57.251937+00:00
@@ -44,11 +44,11 @@
class IllegalOutputFilenameException(DocumentFilenameException):
"""Exception for when the output file contains illegal characters."""
def __init__(self) -> None:
- super().__init__("Filename must not contain the following characters: <>:\"|?*")
+ super().__init__('Filename must not contain the following characters: <>:"|?*')
In case it helps, you can reproduce the linter's output by running this command locally: poetry run make lint |
I just realized that I'm wondering if we should take the same approach that's the Django |
After making the linter's suggested changes, it still takes issue with that line, suggesting I change it to be... exactly the same?
--- /root/project/dangerzone/errors.py 2024-06-12 15:36:20.384450+00:00
+++ /root/project/dangerzone/errors.py 2024-06-12 15:36:50.243414+00:00
@@ -44,11 +44,11 @@
class IllegalOutputFilenameException(DocumentFilenameException):
"""Exception for when the output file contains illegal characters."""
def __init__(self) -> None:
- super().__init__('Filename must not contain the following characters: <>:"|?*')
+ super().__init__('Filename must not contain the following characters: <>:"|?*') |
That's why I was also uncertain about using it :) Could you clarify what you're thinking about the |
After syncing my branch with the main dev branch, two new test failures are occurring. A conversion error in Bookworm for the
A Dangerzone build error in Focal:
I believe this is unrelated to my additions, but I wanted to point it out. |
- super().__init__('Filename must not contain the following characters: <>:"|?*')
+ super().__init__('Filename must not contain the following characters: <>:"|?*')
This is actually fixing the extra space on the top line. You can run the
My thinking was to take a different route from the one you provided: rather than displaying an error, replace the "unsafe" chars with safer ones. After some more thinking, I believe the approach you provide here makes more sense, because some of the chars that we would consider "unsafe" can actually be meaningful (think accents, for instance). Sorry for bothering you with this 🙄 Let me know when you've had the time to include the linting changes, and we should be good to go! It's possible that the test failures you've been seeing are now fixed in the main branch. If that's the case, rebasing on top of the latest |
Currently, the commit is throwing an exception on the following test in Windows:
This indicates a challenge with the validation structure. For now, I can remove the colon from the illegal character set to make the code functional and still warn users about the other illegal characters, but this will not fully address issue #362. We need to either use something like |
Hey @bnewc, thanks a lot for your work on this! You know what, a quick fix for the above would be to get just the filename (e.g., with
I'm afraid though that even this check has its caveats. Linux filesystems allow the use of virtually every character, and users may already have such files. It would not be nice to error on these files when they are actually perfectly acceptable. So, my suggestion would be to use a different illegal character set per OS. For Windows, we can target NTFS, and for macOS we can target HFS+. You can read more here.
Finally, note that the main idea of #362 was to not let the user proceed until the safe extension contains only legal characters. So, the best way to tackle this would be to also add a check for illegal chars in this method: dangerzone/dangerzone/gui/main_window.py Lines 768 to 780 in e81ecbc
In any case, the PR is in a good direction, so I would consider this part as a good-to-have, only if you have time to work on it. |
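A minimal sketch of the per-OS idea discussed above, under stated assumptions: the function name `illegal_chars_in_filename` and the `ILLEGAL_CHARS` table are hypothetical and not taken from the Dangerzone code base, and the character sets are an approximation of what NTFS, HFS+, and Linux filesystems reject. It checks only the final path component, so the colon in a Windows drive prefix such as `C:` does not trigger a false positive.

```python
import ntpath
import platform
import posixpath
from typing import Optional, Set

# Assumed per-OS sets (illustrative, not the project's actual values).
ILLEGAL_CHARS = {
    "Windows": frozenset('<>:"/\\|?*'),  # characters reserved on NTFS
    "Darwin": frozenset(":/"),           # characters rejected on HFS+
    "Linux": frozenset("/"),             # only the path separator
}


def illegal_chars_in_filename(path: str, system: Optional[str] = None) -> Set[str]:
    """Return the illegal characters found in the last component of `path`."""
    system = system or platform.system()
    # Use the path flavour that matches the target OS, so that a drive
    # prefix like "C:" is stripped before the filename is inspected.
    if system == "Windows":
        filename = ntpath.basename(path)
    else:
        filename = posixpath.basename(path)
    # Unknown systems fall back to an empty set, i.e. no restriction.
    return set(filename) & ILLEGAL_CHARS.get(system, frozenset())
```

Passing `system` explicitly makes the check testable on any host; in the application it would presumably default to the running platform.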
Hi @bnewc. We're in the process of releasing 0.7.0, so I wanted to ask if you still have time to work on this issue. If not, that's totally fine, we can continue from where you left off and merge it. Thanks again for all your work 🙂 |
I'll see what I can do! |
The GUI should now inform users of illegal characters in the output safe extension, and prevent conversion until the characters are removed. However, the commit is still failing a number of checks. The linter check issue appears to be related to the Ubuntu Focal workaround for PySide2. I see how to fix that and will do so when I have a bit more time. Additionally, it fails checks that expect an |
Hi @bnewc. Just a quick note that we're currently in the middle of releasing Dangerzone 0.7.0, and I haven't found the time to look into the latest changes in your PR. Once I manage to find some time though, I'll definitely get back to this. |
I found some time to test this PR, and check out the test failures. Thanks a lot for updating our GUI checks, I think these should detect 99% of the errors.
I have also added some comments on what to check per OS. Tell me what you think.
dangerzone/gui/main_window.py
QRegExValidator = QtGui.QRegExpValidator
self.dot_pdf_validator = QRegExValidator(QRegEx(r".*\.[Pp][Dd][Ff]"))
if platform.system() == "Linux":
    illegal_chars_regex = r"[\\]"
Here you can check for the / character (but not the \ one).
(Might be worth having a string constant for that)
I've switched the regex contents from \\ to / and the warning widget catches / on my Linux system. Why might a string constant be preferable?
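One possible reading of the string-constant suggestion, sketched with Python's `re` module rather than the Qt validator used in the PR: if each platform's characters live in a single named constant, both the regex and the user-facing message can be derived from it and cannot drift apart. The constant and function names below are hypothetical, and the Windows set is an assumption.

```python
import re

# Hypothetical constants: one plain string per platform, defined once.
ILLEGAL_CHARS_LINUX = "/"
ILLEGAL_CHARS_WINDOWS = '<>:"/\\|?*'


def build_illegal_chars_regex(chars: str) -> str:
    """Turn a plain string of characters into a character-class pattern."""
    # re.escape handles characters such as the backslash, so the constant
    # itself stays readable and free of regex escaping.
    return "[" + re.escape(chars) + "]"


def build_warning(chars: str) -> str:
    """Reuse the same constant for the user-facing message."""
    return f"Filename must not contain the following characters: {chars}"
```

Whether the escaped pattern is also valid for Qt's regular-expression classes is not verified here.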
That all makes sense. I don't have time at the moment, but I should be able to implement the suggestions you made in the next couple of weeks. |
Hi @bnewc, I hope you're doing well. We're currently planning the |
Per my testing, the GUI now works as intended. However, some fault was introduced when I merged your main branch into mine. Now nearly every conversion test fails. The exception tracebacks are not giving me much info, but I'm still investigating. If you want to take a look, something happened between these two commits. |
Awesome, thanks a lot @bnewc. Taking a look right now and will let you know. As for the errors, we are aware of those, and they are tracked in #928. In a nutshell, the latest gVisor release somehow does not work well in a nested container. We have a workaround in the issue to work with the previous release, if you want to test out stuff. |
Thanks for the update, @apyrgio. I'll test out my fixes with the previous gVisor release. |
I can confirm that using the previous gVisor release fixes the issue of failing conversions. |
Gave it one more look. Awesome work @bnewc! I'll make sure to merge this once our CI tests work again. For the time being, I've approved this PR. |
Hm, I wanted to rebase and squash your commits on top of our Anyways, I opened a new PR (#942), where the commits are squashed, and your authorship is retained. Once the tests pass, I'll merge this PR. Cheers! |
This should resolve issue #362 by validating that output filenames contain no illegal characters.
I've used the Python re module for validation, and created IllegalOutputFilenameException to be raised in the case of an invalid filename. I also updated test_document.py to include a unit test for this new exception type.
Filenames are checked against a universal set of illegal characters: <>:"|?*. I believe it's best practice to make the character set system-agnostic, though someone with more knowledge of the security implications could update the feature to derive the character set directly from the host system using sanitize_filename from the pathvalidate library.
My branch is open to edits if there are any changes that need to be made.
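A rough sketch of the validation described in this PR description, for readers who have not opened the diff. Only the exception class name, its docstring, and its message come from the conversation above; the stand-in base class, the `ILLEGAL_CHARS_PATTERN` name, and the `validate_output_filename` helper are assumptions made for illustration.

```python
import re

# The universal set named in the PR description: <>:"|?*
ILLEGAL_CHARS_PATTERN = re.compile(r'[<>:"|?*]')


class DocumentFilenameException(Exception):
    """Stand-in for the project's base class (assumed to accept a message)."""


class IllegalOutputFilenameException(DocumentFilenameException):
    """Exception for when the output file contains illegal characters."""

    def __init__(self) -> None:
        super().__init__('Filename must not contain the following characters: <>:"|?*')


def validate_output_filename(filename: str) -> None:
    """Hypothetical helper: raise if `filename` has any illegal character."""
    if ILLEGAL_CHARS_PATTERN.search(filename):
        raise IllegalOutputFilenameException()
```

As the later discussion notes, applying this check to a full Windows path trips on the drive-letter colon, which is why the check was eventually narrowed.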