Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ANSI escape sequences are not interpreted in the message body #1679

Open
johslarsen opened this issue Sep 3, 2024 · 7 comments · May be fixed by #1680
Open

ANSI escape sequences are not interpreted in the message body #1679

johslarsen opened this issue Sep 3, 2024 · 7 comments · May be fixed by #1680

Comments

@johslarsen
Copy link
Contributor

Earlier versions of alot interpreted ANSI escape sequences in the messages bodies for e.g. HTML coloring. This feature was broken by 11ea16f, because it also strips ESC \x1b control characters. Leaving just the coloring commands like e.g. [1;30m. Returning true for \x1b along side \n in the unicode_printable function brings back the old behavior.

Whether that feature was a good idea to begin with is debatable, so if you want it like this I don't mind, but then at least this part of the documentation should be removed. And I am guessing a lot of alot/widgets/ansi.py might be superfluous now

Software Versions

  • Python version: 3.12.5
  • Alot version: 0.11
@pazz
Copy link
Owner

pazz commented Sep 5, 2024

Thanks for reporting this.
To be honest, when I signed off on the merge I did not think of ANSI codes and I think retrospectively it was a mistake to tacitly remove this functionality.
The original idea for interpreting ANSI codes was to allow for interpreting of text/* attachments such as code snippets, via filters such as the one in ranger. I still believe there is some merrit in this although I personally make not much use of this.

I agree that, should we keep the current version instead of reverting that commit, we should also amend the docs as you suggest. I'm in favor of reverting though.

thoughts?

johslarsen added a commit to johslarsen/alot that referenced this issue Sep 5, 2024
In order to bring back support for ANSI colored HTML renderings etc.

Fixes pazz#1679
@johslarsen johslarsen linked a pull request Sep 5, 2024 that will close this issue
@johslarsen
Copy link
Contributor Author

johslarsen commented Sep 5, 2024

I would suggest something like #1680 instead. That is enough to bring back the coloring support, and there are a lot of other characters that could cause problems that the original commits rightfully filtered out

@lfos
Copy link
Contributor

lfos commented Sep 6, 2024

@pazz @johslarsen I personally think that, if brought back, this should be done in a more careful manner: Remove or replace escape sequences from original emails but allow them to be added back by locally run filters.

If we reintroduce parsing of escape sequences in incoming messages and email subjects, maliciously crafted emails can mess with alot in a variety of ways, e.g., have a subject that rewrites the subject of other emails in the buffer, clear the screen and mangle/manipulate all parts of the UI. Depending on the used terminal, even more sinister attacks might be possible [1].

[1] https://unix.stackexchange.com/questions/15101/how-to-avoid-escape-sequence-attacks-in-terminals

johslarsen added a commit to johslarsen/alot that referenced this issue Sep 6, 2024
In order to bring back support for ANSI colored HTML renderings etc.

Fixes pazz#1679
johslarsen added a commit to johslarsen/alot that referenced this issue Sep 6, 2024
In order to bring back support for ANSI colored HTML renderings etc.
Incoming mails are should not be trusted, so escape characters in the
original bodies are stripped away. It is only the ones that were added
during the rendering process that are being sent to the terminal.

Fixes pazz#1679
johslarsen added a commit to johslarsen/alot that referenced this issue Sep 6, 2024
In order to bring back support for ANSI colored HTML renderings etc.
Incoming mails are should not be trusted, so escape characters in the
original bodies are stripped away. It is only the ones that were added
during the rendering process that are being sent to the terminal.

Fixes pazz#1679
@johslarsen
Copy link
Contributor Author

If we reintroduce parsing of escape sequences in incoming messages and email subjects, maliciously crafted emails can mess with alot

I agree. Something like 9a5a0a9 might help a bit. This seems to strip the incoming control characters, and leave the rendered ones. However, stripping stuff before rendering moves the charset conversion from mailcap entries to alot, and I am not sure if that handles corner cases in the same manner. Non-ASCII characters in both ISO-8859-1 and UTF-8 mails seems to render fine on my end at least.

We would also need to limit the string sanitizing to text parts, right? Doing this for binary parts would risk mangling them before they were passed onto mailcap.

@lfos
Copy link
Contributor

lfos commented Sep 8, 2024

(9a5a0a9) might help a bit. This seems to strip the incoming control characters, and leave the rendered ones. However, stripping stuff before rendering moves the charset conversion from mailcap entries to alot, and I am not sure if that handles corner cases in the same manner. Non-ASCII characters in both ISO-8859-1 and UTF-8 mails seems to render fine on my end at least.

We would also need to limit the string sanitizing to text parts, right? Doing this for binary parts would risk mangling them before they were passed onto mailcap.

You're right. Thinking about this again, maybe this is better solved by distinguishing between filters with "safe" outputs and filters with "insecure" outputs. By default, all filters (including no filter) are insecure. Specific filters are marked as safe; the details of how to do so would need to be figured out. Control characters are removed from all insecure filters when rendering (which is essentially the current behavior) and are kept from filters explicitly marked as safe.

That way we'd avoid dealing with all the subleties of incoming messages and their encodings etc., always pass original content to filters, and only allow escape sequences when we know they are produced by filters that are safe. WDYT?

@johslarsen
Copy link
Contributor Author

You're right. Thinking about this again, maybe this is better solved by distinguishing between filters with "safe" outputs and filters with "insecure" outputs.

By filters you are essentially talking about user-specified mailcap entries, right? If so, then users are kind of free to do whatever they like in the filter, and I don't think we should assume that they will do any sort of filtering on their end. From alot's perspective that kind of leave sanitizing before and/or after a filter, and without sanitizing the input I am not so sure that we would have any means of detecting whether control characters were added by the filter or not.

And I don't see how we would do any proper input filtering without dealing with encodings. For instance stripping ESC bytes is safe in most (I guess) 8-bit charsets and UTF-8, but we certainly can't go stripping away any single such bytes in a UTF-16 stream without messing up the rest of the text.

All in all I think this comes down to a trade-off between safe and unmodified inputs. Control characters are tied to terminals displaying the text so safeguarding against those is pretty obvious to focus on at the end after rendering and such. The input to a filter on the other hand would more likely be susceptible to others things like e.g. malformed HTML, and I think we would be better of leaning towards unmodified input than making it safe. Paranoid users are still able to add as much sanitizing as they like, and in general I think this being a text-only mail client makes it a lot safer to begin with than most other clients on the market.

Maybe something like a per content-type (or per mailcap entry or something) opt-in setting for this would be the best approach. Something like strip ESC from rendered output by default, but users could also choose to strip them before filtering instead (like 9a5a0a9) or even not at all if they want to handle as raw input as possible themselves

guludo pushed a commit to guludo/alot that referenced this issue Sep 19, 2024
In order to bring back support for ANSI colored HTML renderings etc.

Fixes pazz#1679

(cherry picked from commit e0321e2)
[Gustavo: Commit fetched from pazz#1680]
@lfos
Copy link
Contributor

lfos commented Sep 25, 2024

Right, something like a per-mailcap entry opt-in setting is exactly what I had in mind. That way, we can distinguish safe from unsafe filters (via users informing us about properties of their filters; sorry for not making that more clear in my previous reply).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants