ANSI escape sequences are not interpreted in the message body #1679

johslarsen · 2024-09-03T15:53:14Z

Earlier versions of alot interpreted ANSI escape sequences in the messages bodies for e.g. HTML coloring. This feature was broken by 11ea16f, because it also strips ESC \x1b control characters. Leaving just the coloring commands like e.g. [1;30m. Returning true for \x1b along side \n in the unicode_printable function brings back the old behavior.

Whether that feature was a good idea to begin with is debatable, so if you want it like this I don't mind, but then at least this part of the documentation should be removed. And I am guessing a lot of alot/widgets/ansi.py might be superfluous now

Software Versions

Python version: 3.12.5
Alot version: 0.11

The text was updated successfully, but these errors were encountered:

pazz · 2024-09-05T13:07:15Z

Thanks for reporting this.
To be honest, when I signed off on the merge I did not think of ANSI codes and I think retrospectively it was a mistake to tacitly remove this functionality.
The original idea for interpreting ANSI codes was to allow for interpreting of text/* attachments such as code snippets, via filters such as the one in ranger. I still believe there is some merrit in this although I personally make not much use of this.

I agree that, should we keep the current version instead of reverting that commit, we should also amend the docs as you suggest. I'm in favor of reverting though.

thoughts?

In order to bring back support for ANSI colored HTML renderings etc. Fixes pazz#1679

johslarsen · 2024-09-05T15:49:16Z

I would suggest something like #1680 instead. That is enough to bring back the coloring support, and there are a lot of other characters that could cause problems that the original commits rightfully filtered out

lfos · 2024-09-06T16:30:22Z

@pazz @johslarsen I personally think that, if brought back, this should be done in a more careful manner: Remove or replace escape sequences from original emails but allow them to be added back by locally run filters.

If we reintroduce parsing of escape sequences in incoming messages and email subjects, maliciously crafted emails can mess with alot in a variety of ways, e.g., have a subject that rewrites the subject of other emails in the buffer, clear the screen and mangle/manipulate all parts of the UI. Depending on the used terminal, even more sinister attacks might be possible [1].

[1] https://unix.stackexchange.com/questions/15101/how-to-avoid-escape-sequence-attacks-in-terminals

In order to bring back support for ANSI colored HTML renderings etc. Fixes pazz#1679

In order to bring back support for ANSI colored HTML renderings etc. Incoming mails are should not be trusted, so escape characters in the original bodies are stripped away. It is only the ones that were added during the rendering process that are being sent to the terminal. Fixes pazz#1679

johslarsen · 2024-09-06T23:36:44Z

If we reintroduce parsing of escape sequences in incoming messages and email subjects, maliciously crafted emails can mess with alot

I agree. Something like 9a5a0a9 might help a bit. This seems to strip the incoming control characters, and leave the rendered ones. However, stripping stuff before rendering moves the charset conversion from mailcap entries to alot, and I am not sure if that handles corner cases in the same manner. Non-ASCII characters in both ISO-8859-1 and UTF-8 mails seems to render fine on my end at least.

We would also need to limit the string sanitizing to text parts, right? Doing this for binary parts would risk mangling them before they were passed onto mailcap.

lfos · 2024-09-08T16:22:02Z

(9a5a0a9) might help a bit. This seems to strip the incoming control characters, and leave the rendered ones. However, stripping stuff before rendering moves the charset conversion from mailcap entries to alot, and I am not sure if that handles corner cases in the same manner. Non-ASCII characters in both ISO-8859-1 and UTF-8 mails seems to render fine on my end at least.

We would also need to limit the string sanitizing to text parts, right? Doing this for binary parts would risk mangling them before they were passed onto mailcap.

You're right. Thinking about this again, maybe this is better solved by distinguishing between filters with "safe" outputs and filters with "insecure" outputs. By default, all filters (including no filter) are insecure. Specific filters are marked as safe; the details of how to do so would need to be figured out. Control characters are removed from all insecure filters when rendering (which is essentially the current behavior) and are kept from filters explicitly marked as safe.

That way we'd avoid dealing with all the subleties of incoming messages and their encodings etc., always pass original content to filters, and only allow escape sequences when we know they are produced by filters that are safe. WDYT?

johslarsen · 2024-09-08T20:39:08Z

You're right. Thinking about this again, maybe this is better solved by distinguishing between filters with "safe" outputs and filters with "insecure" outputs.

By filters you are essentially talking about user-specified mailcap entries, right? If so, then users are kind of free to do whatever they like in the filter, and I don't think we should assume that they will do any sort of filtering on their end. From alot's perspective that kind of leave sanitizing before and/or after a filter, and without sanitizing the input I am not so sure that we would have any means of detecting whether control characters were added by the filter or not.

And I don't see how we would do any proper input filtering without dealing with encodings. For instance stripping ESC bytes is safe in most (I guess) 8-bit charsets and UTF-8, but we certainly can't go stripping away any single such bytes in a UTF-16 stream without messing up the rest of the text.

All in all I think this comes down to a trade-off between safe and unmodified inputs. Control characters are tied to terminals displaying the text so safeguarding against those is pretty obvious to focus on at the end after rendering and such. The input to a filter on the other hand would more likely be susceptible to others things like e.g. malformed HTML, and I think we would be better of leaning towards unmodified input than making it safe. Paranoid users are still able to add as much sanitizing as they like, and in general I think this being a text-only mail client makes it a lot safer to begin with than most other clients on the market.

Maybe something like a per content-type (or per mailcap entry or something) opt-in setting for this would be the best approach. Something like strip ESC from rendered output by default, but users could also choose to strip them before filtering instead (like 9a5a0a9) or even not at all if they want to handle as raw input as possible themselves

In order to bring back support for ANSI colored HTML renderings etc. Fixes pazz#1679 (cherry picked from commit e0321e2) [Gustavo: Commit fetched from pazz#1680]

lfos · 2024-09-25T14:39:12Z

Right, something like a per-mailcap entry opt-in setting is exactly what I had in mind. That way, we can distinguish safe from unsafe filters (via users informing us about properties of their filters; sorry for not making that more clear in my previous reply).

johslarsen added a commit to johslarsen/alot that referenced this issue Sep 5, 2024

Keep ESC control characters in the message body

68438db

In order to bring back support for ANSI colored HTML renderings etc. Fixes pazz#1679

johslarsen linked a pull request Sep 5, 2024 that will close this issue

Keep ESC control characters in the message body #1680

Open

johslarsen added a commit to johslarsen/alot that referenced this issue Sep 6, 2024

Keep ESC control characters in the message body

e0321e2

In order to bring back support for ANSI colored HTML renderings etc. Fixes pazz#1679

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ANSI escape sequences are not interpreted in the message body #1679

ANSI escape sequences are not interpreted in the message body #1679

johslarsen commented Sep 3, 2024

pazz commented Sep 5, 2024

johslarsen commented Sep 5, 2024 •

edited

Loading

lfos commented Sep 6, 2024 •

edited

Loading

johslarsen commented Sep 6, 2024

lfos commented Sep 8, 2024

johslarsen commented Sep 8, 2024

lfos commented Sep 25, 2024

ANSI escape sequences are not interpreted in the message body #1679

ANSI escape sequences are not interpreted in the message body #1679

Comments

johslarsen commented Sep 3, 2024

pazz commented Sep 5, 2024

johslarsen commented Sep 5, 2024 • edited Loading

lfos commented Sep 6, 2024 • edited Loading

johslarsen commented Sep 6, 2024

lfos commented Sep 8, 2024

johslarsen commented Sep 8, 2024

lfos commented Sep 25, 2024

johslarsen commented Sep 5, 2024 •

edited

Loading

lfos commented Sep 6, 2024 •

edited

Loading