Rich text support in console output #359

Closed
wants to merge 59 commits into from

Conversation

danschwarz
Collaborator

@danschwarz danschwarz commented May 26, 2023

This PR is cumulative with PR #356. It adds the ability to output rich text as Markdown on the console (i.e., in toot timeline output) and to copy rich text as Markdown to the clipboard with the Cop[y] command.

It uses the html2text library to convert HTML to Markdown text.

Why Markdown format? Because there doesn't seem to be an html2ansi library available anywhere. Also, ANSI escape sequences aren't a clipboard-friendly format if we want to copy rich text status messages to the clipboard using OSC 52.
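For readers unfamiliar with OSC 52, here is a minimal sketch of how a clipboard-copy escape sequence is typically built. It is not the code from this PR; the function name `osc52_copy_sequence` is hypothetical, and the sketch assumes a terminal that accepts the common `ESC ] 52 ; c ; <base64> BEL` form.

```python
import base64


def osc52_copy_sequence(text: str) -> str:
    """Build an OSC 52 sequence asking the terminal to copy `text`.

    Hypothetical helper for illustration only, not taken from this PR.
    """
    # OSC 52 carries the payload as base64-encoded bytes.
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    # ESC ] 52 ; c ; <payload> BEL, where "c" selects the clipboard.
    return f"\x1b]52;c;{payload}\x07"


if __name__ == "__main__":
    import sys

    # Writing the sequence to a supporting terminal sets the clipboard.
    sys.stdout.write(osc52_copy_sequence("**bold** and *emphasized*"))
    sys.stdout.flush()
```

Because the payload is sent verbatim to the clipboard, plain Markdown text survives a paste into other applications, whereas embedded ANSI color codes would show up as stray control characters.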

Why this library? Because I couldn't find anything better in a prebuilt library. I'm not interested in writing all the code to do text formatting, wrapping, etc. when all that has been done here.

Output of ordinary status messages without HTML passes the console_test.py test suite. I have added an additional test of a status message with all the new HTML tags to be supported in Mastodon 4.2.

That said, html2text is not great! It has some formatting bugs and open PRs piling up since 2022. The last update on GitHub was February 2022. As you can see in the example below, lines with emphasized, bold, and combined styles have extra spaces in them. I have a fork of the project where I've applied all the relevant PRs; it fixes some of the formatting bugs, but not all. (The output you see below uses the release version of html2text, not my fork.)

Open to suggestions about how to proceed with this.

image

danschwarz and others added 30 commits March 31, 2023 21:06
Title is now a positional parameter.

Also added some error handling in the command processing
for looking up list IDs per @ihabunek's suggestions
Also don't check if account is found, that function already raises a
ConsoleError.
Top level widgets are separated by blank lines, but
the final blank line of the status is omitted. This exactly
matches existing status rendering in master, for statuses that
contain only the currently supported tags
No longer specifying color "white" when it's more correct to
omit the color and just specify an attribute like underline,
bold, etc.
If the class name appears in the constants.py PALETTE entry, it is
honored. Otherwise, the class is ignored and the tag is handled
as a generic tag of that type.  This allows hashtag anchors
to be highlighted, and URL anchors to be styled differently
regardless of the strange class markup that Akkoma adds to URL
anchors
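The class-name fallback described in this commit message could be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `PALETTE_NAMES` stands in for the names defined by the PALETTE entry in constants.py, and `style_name` is a hypothetical helper.

```python
# Hypothetical stand-in for the style names defined in constants.py PALETTE.
PALETTE_NAMES = {"hashtag", "link", "a"}


def style_name(tag: str, classes: list[str]) -> str:
    """Pick a style for an HTML element.

    A class is honored only if the palette knows it; otherwise the
    element is styled as a generic tag of its type.
    """
    for cls in classes:
        if cls in PALETTE_NAMES:
            return cls
    # Unknown classes (e.g. server-specific markup) are ignored.
    return tag


if __name__ == "__main__":
    print(style_name("a", ["mention", "hashtag"]))
    print(style_name("a", ["u-url"]))
```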
danschwarz and others added 18 commits April 6, 2023 23:28
Uses the Hyperlink widget along with TextEmbed widget
We see this problem with statuses from Pixelfed servers.
Per the Mastodon API spec, the content tag is supposed to be
HTML, but Pixelfed sends statuses that often start as plain text.
They may include embedded anchor tags etc. within the text.
This confuses BeautifulSoup HTML parsers and results in bad
rendering artifacts.

This workaround detects the above condition and attempts to fix it by
surrounding the status in <p></p>. This converts it to nominally
valid HTML (at least, parseable by BeautifulSoup.)
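A minimal sketch of this kind of workaround is shown below. The commit message does not show the actual detection logic, so the check used here (content that does not begin with a tag) is an assumption, and `ensure_html_wrapped` is a hypothetical name.

```python
def ensure_html_wrapped(content: str) -> str:
    """Wrap content in <p></p> if it appears to start as plain text.

    Assumed heuristic for illustration: anything whose first
    non-whitespace character is not "<" is treated as plain text.
    """
    if content.lstrip().startswith("<"):
        # Already starts with a tag; leave it untouched.
        return content
    return f"<p>{content}</p>"


if __name__ == "__main__":
    pixelfed_style = 'A photo of my cat <a href="https://example.com/tags/cat">#cat</a>'
    print(ensure_html_wrapped(pixelfed_style))
```

The wrapped string can then be handed to the HTML parser like any other status content.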
Also, hashtags are created as OSC 8 hyperlinks, so they are
directly clickable in terminals that support OSC 8
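For reference, an OSC 8 terminal hyperlink has the general shape sketched below. The PR produces these through its widgets; this standalone helper, `osc8_link`, is hypothetical and only illustrates the escape sequence itself.

```python
ESC = "\x1b"
ST = ESC + "\\"  # String Terminator, ends each OSC sequence


def osc8_link(url: str, text: str) -> str:
    """Wrap `text` in an OSC 8 hyperlink pointing at `url`.

    Hypothetical helper for illustration only.
    """
    # Opening sequence: ESC ] 8 ; <params> ; <url> ST (params left empty).
    start = f"{ESC}]8;;{url}{ST}"
    # Closing sequence: same form with an empty URL.
    end = f"{ESC}]8;;{ST}"
    return f"{start}{text}{end}"


if __name__ == "__main__":
    print(osc8_link("https://example.com/tags/python", "#python"))
```

Terminals without OSC 8 support generally ignore the sequences and show only the link text.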
Also, now renders HTML in Account overlay page
Also, removed Python 3.6 tests as urwidgets is Python 3.7+
This introduces a dependency on html2text. The library works
well for our limited use cases, but has not been updated since
2020.
This allows us to pass existing test_console.py tests
This introduces a dependency on html2text. The library works
well for our limited use cases, but has not been updated since
2020.
@danschwarz danschwarz requested a review from ihabunek May 26, 2023 22:23
Pleroma, Akkoma, and other servers do not follow the Mastodon spec
for the 'uri' attribute which specifies that it contains the domain
name of the instance. Instead, they return a complete URI.

As a workaround, we now detect this situation and parse out the
domain from the URI when necessary. This fixes issue ihabunek#347.

Thanks to @laleanor for their patch and @rjp for ideas on how to
make it work with GoToSocial and other servers
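The domain extraction described above might look something like this sketch. It is not the patch itself: `instance_domain` is a hypothetical name, and treating any value containing `://` as a complete URI is an assumption.

```python
from urllib.parse import urlparse


def instance_domain(uri: str) -> str:
    """Return the instance domain from a 'uri' attribute value.

    Mastodon returns a bare domain; some other servers return a
    complete URI, so the domain has to be parsed out of it.
    """
    if "://" in uri:
        # Complete URI: keep only the network location part.
        return urlparse(uri).netloc
    # Already a bare domain, as the Mastodon spec describes.
    return uri


if __name__ == "__main__":
    print(instance_domain("mastodon.example"))
    print(instance_domain("https://akkoma.example/"))
```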
Previously this only worked for anchor tags with nested spans.
Now it works for anchor tags with or without nested spans.
@danschwarz
Collaborator Author

That said, html2text is not great! It has some formatting bugs and open PRs piling up since 2022.

I've fixed the one html2text issue (a design choice) that prevented long URLs from being wrapped at the correct width. This fix lives only in my fork of the project for the foreseeable future, as upstream is dormant.

The issue with spaces being added in front of underlined text appears to be a deliberate choice (there's a comment stating that in the code). So just something to live with if we go with this library.

@danschwarz
Collaborator Author

This is going to need to get squashed before commit, I'm sure. Too much going on here.
I think html2text may be a good candidate for vendoring. I can put together a version of the library that has the appropriate PRs applied, is Python 3.10 compatible, and formats HTML as markdown "well enough".

@danschwarz danschwarz closed this Sep 23, 2023
@danschwarz danschwarz deleted the console-output-enh branch March 10, 2024 18:01
3 participants