support HTML definition lists (<dl>, <dt>, and <dd>) #173

Open
wants to merge 1 commit into
base: develop

chrispy-snps
Collaborator

chrispy-snps commented Dec 31, 2024

Fixes #172.

New convert_dt() and convert_dd() functions are added that follow the PHP Markdown Extra syntax:

https://michelf.ca/projects/php-markdown/extra/#def-list

If additional definition list dialects are requested in the future, a configuration option can be added to select the format.

No convert_dl() function is added; the child-tag conversion functions do all the work.
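
As a rough illustration of that division of labor, here is a minimal, standard-library-only sketch. It is not the code in this pull request and does not use markdownify; the `DefinitionListSketch` class and the `dl_to_markdown()` helper are hypothetical names, and inline markup inside terms and definitions is ignored. It only shows that handling `<dt>` and `<dd>` is enough to emit the PHP Markdown Extra form, with `<dl>` itself contributing nothing.

```python
from html.parser import HTMLParser


class DefinitionListSketch(HTMLParser):
    """Hypothetical stand-in for illustration; not markdownify code."""

    def __init__(self):
        super().__init__()
        self.lines = []      # finished Markdown lines
        self.current = None  # tag being collected: "dt", "dd", or None
        self.last = None     # most recently closed tag: "dt" or "dd"
        self.buffer = []     # text chunks seen inside the current tag

    def handle_starttag(self, tag, attrs):
        # <dl> itself produces nothing; only its children emit Markdown.
        if tag in ("dt", "dd"):
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self.current:
            return
        # Collapse runs of whitespace, as HTML rendering would.
        text = " ".join("".join(self.buffer).split())
        if tag == "dt":
            # A blank line separates entries, but consecutive terms
            # that share a definition stay on adjacent lines.
            if self.lines and self.last == "dd":
                self.lines.append("")
            self.lines.append(text)
        else:
            # PHP Markdown Extra: a colon, then the definition text.
            self.lines.append(":   " + text)
        self.last = tag
        self.current = None


def dl_to_markdown(html):
    """Convert a simple <dl> fragment to PHP Markdown Extra syntax."""
    parser = DefinitionListSketch()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

For example, `dl_to_markdown("<dl><dt>Apple</dt><dd>A fruit</dd></dl>")` yields the term on one line followed by `:   A fruit` on the next.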

The regression tests are updated to test various structures. I also used Pandoc to confirm that all Markdownify results are converted back to the expected HTML source.

Note: This pull request requires that #171 be merged first; otherwise the test_dl unit test will fail.

Limitations

This support has two limitations, both of which arise because blank lines are added outside the scope of the convert_dt() and convert_dd() functions.

Limitation 1 - multiple terms sharing the same definition are not handled properly (the term lines are separated by a blank line instead of kept directly adjacent):

<dl>
  <dt>term 1a</dt>
  <dt>term 1b</dt>
  <dd>definition</dd>
</dl>

Limitation 2 - a blank line is always inserted before definitions, causing them to signify paragraph-based definitions even when they were not:

<dl>
  <dt>term 1</dt>
  <dd>bare definition</dd>
  <dt>term 2</dt>
  <dd><p>definition in paragraph</p></dd>
</dl>
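
For reference, the sketch below records the Markdown that the PHP Markdown Extra rules call for in these two cases, as I understand them: adjacent term lines for a shared definition, and a blank line before the colon only when the definition is a paragraph. The constants and the `definition_styles()` helper are hypothetical names used for illustration; they are not part of this pull request or its tests.

```python
# Target output for Limitation 1: both terms directly above the definition.
EXPECTED_SHARED_TERMS = "term 1a\nterm 1b\n:   definition"

# Target output for Limitation 2: only the second definition is preceded
# by a blank line, which marks it as a paragraph-based definition.
EXPECTED_MIXED = (
    "term 1\n"
    ":   bare definition\n"
    "\n"
    "term 2\n"
    "\n"
    ":   definition in paragraph"
)


def definition_styles(markdown):
    """Classify each definition line as "bare" or "paragraph".

    A definition counts as "paragraph" when the line before its
    leading colon is blank, following the PHP Markdown Extra rule.
    """
    lines = markdown.split("\n")
    styles = []
    for i, line in enumerate(lines):
        if line.startswith(":"):
            preceded_by_blank = i > 0 and lines[i - 1].strip() == ""
            styles.append("paragraph" if preceded_by_blank else "bare")
    return styles
```

A converter that always inserts a blank line before the colon would make every definition classify as "paragraph", which is the behavior Limitation 2 describes.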

Signed-off-by: chrispy <chrispy@synopsys.com>
@ninsbl

ninsbl commented Jan 8, 2025

I just tested the feature locally, as I am interested in this functionality for Markdown conversion. It is great to see this becoming available. However, I noticed that the result is one big text paragraph. I think an additional newline would be needed after the <dd> elements (for example).

@chrispy-snps
Collaborator Author

@ninsbl - can you share a small testcase here for me to reproduce it?

@ninsbl

ninsbl commented Jan 8, 2025

Sure. We are currently looking into moving documentation for GRASS GIS from HTML to Markdown.

One manual page with <dl>, <dt> and <dd> is for example: https://grass.osgeo.org/grass84/manuals/grass.html

Here is the code used for the translation to Markdown (with python-markdownify installed from this pull request's branch):

import bs4
import requests

from markdownify import MarkdownConverter

# Fetch the manual page and parse it with the html5lib parser.
resp = requests.get("https://grass.osgeo.org/grass84/manuals/grass.html")
soup = bs4.BeautifulSoup(resp.text, "html5lib")

# Convert the parsed document to Markdown with the options under test.
markdown = MarkdownConverter(
    heading_style="atx",
    escape_misc=True,
    code_language="shell",
    newline_style="backslash",
    wrap_width=79,
).convert_soup(soup)
print(markdown)

You can see the resulting Markdown here: https://github.com/ninsbl/grass/edit/md_test/md/grass.md. Please look at the "FLAGS" section.

Note the line break before the ":" (which renders as a space), and that there is only one line break after the <dd> element, which also renders as a space.

Also, the element content seems to be only partially wrapped.

@chrispy-snps
Collaborator Author

@ninsbl - here is the "Flags" section from the Markdown file you referenced:

### Flags:

**-h** \| **-help** \| **--help**
:   Prints a brief usage message and exits
**-v** \| **--version**
:   Prints the version of GRASS and exits
**-c XY**
:   Creates new GRASS project (location) without coordinate reference system in specified GISDBASE
**-c geofile**
:   Creates new GRASS project in specified GISDBASE with coordinate reference system based on georeferenced file
**-c EPSG:code**
:   Creates new GRASS project in specified GISDBASE with coordinate reference system defined by EPSG code
**-c EPSG:code:datum\_trans**
:   Creates new GRASS project in specified GISDBASE with coordinate reference system defined by EPSG code and datum transform parameters
**-e**
:   Exit after creation of project or mapset. Only with **-c** flag
**-f**
:   Forces removal of .gislock if exists (use with care!). Only with --text flag
**--text**
:   Indicates that Text-based User Interface should be used (skip welcome screen)
**--gtext**
:   Indicates that Text-based User Interface should be used (show welcome screen)
**--gui**
:   Indicates that Graphical User Interface
    (*[wxGUI](wxGUI.html)*) should be used
**--config**
:   Prints GRASS configuration parameters (options: arch, build, compiler, date, path, python\_path, revision, svn\_revision, version)
**--exec EXECUTABLE**
:   Execute GRASS module or script. The provided executable will be executed in a GRASS GIS non-interactive session.
**--tmp-project**
:   Run using a temporary project which is created based on the given
    coordinate reference system and deleted at the end of the execution
    (use with the --exec flag).
    The active mapset will be the PERMANENT mapset.
**--tmp-mapset**
:   Run using a temporary mapset which is created in the specified
    project and deleted at the end of the execution
    (use with the --exec flag).

To me, it looks like the code in the pull request is functioning properly. I am able to convert this Markdown back to HTML with Pandoc, and the definition list structures are converted correctly back to HTML.

I do see the wrapping behavior you mention, but that exists with and without this pull request and would be something to investigate separately.

Development

Successfully merging this pull request may close these issues.

support HTML definition lists (<dl>, <dt>, and <dd>)
3 participants