docs: HTML to Markdown lua filters #5054

Merged (12 commits) on Feb 7, 2025

cwhite911
Contributor

This PR is a continuation of #4620.

It adds more Pandoc Lua filters to address the remaining issues caught by markdownlint-cli.

Markdownlint Rules Docs: https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md
Pandoc Lua Filters Docs: https://pandoc.org/lua-filters.html#module-pandoc.utils

@petrasovaa
Contributor

petrasovaa commented Feb 6, 2025

This doesn't seem to address the problem with figures (#4864).
Also, d.graph seems to have been broken:

## COMMANDS

The graphics language is simple, and uses the following commands:

\[ [\#](#comment) | [move](#move) | [draw](#draw) | [polygon](#polygon) |
[polyline](#polyline) | [color](#color) | [text](#text) | [size](#size) |
[symbol](#symbol) | [rotation](#rotation) | [icon](#icon) | [width](#width) \]

- # comment A line of comment which is ignored in the processing.
- move xpos ypos The current location is updated to xpos ypos. Unless the -m flag is used, values are stated as a
-percent of the active display frame’s horizontal (xpos) and vertical (ypos) size, and may be floating point values.
-Values are between 0-100. Note. A space must separate xpos and ypos.
- draw xpos ypos A line is drawn in the current color from the current location to the new location xpos ypos, which
-then becomes the current location. Unless the -m flag is used, values are stated as a percent of the active display
-frame’s horizontal (xpos) and vertical (ypos) size, and may be floating point values. Values are between 0-100. Note.
-A space must separate xpos and ypos.

Original:

\[ [\#](#comment) | [move](#move) | [draw](#draw) |
[polygon](#polygon) | [polyline](#polyline) | [color](#color) |
[text](#text) | [size](#size) | [symbol](#symbol) |
[rotation](#rotation) | [icon](#icon) | [width](#width) \]

  - <span id="comment"></span>**\#** *comment*  
    A line of comment which is ignored in the processing.
  - <span id="move"></span>**move** *xpos ypos*  
    The current location is updated to *xpos ypos*. Unless the **-m**
    flag is used, values are stated as a percent of the active display
    frame's horizontal (*xpos*) and vertical (*ypos*) size, and may be
    floating point values. Values are between 0-100. **Note.** A space
    must separate *xpos* and *ypos*.

@wenzeslaus
Member

What is the diff between files generated from main and files generated from this branch? Using Meld or diff should point out issues like the one with d.graph.
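
For a scripted version of that comparison, here is a minimal sketch using Python's standard `difflib`. It assumes the two sets of generated files live in two directory trees with the same relative layout; the function name `diff_trees` and the `main/` and `branch/` labels are illustrative and not part of this PR.

```python
import difflib
from pathlib import Path


def diff_trees(main_dir, branch_dir, pattern="*.md"):
    """Return unified-diff lines for files that differ between two trees.

    Assumption: both trees use the same relative paths for generated files.
    """
    main_dir, branch_dir = Path(main_dir), Path(branch_dir)
    out = []
    for old in sorted(main_dir.rglob(pattern)):
        rel = old.relative_to(main_dir)
        new = branch_dir / rel
        if not new.exists():
            # File was produced on main but not on the branch.
            out.append(f"only in main: {rel}")
            continue
        a = old.read_text(encoding="utf-8").splitlines()
        b = new.read_text(encoding="utf-8").splitlines()
        # unified_diff yields nothing when the two files are identical.
        out.extend(
            difflib.unified_diff(a, b, f"main/{rel}", f"branch/{rel}", lineterm="")
        )
    return out
```

Printing the returned lines gives output similar to `diff -ru`, which should be enough to spot regressions such as flattened lists.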

@cwhite911
Contributor Author

cwhite911 commented Feb 6, 2025

I've updated the filters. However, we still have issues with some HTML element types. The two major ones are:

  • <table> conversion is hit or miss. (i.atcorr, r.slope.aspect)
  • <dl> description lists will also most likely need to be fixed manually. (d.

We also have some markdownlint warnings we can choose to address or ignore.

We are also using some unsupported HTML elements and attributes that we can remove:

  • <center>
  • align="center"

@cwhite911 cwhite911 marked this pull request as ready for review February 7, 2025 00:48
@wenzeslaus
Member

I merged the main branch here, bringing in the HTML .md files. There was a conflict in the linting rules, so I removed the changes to them here. They are refined for the converted files, but this PR is not converting them yet, so I'm just keeping the version from main. The config will be used later; for convenience, here it is:

line-length:
  code_blocks: false
  tables: false
  line_length: 120

no-inline-html:
  allowed_elements: [sup]

@wenzeslaus
Member

I'm comparing the results of this and of what is on main. I see that some images come out differently. Specifically:

- [<img src="g_gui_gmodeler_loop_dlg.png" width="300" />](g_gui_gmodeler_loop_dlg.png) 
+ [](g_gui_gmodeler_loop_dlg.png)
- <div align="center">
-
-[<img src="r_drain_with_r_watershed_direction.png" width="300"
-height="280" alt="drainage using r.watershed" />](r_drain_with_r_watershed_direction.png)  
-*Figure: Drainage paths from two points where directions from
-r.watershed were used*
-
- </div>
+<div align="center">
+<a href="r_drain_with_r_watershed_direction.png"><img src="r_drain_with_r_watershed_direction.png" alt="drainage using r.watershed" width="300" height="280"></a>
+<br>
+<i>Figure: Drainage paths from two points where directions from
+r.watershed were used</i>
+</div>
-[<img src="r_fill_stats_smoothing.png" width="600" height="300"
-alt="Smooth versus preserve" />](r_fill_stats_smoothing.png)
+[Smooth versus preserve](r_fill_stats_smoothing.png)

Neither of them is perfect, so perhaps we can just fix them manually, but some of the later ones in the list above are worse.

Some of the links are worse:

-<https://gdal.org/en/stable/drivers/raster/>
+[https://gdal.org/en/stable/drivers/raster/](https://gdal.org/en/stable/drivers/raster/)

Maybe this is a job for post-processing with a regexp?
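
A regexp pass of that kind could look roughly like the sketch below. It is an illustration only, not part of the conversion script: the name `collapse_self_links` is hypothetical, and it assumes only links whose text is a bare http(s) URL identical to the target should be collapsed back to the autolink form.

```python
import re

# [url](url): the link text is a bare http(s) URL and the target repeats it
# exactly (enforced by the \1 backreference).
SELF_LINK = re.compile(r"\[(https?://[^\]\s]+)\]\(\1\)")


def collapse_self_links(markdown):
    """Rewrite [url](url) as <url>; leave every other link untouched."""
    return SELF_LINK.sub(r"<\1>", markdown)
```

Links with descriptive text, or whose text and target differ, are left as they are. URLs that themselves contain parentheses are not handled by this pattern.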

Links are not broken over multiple lines with the new version, so that seems like a plus.

Code blocks are sh instead of shell, which lints better.

@wenzeslaus
Member

For me, this removed an error from the mkdocs output. Strangely, I had not seen it before, but I see it now without this conversion update and I don't see it with it.

Error in wrapping img tag with anchor tag: expected string or bytes-like object, got 'NoneType' <a
href="r_carve_dem_orig_shaded.png"><img
src="r_carve_dem_orig_shaded.png" data-border="0" width="300"
height="321" alt="r.carve example: original DEM shaded" /></a>
Error in wrapping img tag with anchor tag: expected string or bytes-like object, got 'NoneType' <a
href="r_carve_dem_carved_shaded.png"><img
src="r_carve_dem_carved_shaded.png" data-border="0" width="300"
height="321" alt="r.carve example: carved DEM shaded" /></a>
Error in wrapping img tag with anchor tag: expected string or bytes-like object, got 'NoneType' <a
href="r_carve_dem_orig_accum.png"><img src="r_carve_dem_orig_accum.png"
data-border="0" width="300" height="321"
alt="r.carve example: original DEM flow accumulated" /></a>
Error in wrapping img tag with anchor tag: expected string or bytes-like object, got 'NoneType' <a
href="r_carve_dem_carved_accum.png"><img
src="r_carve_dem_carved_accum.png" data-border="0" width="300"
height="321"
alt="r.carve example: carved DEM flow accumulation" /></a>
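
The wording of that message matches the TypeError that Python's `re` functions raise when they are handed `None` instead of a string. Whether the mkdocs hook actually uses `re` this way is an assumption on my part; the sketch below only reproduces the message and shows a guard, with hypothetical function names.

```python
import re

SRC_ATTR = re.compile(r'src="([^"]+)"')


def unguarded_search(tag):
    """Hypothetical unguarded call: raises TypeError if tag is None."""
    return SRC_ATTR.search(tag)


def image_source(tag):
    """Return the src value of an <img> tag, or None when there is no input."""
    if tag is None:
        # Guard: searching None would raise
        # "expected string or bytes-like object".
        return None
    match = SRC_ATTR.search(tag)
    return match.group(1) if match else None
```

With the guard in place, a missing tag yields `None` instead of an exception, so the caller can decide whether to skip the image or report it.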

@wenzeslaus
Member

As for the images, it really does make some images worse in the final mkdocs output, but maybe it does not matter, because manual fixes are needed anyway, both for v.fill.holes and r.carve.

Before

Screenshot from 2025-02-07 11-42-28
Screenshot from 2025-02-07 11-42-15

After

Screenshot from 2025-02-07 11-35-52
Screenshot from 2025-02-07 11-35-31

@cwhite911
Contributor Author

Here is the command and the .markdownlint.yml I used to run markdownlint (v0.44.0):

markdownlint **/**/*.md --ignore node_modules --ignore venv --fix

  default: true
  
  # Fix any fixable errors (depending on the markdownlint wrapper tool used)
  fix: true
  
  MD041: false # first-line-h1
  line-length:
    code_blocks: false
    tables: false
    # line_length: 120
    line_length: 300
    # imagery/i.atcorr/i.atcorr.md:285:121 MD013/line-length Line length [Expected: 120; Actual: 294]
  
  
  # HTML is coming over from HTML table elements
  no-inline-html:
    allowed_elements: [sup, sub, table, span, p, br, img, colgroup, col, tbody, tr, td, em, strong, thead, th]
  
  ul-indent:
    start_indented: true
    indent: 2
  
  no-trailing-punctuation: false
  
  # Fix v.transform, v.dissolve, r3.in.ascii, i.vi, d.mon
  no-duplicate-heading: false 
  
  # Fix v.net.iso, v.label
  heading-increment: false
  
  no-emphasis-as-heading: false
  
  # Fix v.random, r.surf.idw
  link-fragments: false

@cwhite911
Contributor Author

@wenzeslaus try it now.

@cwhite911 cwhite911 requested a review from wenzeslaus February 7, 2025 17:17
Member

@wenzeslaus wenzeslaus left a comment

In the end, compared to the original, this only changes shell to sh and removes the div tags (which is a nice cleanup). For that, this is too much code. However, I want to proceed with this as we clearly can't get any further with automatic conversion. The Markdown files in #5064 are based on this script with additional changes, and it seems to work at an acceptable level.

I didn't use the markdownlint config from here, but went with a simpler version with some additional ignores.

@wenzeslaus wenzeslaus enabled auto-merge (squash) February 7, 2025 19:30
@wenzeslaus wenzeslaus merged commit bd2ebf3 into OSGeo:main Feb 7, 2025
28 checks passed
@github-actions github-actions bot added this to the 8.5.0 milestone Feb 7, 2025
wenzeslaus added a commit that referenced this pull request Feb 8, 2025
This converts HTML markup in .md files to Markdown using the ./utils/grass_html2md.sh script.

The conversion was done using the #5054 version of the script with additional (reviewed) fixes by markdownlint-fix from pre-commit.

Remove .md files which should have stayed .html. This breaks history but not terribly and the history is not that important for these two (one is a recently added mkdocs template and the other may be even unused).