[Bug] Conversion of HTML manual pages to markdown fails for HTML figure code #4864
Or, if we want to keep things moving, add an exclusion for now. Is there a pattern that could be used, or would that be impossible? It's OK not to have them perfect on the first try.
Test submission of the conversion of all HTML manual pages to markdown using the `pandoc`-based converter script (see OSGeo#4620). For figure code conversion issues, see OSGeo#4864.
For easier inspection, the converted MD files are submitted in #4865.
Maybe this Python library by Microsoft could be worth a try? https://github.com/microsoft/markitdown
I didn't know about this one :)
I just tried the markitdown tool on v.fill.holes.html, and the result looks quite OK. Images are bigger compared to the pandoc conversion. However, pymarkdownlnt and markdownlint-cli, for example, complain about line length and missing blank lines (amongst others). Also, code blocks are not automatically marked as shell. So some post-processing would be needed here too.
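Part of that post-processing is mechanical. A minimal sketch for one of the missing-blank-line cases, blank lines around ATX headings (markdownlint rule MD022), assuming markdown text as produced by the converter; the function name `add_blank_lines_around_headings` is hypothetical and not part of markitdown or GRASS:

```python
import re

HEADING = re.compile(r"^#{1,6}\s")      # ATX heading, e.g. "## DESCRIPTION"
FENCE = re.compile(r"^\s*(```|~~~)")    # opening or closing code fence


def add_blank_lines_around_headings(text: str) -> str:
    """Ensure a blank line before and after every ATX heading (MD022).

    Fenced code blocks are left alone, so shell comments such as
    "# set the region" are not mistaken for headings.
    """
    lines = text.splitlines()
    out = []
    in_fence = False
    for i, line in enumerate(lines):
        if FENCE.match(line):
            in_fence = not in_fence
        is_heading = not in_fence and HEADING.match(line) is not None
        if is_heading and out and out[-1].strip():
            out.append("")  # blank line before the heading
        out.append(line)
        if is_heading and i + 1 < len(lines) and lines[i + 1].strip():
            out.append("")  # blank line after the heading
    return "\n".join(out) + "\n"
```

The function is idempotent, so it can safely run more than once in a pipeline.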
I tried it as well, but had no success with, e.g., this file:
What's the trick, @ninsbl?
@ninsbl, would you mind sharing the command you used?
OK, now I tested and managed to convert all HTML manual pages into markdown with markitdown. Here is what I did from the root of the GRASS GIS source tree, writing the markdown files into a directory named md:
`grep` is used to exclude the compilation directory `dist.x86_64-pc-linux-gnu`. Some HTML files (e.g. test.rtree.lib.) are empty and cause markitdown to fail, so I had to handle them separately (

And here is a summary of the warnings/errors that pymarkdownlint reports on the generated .md files:
consecutive runs of
However, fixing with pymarkdown can introduce new issues, and sometimes rules conflict with each other in the tool. I did not check how those files look. This is my setup:
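For reference, the file selection described in this comment (skip the `dist.*` compilation directory, set aside empty HTML pages) could also be done in Python. A minimal sketch, assuming the GRASS source tree layout and flat output into an `md` directory as described; `collect_conversion_jobs` is a hypothetical helper and does not call markitdown itself:

```python
from pathlib import Path


def collect_conversion_jobs(src_root: Path, out_dir: Path):
    """Return (jobs, skipped) for a batch HTML-to-markdown conversion.

    jobs: list of (html_file, md_file) pairs, md files written flat into out_dir
    skipped: relative paths of empty HTML files that need separate handling
    """
    jobs, skipped = [], []
    for html in sorted(src_root.rglob("*.html")):
        rel = html.relative_to(src_root)
        if rel.parts[0].startswith("dist."):
            continue  # compilation directory, e.g. dist.x86_64-pc-linux-gnu
        if not html.read_text(encoding="utf-8", errors="replace").strip():
            skipped.append(rel)  # empty page: markitdown fails on these
            continue
        jobs.append((html, out_dir / rel.with_suffix(".md").name))
    return jobs, skipped
```

Each pair can then be handed to whichever converter is chosen (the markitdown CLI, or markdownify directly).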
Just to note that two (or more, but markdownlint has rules for that) trailing spaces have a meaning in markdown: they add a line break, so the lines are not joined into the same line of the paragraph.
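So an automatic trailing-whitespace cleanup should keep those hard breaks. A minimal sketch, assuming CommonMark semantics: it keeps exactly two trailing spaces where they form a hard line break (the form markdownlint's MD009 rule tolerates with its default `br_spaces: 2`) and strips all other trailing whitespace. The function name is hypothetical:

```python
import re

FENCE = re.compile(r"^\s*(```|~~~)")


def normalize_trailing_spaces(text: str) -> str:
    """Strip trailing whitespace but keep intentional hard line breaks.

    Two or more trailing spaces followed by another non-blank line form a
    hard line break in CommonMark; those are normalized to exactly two spaces.
    All other trailing whitespace (end of a paragraph, headings, blank lines,
    fenced code) is removed.
    """
    lines = text.split("\n")
    out = []
    in_fence = False
    for i, line in enumerate(lines):
        if FENCE.match(line):
            in_fence = not in_fence
            out.append(line.rstrip())
            continue
        stripped = line.rstrip(" \t")
        next_has_text = i + 1 < len(lines) and lines[i + 1].strip() != ""
        keeps_break = (
            not in_fence
            and line.endswith("  ")
            and stripped != ""
            and not stripped.lstrip().startswith("#")  # headings have no hard breaks
            and next_has_text
        )
        out.append(stripped + "  " if keeps_break else stripped)
    return "\n".join(out)
```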
Thanks for the note, @echoix. Now I noticed that pymarkdownlint has a

Do you have a suggestion for another markdown linter? pymarkdownlint (which I used here) changes files back and forth, which seems a little unstable.
Thanks @ninsbl! I have taken

```bash
diff --side-by-side --width=150 r3.to.rast.md ~/software/grass_main/raster3d/r3.to.rast/r3.to.rast.md
```

````text
<
## DESCRIPTION ## DESCRIPTION
Converts one 3D raster map into several 2D raster maps (depends on dep | Converts one 3D raster map into several 2D raster maps (depends on
If the 2D and 3D region settings are different, the 3D resolution will | depths). If the 2D and 3D region settings are different, the 3D
adjusted to the 2D resolution (the depths are not touched). | resolution will be adjusted to the 2D resolution (the depths are not
The user can force *r3.to.rast* to use the 2D resolution of the input | touched). The user can force *r3.to.rast* to use the 2D resolution of
3D raster map for the output maps, independently from the current regi | the input 3D raster map for the output maps, independently from the
![](r3.to.rast.png) | current region settings.
>
> <img src="r3.to.rast.png" data-border="0" />
> | |
> |------------------------|
| *How r3.to.rast works* | | *How r3.to.rast works* |
| --- | <
### Map type conversions ### Map type conversions
Type of resulting 2D raster maps is determined by the type of the | Type of resulting 2D raster maps is determined by the type of the inpu
input 3D raster, i.e. 3D raster of type DCELL (double) will result in | 3D raster, i.e. 3D raster of type DCELL (double) will result in DCELL
DCELL 2D rasters. A specific type for 2D rasters can be requested usin | rasters. A specific type for 2D rasters can be requested using the
the **type** option. | **type** option.
|
The **type** option is especially advantageous when the 3D raster | The **type** option is especially advantageous when the 3D raster map
map stores categories (which need to be stored as floating point numbe | stores categories (which need to be stored as floating point numbers)
and the 2D raster map should be also categorical, i.e. use integers. | and the 2D raster map should be also categorical, i.e. use integers. T
The type is set to `CELL` in this case. | type is set to `CELL` in this case.
### Modifying the values <
The values in the 3D raster map can be modified prior to storing in | ### Modifying the values
the 2D raster map. The values can be scaled using the option **multipl <
and a constant value can be added using the option **add**. <
The new value is computed using the following equation: <
``` | The values in the 3D raster map can be modified prior to storing in th
> 2D raster map. The values can be scaled using the option **multiply**
> and a constant value can be added using the option **add**. The new
> value is computed using the following equation:
> ```bash
y = ax + b y = ax + b
<
``` ```
where *x* is the original value, *a* is the value of | where *x* is the original value, *a* is the value of **multiply**
**multiply** option, *b* is the value of **add** option, | option, *b* is the value of **add** option, and *y* is the new value.
and *y* is the new value. When **multiply** is not provided, | When **multiply** is not provided, the value of *a* is 1. When **add**
the value of *a* is 1. When **add** is not provided, the value | is not provided, the value of *b* is 0.
of *b* is 0. |
## NOTES ## NOTES
Every slice of the 3D raster map is copied to one 2D raster map. The m | Every slice of the 3D raster map is copied to one 2D raster map. The
are named like **output***\_slicenumber*. Slices are counted from bott | maps are named like **output***\_slicenumber*. Slices are counted from
to the top, so the bottom slice has number 1. | bottom to the top, so the bottom slice has number 1.
The number of slices is equal to the number of depths. The number of slices is equal to the number of depths.
To round floating point values to integers when using `type=CELL`, | To round floating point values to integers when using `type=CELL`, the
the **add** option should be set to 0.5. | **add** option should be set to 0.5.
>
## SEE ALSO ## SEE ALSO
*[r3.cross.rast](r3.cross.rast.html), | *[r3.cross.rast](r3.cross.rast.md), [r3.out.vtk](r3.out.vtk.md),
[r3.out.vtk](r3.out.vtk.html), | [r3.out.ascii](r3.out.ascii.md), [g.region](g.region.md)*
[r3.out.ascii](r3.out.ascii.html), <
[g.region](g.region.html)* <
## AUTHORS <
Sören Gebbert | ## AUTHORS
Vaclav Petras, [NCSU GeoForAll Lab](https://geospatial.ncsu.edu/geofor | Sören Gebbert
> Vaclav Petras, [NCSU GeoForAll
> Lab](https://geospatial.ncsu.edu/geoforall/)
````

Differences:
Here are the MD files for easier local comparison with, e.g.
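Many lines in the side-by-side diff of r3.to.rast.md differ only in where the text is wrapped. A minimal sketch of a comparison that ignores pure re-wrapping, using only Python's `difflib`: it joins soft-wrapped lines into one string per block before diffing. The function names are hypothetical, and hard line breaks (two trailing spaces) are deliberately flattened:

```python
import difflib


def normalized_blocks(md_text: str) -> list:
    """Split markdown into blocks and undo soft line wrapping.

    Consecutive non-blank lines are joined with single spaces, so two files
    that differ only in where lines are wrapped give identical blocks.
    Headings are kept as separate blocks; fenced code is kept line by line.
    """
    blocks = []
    para = []
    code = None  # lines of the fenced block being read, or None

    def flush_para():
        if para:
            blocks.append(" ".join(para))
            para.clear()

    for line in md_text.splitlines():
        is_fence = line.lstrip().startswith(("```", "~~~"))
        if code is not None:          # inside a fenced block
            code.append(line)
            if is_fence:              # closing fence
                blocks.append("\n".join(code))
                code = None
        elif is_fence:                # opening fence
            flush_para()
            code = [line]
        elif line.lstrip().startswith("#"):
            flush_para()
            blocks.append(line.strip())
        elif line.strip():
            para.append(line.strip())
        else:
            flush_para()
    flush_para()
    if code is not None:              # unterminated fence: keep what was read
        blocks.append("\n".join(code))
    return blocks


def reflow_insensitive_diff(old: str, new: str) -> list:
    """Unified diff of two markdown texts that ignores pure re-wrapping."""
    return list(difflib.unified_diff(
        normalized_blocks(old), normalized_blocks(new), lineterm="", n=0))
```

An empty result means the two files differ only in wrapping and blank lines.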
You can visually inspect the results of the conversion here: https://github.com/ninsbl/grass/blob/md_test/md

For selected manuals I added the image files:

The language for fenced code (assuming all is shell / sh) and line breaks above headings could be addressed with a little script, I guess. Yet some errors probably require manual adjustment, like some of the inline HTML (not sure if this should be done in the HTML files then?). @neteler, what issues did you observe with the figure conversion? And last but not least, how should we proceed? If we manage to fix
with a script and if we ignore
and maybe
The remaining issues are not overwhelmingly many:
Also, markitdown does not convert
BTW: markitdown uses markdownify under the hood, with some adjustments: `class _CustomMarkdownify(markdownify.MarkdownConverter):` Maybe something worth considering?
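The fenced-code language fix mentioned in this comment (assuming everything is shell / sh) could indeed be a short script. A minimal sketch that adds a default language to bare opening fences (markdownlint rule MD040) and leaves fences that already declare a language untouched; the function name and the `default_lang` parameter are hypothetical:

```python
import re

FENCE = re.compile(r"^(\s*)(`{3,}|~{3,})(.*)$")


def add_fence_language(text: str, default_lang: str = "sh") -> str:
    """Give bare opening code fences a language tag (MD040)."""
    out = []
    opening = None  # fence characters of the block we are inside, or None
    for line in text.split("\n"):
        m = FENCE.match(line)
        if m is None:
            out.append(line)
            continue
        indent, fence, info = m.groups()
        if opening is None:
            # Opening fence: add the default language if none is given.
            if not info.strip():
                line = f"{indent}{fence}{default_lang}"
            opening = fence
        elif fence[0] == opening[0] and len(fence) >= len(opening) and not info.strip():
            opening = None  # closing fence of the current block
        out.append(line)
    return "\n".join(out)
```

Tracking the opening fence matters: without it, every closing fence would also get `sh` appended and silently open a new block.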
[...]
Indeed, the
In MegaLinter, we have: https://megalinter.io/latest/descriptors/markdown/
Try the first two? Did you also try pymarkdown? Also, you talked about links somewhere below. What I've observed is mostly that the links are correct in the repo, with relative links referring to other markdown files as they are laid out in the repo, and the build tool that creates the HTML website adjusts the links accordingly when processing them. But some places, I think the Microsoft docs among them, don't necessarily do that.
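For the converted pages, that means relative links to other manual pages should point to `.md` files, as the pandoc output in the r3.to.rast diff already does (`r3.cross.rast.html` becomes `r3.cross.rast.md`). A minimal regex sketch, assuming inline markdown links only (reference-style links and raw HTML are not handled); absolute URLs, root-relative paths and pure anchors are left alone. The name `html_links_to_md` is hypothetical:

```python
import re

# Inline link to a relative *.html target, optionally followed by an #anchor.
# The negative lookahead skips URLs with a scheme (https:, mailto:, ...),
# root-relative paths ("/...") and same-page anchors ("#...").
HTML_LINK = re.compile(
    r"\]\((?![A-Za-z][A-Za-z0-9+.-]*:|/|#)([^)\s#]+)\.html(#[^)\s]*)?\)"
)


def html_links_to_md(text: str) -> str:
    """Rewrite relative links to .html pages so they point to .md files."""
    return HTML_LINK.sub(lambda m: f"]({m.group(1)}.md{m.group(2) or ''})", text)
```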
And what's the big deal about making the figures look right by using HTML inside markdown as a first iteration?
My understanding was that figures did not look too good with inline HTML. Also, conversion of dt elements seems to be an issue for files like grass.html; compare that to https://github.com/ninsbl/grass/blob/md_test/md/grass.md, where I had to use a custom version of markdownify to achieve something similar. In general, I would suggest using markdownify directly instead of the markitdown wrapper, which gives us fewer options.
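To illustrate the `dt` problem independently of markdownify, here is a stdlib-only sketch that collects term/definition pairs from a `<dl>` and emits the definition-list syntax of Python-Markdown's `def_list` extension (a term line followed by `:   definition`). Whether the MkDocs setup enables that extension is an assumption, the class name is hypothetical, and this is not a drop-in for the custom markdownify version mentioned above. It tolerates omitted `</dt>`/`</dd>` end tags but does not handle nested paragraphs or links:

```python
from html.parser import HTMLParser


class DefinitionListToMarkdown(HTMLParser):
    """Convert <dl><dt>..</dt><dd>..</dd></dl> into definition-list markdown."""

    def __init__(self):
        super().__init__()
        self.items = []      # [term, definition] pairs in document order
        self._field = None   # "dt" or "dd" while inside one of them
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._close_field()  # end tags for dt/dd are optional in HTML
            self._field = tag
        elif tag in ("b", "strong") and self._field:
            self._buf.append("**")  # keep bold option names

    def handle_endtag(self, tag):
        if tag in ("b", "strong") and self._field:
            self._buf.append("**")
        elif tag in ("dt", "dd", "dl"):
            self._close_field()

    def handle_data(self, data):
        if self._field:
            self._buf.append(data)

    def _close_field(self):
        if self._field is None:
            return
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if self._field == "dt":
            self.items.append([text, ""])
        elif self.items:
            self.items[-1][1] = text
        self._field, self._buf = None, []

    def markdown(self):
        return "\n\n".join(f"{term}\n:   {definition}" for term, definition in self.items)
```

Usage: create the parser, call `feed(html)` and `close()`, then read `markdown()`.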
Describe the bug
I am working on the mass conversion of all HTML manual pages to markdown. To convert all HTML files to markdown, I have written a `pandoc`-based converter script (see #4620) which already does most of the job. A showstopper in the conversion of HTML manual pages to markdown are the figures, as the related HTML snippets vary from manual page to manual page, even though there is a style recommendation.
For an easier discussion, I have moved the figure issue here to separate it out from #4748.
Many figures look ugly after MD conversion (the resulting MD code is partially garbage):
- grass/vector/v.fill.holes/v.fill.holes.html, line 13 in fc94e29
- mkdocs/site/raster3dintro.html
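One way to cope with the varying figure snippets is to ignore the wrapper markup entirely and keep only what matters: image source, alt text and caption text. A minimal stdlib sketch under that assumption; the function and class names are hypothetical, and the target form (image followed by an italic caption line) is an assumed convention, not the GRASS style recommendation:

```python
from html.parser import HTMLParser


class FigureExtractor(HTMLParser):
    """Pull image source, alt text and caption text out of a figure snippet,
    whatever wrapper (div, center, table, figure) it uses."""

    def __init__(self):
        super().__init__()
        self.src = None
        self.alt = ""
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and self.src is None:  # first image wins
            self.src = attrs.get("src")
            self.alt = attrs.get("alt") or ""

    def handle_data(self, data):
        self._text.append(data)  # any text in the snippet counts as caption

    @property
    def caption(self):
        return " ".join("".join(self._text).split())


def figure_to_markdown(snippet: str) -> str:
    parser = FigureExtractor()
    parser.feed(snippet)
    parser.close()
    if parser.src is None:
        raise ValueError("no <img> in snippet")
    caption = parser.caption
    alt = parser.alt or caption
    md = f"![{alt}]({parser.src})"
    return f"{md}\n\n*{caption}*" if caption else md
```

This still requires locating each figure snippet in the page first, which is exactly where the HTML variants make a generic filter hard.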
I have written a Lua filter for `pandoc` (not yet submitted), but it can only convert that specific HTML code. With so many HTML variants, I have no idea how to do that.

To reproduce
1. Run the `utils/grass_html2md.sh` converter script (see docs: script to convert HTML manual pages to markdown #4620)
2. Run `markdownlint` on the MD files

I tried to submit the converted MD files for community review, but I got stuck in the `pre-commit` stage:

From my terminal:
Expected behavior
I wonder if we have to touch the ~170 HTML files manually to streamline the HTML figure code therein in order to eventually develop a single `pandoc` Lua filter.

Support welcome!
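Before deciding whether to touch the ~170 files by hand, it may help to count how many distinct figure patterns actually exist. A minimal stdlib sketch, assuming the pages are available as a `{filename: html}` dict: for every `<img>` it records its innermost enclosing elements as a signature such as `div > img` or `table > tr > td > img`, then tallies them. Names are hypothetical, and the parser is lenient rather than a full HTML5 tree builder (it will not insert implicit `tbody` elements, for example):

```python
from collections import Counter
from html.parser import HTMLParser

VOID = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class ImageContext(HTMLParser):
    """Record, for every <img>, the chain of open elements around it."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.signatures = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # Innermost three ancestors are usually enough to tell the
            # wrapper variants apart.
            self.signatures.append(" > ".join(self.stack[-3:] + ["img"]))
        elif tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to the matching open tag, tolerating unclosed <p>, <li>, ...
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def figure_pattern_inventory(pages: dict) -> Counter:
    counts = Counter()
    for html in pages.values():
        parser = ImageContext()
        parser.feed(html)
        parser.close()
        counts.update(parser.signatures)
    return counts
```

`figure_pattern_inventory(pages).most_common()` then lists the variants by frequency; if a handful cover most pages, a Lua filter for just those plus manual fixes for the rest may be enough.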