UNEP-WCMC World Heritage Datasheet

This repository holds the methodology for converting the aging datasheets from DOCX format to Markdown, so that they can be published easily with Pelican.

Methodology

Convert the documents using pandoc. Use Python's subprocess.call to run commands such as pandoc -s Brazil\ -\ Atlantic\ Forest\ South-East.docx --wrap=none -t markdown-multiline_tables-simple_tables-pipe_tables-grid_tables -o test.md. Note that constructing the command as a single string requires proper escaping, for example of the spaces in file names. Alternatively, one could specify the arguments as a list:

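import subprocess
# input_file and output_file are path strings; a list of arguments needs no shell escaping.
args = ['pandoc', '-s', input_file, '--wrap=none', '-t', 'markdown-multiline_tables-simple_tables-pipe_tables-grid_tables',
        '-o', output_file]
subprocess.call(args)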
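The same call can be applied to a whole folder of datasheets. The following is a minimal sketch, not the repository's own run_folder.py: the function names build_pandoc_args and convert_folder, and the injectable runner parameter, are assumptions made for illustration. Only the pandoc options are taken from the command above.

```python
import subprocess
from pathlib import Path

# Output format string taken from the pandoc command above.
PANDOC_FORMAT = "markdown-multiline_tables-simple_tables-pipe_tables-grid_tables"


def build_pandoc_args(input_file, output_file):
    """Return the pandoc argument list for one DOCX-to-Markdown conversion."""
    return ["pandoc", "-s", str(input_file), "--wrap=none",
            "-t", PANDOC_FORMAT, "-o", str(output_file)]


def convert_folder(src_dir, dst_dir, runner=subprocess.call):
    """Convert every .docx in src_dir; return the names of files that failed.

    `runner` is a hypothetical hook so the loop can be exercised without pandoc.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    failed = []
    for docx in sorted(Path(src_dir).glob("*.docx")):
        target = dst / (docx.stem + ".md")
        # A non-zero return code means the command reported an error.
        if runner(build_pandoc_args(docx, target)) != 0:
            failed.append(docx.name)
    return failed
```

Because each file name is passed as its own list element, names containing spaces or hyphens need no escaping. Collecting the failures rather than stopping at the first one makes it easier to see which datasheets need manual attention.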

To ensure the resulting documents can be used by Pelican, minimal metadata must be constructed by reading the content. See the script convert_datasheet_pelican_format.py for details.
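As a rough illustration of what constructing minimal metadata from the content could look like, the sketch below prepends a Pelican-style header (Title, Date, Slug) to a converted document. It is not the logic of convert_datasheet_pelican_format.py. It assumes the datasheet title is the first non-empty line, and the function names and the caller-supplied date are hypothetical.

```python
import re


def first_nonempty_line(text):
    """Return the first line that contains non-whitespace characters."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def add_pelican_metadata(markdown_text, date):
    """Prepend a Pelican metadata header built from the document's content."""
    # Assumption: the datasheet title is the first non-empty line.
    raw_title = first_nonempty_line(markdown_text)
    # Drop heading markers and emphasis characters around the title.
    title = re.sub(r"^[#\s*_]+|[\s*_]+$", "", raw_title)
    # Slug: lowercase, runs of non-alphanumerics collapsed to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    header = "Title: {}\nDate: {}\nSlug: {}\n\n".format(title, date, slug)
    return header + markdown_text
```

For a document whose first line is `**Example Site**` (a made-up title), the header would carry `Title: Example Site` and `Slug: example-site`, followed by a blank line and the unchanged body.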
Because formatting in the original Word documents is inconsistent, the resulting Markdown needs cleaning. Notably, italic and bold markup often seems to come out garbled.
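The exact defects are not listed here, so the following sketch only illustrates the kind of regex clean-up that might help. It assumes two defects that commonly appear when Word formatting runs are converted: back-to-back bold runs, and whitespace trapped inside the closing bold marker. The function name tidy_emphasis is hypothetical and is not taken from regx.py.

```python
import re


def tidy_emphasis(text):
    """Repair two assumed bold-markup defects in converted Markdown."""
    # Merge back-to-back bold runs: **foo****bar** -> **foobar**
    text = text.replace("****", "")
    # Move trailing whitespace outside the closing marker: "**foo **" -> "**foo** "
    text = re.sub(r"\*\*(\S[^*\n]*?)(\s+)\*\*", r"**\1**\2", text)
    return text
```

Well-formed emphasis such as `**a** and **b**` passes through unchanged, because the pattern requires a non-space character immediately after the opening marker. Rules like these are best checked against real datasheets before being applied in bulk.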


