We ran these scripts to find missing information automatically. The scripts check whether each filtered field is present or absent in the yml files, but they cannot assess the quality of the data. Running (1) EJP-catalogue-cleaning.R and (2) EJP-print-issues.R produces a set of Word documents (.docx) listing the missing information per record, and a CSV summary table with the following columns:

- n_records: number of yml files inside the folder
- description: number of yml files with no description
- license: number of yml files with no license
- time: number of yml files with no creation and/or update date
- publisher: number of yml files with no institute, publisher, distributor or pointOfContact
- no_link: number of yml files with no link(s) to the resource
- broken_link: number of yml files with broken link(s)
- total_issues: sum of the issue counts from description to broken_link, per folder
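
The columns above can be illustrated with a minimal sketch. It is written in Python and is not taken from EJP-catalogue-cleaning.R or EJP-print-issues.R. It assumes each yml file has already been parsed into a dictionary. The key names `creation_date`, `update_date` and `links`, the helper `is_missing`, the function `summarise_folder` and the `link_ok` callback are hypothetical. It also assumes that "time" flags a record when either date is missing, and that "publisher" flags a record only when all four fields are missing.

```python
# Hypothetical sketch: key names and record layout are assumptions,
# not the actual logic of the R scripts.

def is_missing(record, key):
    """Return True when the key is absent or holds an empty value."""
    return record.get(key) in (None, "", [], {})


def summarise_folder(records, link_ok):
    """Count, for one folder, how many records show each kind of issue.

    records: list of dicts, one per parsed yml file.
    link_ok: callable taking a URL and returning True if it resolves
             (injected so the sketch needs no network access).
    """
    publisher_keys = ("institute", "publisher", "distributor", "pointOfContact")
    time_keys = ("creation_date", "update_date")  # assumed key names

    summary = {
        "n_records": len(records),
        "description": sum(is_missing(r, "description") for r in records),
        "license": sum(is_missing(r, "license") for r in records),
        # Assumption: flagged when either date is missing.
        "time": sum(any(is_missing(r, k) for k in time_keys) for r in records),
        # Flagged only when none of the four fields is filled in.
        "publisher": sum(
            all(is_missing(r, k) for k in publisher_keys) for r in records
        ),
        "no_link": sum(is_missing(r, "links") for r in records),
        "broken_link": sum(
            any(not link_ok(url) for url in (r.get("links") or []))
            for r in records
        ),
    }
    # total_issues adds up every column from description to broken_link.
    issue_columns = [
        "description", "license", "time", "publisher", "no_link", "broken_link",
    ]
    summary["total_issues"] = sum(summary[c] for c in issue_columns)
    return summary
```

Passing `link_ok` as an argument keeps the counting logic separate from the HTTP check, so the summary can be tried out on a few hand-written records before running it against real links.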

Other scripts perform specific checks when necessary during the catalog cleaning process.
