The command-line tool yaz-marcdump
can be used for several MARC related tasks.
To validate the structure of "MARC (ISO 2709)" records use the option -n
, which will omit any other output:
$ yaz-marcdump -n loc.mrc
If you want to validate records in other MARC formats you have to specify the format with option -i
:
$ yaz-marcdump -n -i marcxml loc.mrc.xml
If yaz-marcdump
finds any errors it will output an error message:
$ yaz-marcdump -n bad_hathi_records.mrc
<!-- EOF while searching for RS -->
To narrow down the error use option -p
, which will print the record numbers and offsets:
$ yaz-marcdump -np bad_hathi_records.mrc
<!-- Record 1 offset 0 (0x0) -->
<!-- Record 2 offset 1293 (0x50d) -->
<!-- Record 3 offset 5259 (0x148b) -->
<!-- Record 4 offset 6343 (0x18c7) -->
<!-- EOF while searching for RS -->
Common structural problems in MARC records are:
- Invalid leader bytes
$ yaz-marcdump -np bad_leaders_10_11.mrc
<!-- Record 1 offset 0 (0x0) -->
Indicator length at offset 10 should hold a number 1-9. Assuming 2
Identifier length at offset 11 should hold a number 1-9. Assuming 2
Length data entry at offset 20 should hold a number 3-9. Assuming 4
Length starting at offset 21 should hold a number 4-9. Assuming 5
Length implementation at offset 22 should hold a number. Assuming 0
- Record exceeds the maximum length
$ yaz-marcdump -np bad_hathi_records.mrc
<!-- Record 1 offset 0 (0x0) -->
<!-- Record 2 offset 1293 (0x50d) -->
<!-- Record 3 offset 5259 (0x148b) -->
<!-- Record 4 offset 6343 (0x18c7) -->
<!-- EOF while searching for RS -->
- Field exceeds the maximum length
$ yaz-marcdump -np bad_oversize_field_bad_directory.mrc
<!-- Record 1 offset 0 (0x0) -->
<!-- Record 2 offset 1571 (0x623) -->
Directory offset 240: Bad value for data length and/or length starting (0\x1Edrd-348919)
Base address not at end of directory, base 242, end 241
Directory offset 216: Data out of bounds 21398 >= 11833
<!-- Record 3 offset 13404 (0x345c) -->
<!-- Record 4 offset 16133 (0x3f05) -->
<!-- Record 5 offset 18823 (0x4987) -->
- Invalid subfield element
$ yaz-marcdump -np -i marcxml chabon-bad-subfields-element.xml
yaz_marc_read_xml failed
- MARC control character in internal data value
$ yaz-marcdump -np bad_data_value.mrc
<!-- Record 1 offset 0 (0x0) -->
Separator but not at end of field length=37
- Wrong encoded character
$ yaz-marcdump -np bad_encoding.mrc
<!-- Record 1 offset 0 (0x0) -->
No separator at end of field length=53
No separator at end of field length=64
Use xmllint
to validate "MARC XML" data against the MARC XSD schema.
If you just want validate the structure of "MARC XML" records, use the options --noout
(which will omit any other output) and --schema
(path to XSD file):
$ xmllint --noout --schema http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd loc.mrc.xml
loc.mrc.xml validates
$ xmllint --noout --schema http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd chabon-bad-subfields-element.xml
chabon-bad-subfields-element.xml:8: element subfields: Schemas validity error : Element '{http://www.loc.gov/MARC21/slim}subfields': This element is not expected. Expected is ( {http://www.loc.gov/MARC21/slim}subfield ).
chabon-bad-subfields-element.xml fails to validate
While yaz-marcdump
and xmllint
are useful to identify structural problems within MARC records, marcvalidate
can be used to validate MARC tags and subfields against an Avram specification. The default specification was build by Péter Király based on the MARC documentation of the Library of Congress. The specification can be enhanced with local defined fields.
By default marcvalidate
expects "MARC (ISO 2709)" records:
$ marcvalidate loc.mrc
12360325 906 unknown field
1180649 035 unknown subfield 9
...
To validate "MARC XML" data use option --type
:
$ marcvalidate --type XML loc.mrc.xml
12360325 906 unknown field
1180649 035 unknown subfield 9
...
To validate against a local Avram schema use option --schema
:
$ marcvalidate --schema my_schema.json loc.mrc
If you want to run more detailed analyses check "QA catalogues - a metadata quality assessment tool for MARC records".