Would it be possible to rewrite JHOVE info, warning and error message texts so that each one is unique?
At the National Archives of the Netherlands, we process repository system server logs on a monthly basis to find potential data corruption that was not discovered at ingest. A simple example: JHOVE's unexpected end of file warning may not halt an ingest workflow, but it may be the result of a corrupted file.
Each month these server logs contain hundreds of thousands of JHOVE info, warning and error message texts, but not the JHOVE error IDs. To reduce the size of the server log analysis data, I am writing scripts that map message texts to JHOVE error IDs. Unfortunately, I noticed that several JHOVE error IDs have identical message texts. Some examples are included below.
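A minimal sketch of the kind of mapping script meant here, not the script actually in use. It assumes the rules are available as "message pattern/ID" lines (the format of the examples in this issue) and that the patterns can be treated as regular expressions. The names `load_rules` and `candidate_ids` are made up for illustration.

```python
import re

# A small excerpt in the "message pattern/ID" form used in this issue.
RULES = """\
Invalid name tree/PDF-HUL-12
Invalid name tree/PDF-HUL-13
Premature EOF/TIFF-HUL-1
No TIFF magic number: .+/TIFF-HUL-21
No TIFF magic number: .+/TIFF-HUL-23
"""

def load_rules(text):
    """Parse 'pattern/ID' lines into (compiled regex, ID) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last slash: the ID never contains one.
        pattern, error_id = line.rsplit("/", 1)
        rules.append((re.compile(pattern), error_id))
    return rules

def candidate_ids(message, rules):
    """Return every ID whose pattern matches the whole message text."""
    return [error_id for regex, error_id in rules if regex.fullmatch(message)]

rules = load_rules(RULES)
print(candidate_ids("Invalid name tree", rules))  # two candidates: ambiguous
print(candidate_ids("Premature EOF", rules))      # one candidate: unambiguous
```

Whenever `candidate_ids` returns more than one ID, the message text alone cannot identify the underlying problem, which is the situation described in this issue.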
In our case the repository system is Preservica. From googled examples it also seems that Archivematica presents the error message texts only. FITS seems to also only export the message texts. Other systems may do the same. More people might therefore find it difficult to map messages to JHOVE error IDs or find the correct JHOVE error ID when looking into errors manually or automatically.
For example, anyone who encounters an "Invalid name tree" message in a PDF file faces five identical message entries with different underlying problems (see Wiki). More specific message texts could be "Unexpected object type in name tree node" (PDF-HUL-12), "Limits dictionary entry returned null or invalid PDF array" (PDF-HUL-13), "I/O error resolving object for Names, Kids or Limits in name tree node" (PDF-HUL-14), "Name tree node dictionary does not have a Names or Kids entry" (PDF-HUL-15) or "Unexpected error while parsing name tree node" (PDF-HUL-16).
A solution for this problem would also contribute to OPF's open "By JHOVE, Explain Yourself" special interest group. It might make the message texts more specific and understandable (for non-experts).
--- Examples of identical message texts for different IDs ---
Bad IHDR chunk, aborting/PNG-GDM-31
Bad IHDR chunk, aborting/PNG-GDM-34
Bad cHRM chunk/PNG-GDM-5
Bad cHRM chunk/PNG-GDM-7
Bad gAMA chunk/PNG-GDM-11
Bad gAMA chunk/PNG-GDM-13
Bad gAMA chunk/PNG-GDM-15
Bad gAMA chunk/PNG-GDM-9
Bad iCCP chunk/PNG-GDM-22
Bad iCCP chunk/PNG-GDM-24
Bad sPLT chunk/PNG-GDM-51
Bad sPLT chunk/PNG-GDM-53
Bad sPLT chunk/PNG-GDM-55
Bad sPLT chunk/PNG-GDM-57
Bad zTXt chunk/PNG-GDM-74
Bad zTXt chunk/PNG-GDM-76
Document must have implicit or explicit HEAD element/HTML-HUL-11
Document must have implicit or explicit HEAD element/HTML-HUL-12
Document page tree not found/PDF-HUL-95
Document page tree not found/PDF-HUL-96
Ignored Associated Data chunk of type: .+:/WAVE-HUL-10
Ignored Associated Data chunk of type: .+:/WAVE-HUL-17
Ignored Associated Data chunk of type: .+:/WAVE-HUL-18
Improperly constructed page tree/PDF-HUL-30
Improperly constructed page tree/PDF-HUL-31
Insufficient values for TileOffsets: .+ < .+:/TIFF-HUL-36
Insufficient values for TileOffsets: .+ < .+:/TIFF-HUL-38
Invalid Annotations/PDF-HUL-21
Invalid Annotations/PDF-HUL-22
Invalid DateTime digit: .+:/TIFF-HUL-55
Invalid DateTime digit: .+:/TIFF-HUL-56
Invalid ID in trailer/PDF-HUL-77
Invalid ID in trailer/PDF-HUL-78
Invalid ID in trailer/PDF-HUL-79
Invalid Names dictionary/PDF-HUL-89
Invalid Names dictionary/PDF-HUL-90
Invalid character in hex string/PDF-HUL-11
Invalid character in hex string/PDF-HUL-67
Invalid cross-reference table/PDF-HUL-68
Invalid cross-reference table/PDF-HUL-69
Invalid data in document structure root/PDF-HUL-61
Invalid data in document structure root/PDF-HUL-62
Invalid destination object/PDF-HUL-1
Invalid destination object/PDF-HUL-2
Invalid destinations dictionary/PDF-HUL-91
Invalid destinations dictionary/PDF-HUL-92
Invalid dictionary data for page/PDF-HUL-26
Invalid dictionary data for page/PDF-HUL-27
Invalid dictionary data for page/PDF-HUL-28
Invalid document structure root/PDF-HUL-59
Invalid document structure root/PDF-HUL-60
Invalid name tree/PDF-HUL-12
Invalid name tree/PDF-HUL-13
Invalid name tree/PDF-HUL-14
Invalid name tree/PDF-HUL-15
Invalid name tree/PDF-HUL-16
Invalid object definition/PDF-HUL-35
Invalid object definition/PDF-HUL-36
Invalid object definition/PDF-HUL-37
Invalid object definition/PDF-HUL-38
Invalid object number or object stream/PDF-HUL-108
Invalid object number or object stream/PDF-HUL-110
Invalid or ill-formed XMP metadata/PDF-HUL-100
Invalid or ill-formed XMP metadata/PDF-HUL-101
Invalid outline dictionary item/PDF-HUL-125
Invalid outline dictionary item/PDF-HUL-126
Invalid outline dictionary item/PDF-HUL-127
Invalid outline dictionary item/PDF-HUL-130
Invalid outline dictionary item/PDF-HUL-131
Invalid structure attribute/PDF-HUL-51
Invalid structure attribute/PDF-HUL-52
Invalid structure attribute/PDF-HUL-53
Lexical error/PDF-HUL-65
Lexical error/PDF-HUL-66
Malformed cross-reference table/PDF-HUL-82
Malformed cross-reference table/PDF-HUL-83
No TIFF header: .+ .+/TIFF-HUL-20
No TIFF header: .+ .+/TIFF-HUL-22
No TIFF magic number: .+/TIFF-HUL-21
No TIFF magic number: .+/TIFF-HUL-23
No document catalog dictionary/PDF-HUL-85
No document catalog dictionary/PDF-HUL-86
Outlines contain recursive references/PDF-HUL-123
Outlines contain recursive references/PDF-HUL-128
Outlines contain recursive references/PDF-HUL-129
Parsing error/HTML-HUL-2
Parsing error/HTML-HUL-3
Premature EOF/TIFF-HUL-1
Premature EOF:/TIFF-HUL-57
Read error for tag .+/TIFF-HUL-19
Read error for tag .+:/TIFF-HUL-13
Read error for tag .+:/TIFF-HUL-16
Read error for tag .+:/TIFF-HUL-18
SaxParseException: .+:/XML-HUL-1
SaxParseException: .+:/XML-HUL-3
Streams may not be embedded in object streams/PDF-HUL-47
Streams may not be embedded in object streams/PDF-HUL-48
Tag illegal in context/HTML-HUL-4
Tag illegal in context/HTML-HUL-5
Unexpected exception .+/PDF-HUL-102
Unexpected exception .+/PDF-HUL-103
Unexpected exception .+/PDF-HUL-94
Unexpected exception .+/PDF-HUL-99
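A list like the one above can be produced by grouping entries on their message text. The following is only a sketch under the assumption that every entry has the "message text/ID" form shown above; the function name `ambiguous_texts` is hypothetical.

```python
from collections import defaultdict

def ambiguous_texts(lines):
    """Group 'message text/ID' lines and keep texts shared by several IDs."""
    ids_by_text = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '--- ... ---' heading line.
        if not line or line.startswith("---"):
            continue
        text, error_id = line.rsplit("/", 1)
        ids_by_text[text].append(error_id)
    return {text: ids for text, ids in ids_by_text.items() if len(ids) > 1}

sample = [
    "Lexical error/PDF-HUL-65",
    "Lexical error/PDF-HUL-66",
    "Premature EOF/TIFF-HUL-1",
    "Premature EOF:/TIFF-HUL-57",
]
print(ambiguous_texts(sample))
# {'Lexical error': ['PDF-HUL-65', 'PDF-HUL-66']}
```

Because the grouping is exact, near-identical pairs such as "Premature EOF" and "Premature EOF:" are not reported as duplicates here, although they are just as hard to tell apart in a log.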
RvanVeenendaal changed the title from "Rewrite info, warning and error message texts so that they are all different?" to "Rewrite info, warning and error message texts so that each one is unique?" on Oct 29, 2024