Rewrite info, warning and error message texts so that each one is unique? #970

Open
RvanVeenendaal opened this issue Oct 29, 2024 · 0 comments

RvanVeenendaal commented Oct 29, 2024

Would it be possible to rewrite JHOVE info, warning and error message texts so that each one is unique?

At the National Archives of the Netherlands, we process repository system server logs on a monthly basis to find potential data corruption that was not discovered at ingest. A simple example: JHOVE's unexpected end of file warning may not halt an ingest workflow, but it may be the result of a corrupted file.

Each month these server logs contain many hundreds of thousands of JHOVE info, warning and error message texts, but not the JHOVE error IDs. In an attempt to reduce the size of the server log analysis data, I am writing scripts to map message texts to JHOVE error IDs. Unfortunately, I noticed that several JHOVE error IDs have identical message texts. Some examples are included below.
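To illustrate the problem, here is a minimal sketch of such a mapping script. It is not the script we actually run: the table is a small excerpt copied from the examples in this issue, the names `MESSAGE_IDS`, `build_index` and `candidate_ids` are made up for this illustration, and it assumes that each message text can be treated as a regular expression that has to match the whole log message.

```python
import re
from collections import defaultdict

# Excerpt of (message text pattern, JHOVE ID) pairs taken from this issue.
# Assumption: the texts are regular expressions, where ".+" stands for a
# variable part of the message.
MESSAGE_IDS = [
    ("Invalid name tree", "PDF-HUL-12"),
    ("Invalid name tree", "PDF-HUL-13"),
    ("Invalid name tree", "PDF-HUL-14"),
    ("Invalid name tree", "PDF-HUL-15"),
    ("Invalid name tree", "PDF-HUL-16"),
    ("Lexical error", "PDF-HUL-65"),
    ("Lexical error", "PDF-HUL-66"),
    ("No TIFF magic number: .+", "TIFF-HUL-21"),
    ("No TIFF magic number: .+", "TIFF-HUL-23"),
]


def build_index(pairs):
    """Group JHOVE IDs by their message text pattern."""
    index = defaultdict(list)
    for pattern, jhove_id in pairs:
        index[pattern].append(jhove_id)
    return index


def candidate_ids(message, index):
    """Return every JHOVE ID whose pattern matches the whole message."""
    ids = []
    for pattern, jhove_ids in index.items():
        if re.fullmatch(pattern, message):
            ids.extend(jhove_ids)
    return sorted(ids)


INDEX = build_index(MESSAGE_IDS)
```

With this table, `candidate_ids("Invalid name tree", INDEX)` returns five IDs instead of one, so the log message alone cannot tell us which of the five underlying problems occurred.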

In our case the repository system is Preservica. Examples found online suggest that Archivematica also presents only the error message texts, and FITS seems to export only the message texts as well. Other systems may do the same. More people might therefore find it difficult to map messages to JHOVE error IDs, or to find the correct JHOVE error ID, when investigating errors manually or automatically.

For example, if one encounters an "Invalid name tree" message for a PDF file, there are five identical message entries with different underlying problems (see Wiki). Alternative message texts could be "Unexpected object type in name tree node" (PDF-HUL-12), "Limits dictionary entry returned null or invalid PDF array" (PDF-HUL-13), "I/O error resolving object for Names, Kids or Limits in name tree node" (PDF-HUL-14), "Name tree node dictionary does not have a Names or Kids entry" (PDF-HUL-15) or "Unexpected error while parsing name tree node" (PDF-HUL-16).

A solution for this problem would also contribute to OPF's open By JHOVE Explain Yourself special interest group. It might make the message texts more specific and understandable (for non-laymen).

--- Examples of identical message texts for different IDs ---

Bad IHDR chunk, aborting/PNG-GDM-31
Bad IHDR chunk, aborting/PNG-GDM-34
Bad cHRM chunk/PNG-GDM-5
Bad cHRM chunk/PNG-GDM-7
Bad gAMA chunk/PNG-GDM-11
Bad gAMA chunk/PNG-GDM-13
Bad gAMA chunk/PNG-GDM-15
Bad gAMA chunk/PNG-GDM-9
Bad iCCP chunk/PNG-GDM-22
Bad iCCP chunk/PNG-GDM-24
Bad sPLT chunk/PNG-GDM-51
Bad sPLT chunk/PNG-GDM-53
Bad sPLT chunk/PNG-GDM-55
Bad sPLT chunk/PNG-GDM-57
Bad zTXt chunk/PNG-GDM-74
Bad zTXt chunk/PNG-GDM-76
Document must have implicit or explicit HEAD element/HTML-HUL-11
Document must have implicit or explicit HEAD element/HTML-HUL-12
Document page tree not found/PDF-HUL-95
Document page tree not found/PDF-HUL-96
Ignored Associated Data chunk of type: .+:/WAVE-HUL-10
Ignored Associated Data chunk of type: .+:/WAVE-HUL-17
Ignored Associated Data chunk of type: .+:/WAVE-HUL-18
Improperly constructed page tree/PDF-HUL-30
Improperly constructed page tree/PDF-HUL-31
Insufficient values for TileOffsets: .+ < .+:/TIFF-HUL-36
Insufficient values for TileOffsets: .+ < .+:/TIFF-HUL-38
Invalid Annotations/PDF-HUL-21
Invalid Annotations/PDF-HUL-22
Invalid DateTime digit: .+:/TIFF-HUL-55
Invalid DateTime digit: .+:/TIFF-HUL-56
Invalid ID in trailer/PDF-HUL-77
Invalid ID in trailer/PDF-HUL-78
Invalid ID in trailer/PDF-HUL-79
Invalid Names dictionary/PDF-HUL-89
Invalid Names dictionary/PDF-HUL-90
Invalid character in hex string/PDF-HUL-11
Invalid character in hex string/PDF-HUL-67
Invalid cross-reference table/PDF-HUL-68
Invalid cross-reference table/PDF-HUL-69
Invalid data in document structure root/PDF-HUL-61
Invalid data in document structure root/PDF-HUL-62
Invalid destination object/PDF-HUL-1
Invalid destination object/PDF-HUL-2
Invalid destinations dictionary/PDF-HUL-91
Invalid destinations dictionary/PDF-HUL-92
Invalid dictionary data for page/PDF-HUL-26
Invalid dictionary data for page/PDF-HUL-27
Invalid dictionary data for page/PDF-HUL-28
Invalid document structure root/PDF-HUL-59
Invalid document structure root/PDF-HUL-60
Invalid name tree/PDF-HUL-12
Invalid name tree/PDF-HUL-13
Invalid name tree/PDF-HUL-14
Invalid name tree/PDF-HUL-15
Invalid name tree/PDF-HUL-16
Invalid object definition/PDF-HUL-35
Invalid object definition/PDF-HUL-36
Invalid object definition/PDF-HUL-37
Invalid object definition/PDF-HUL-38
Invalid object number or object stream/PDF-HUL-108
Invalid object number or object stream/PDF-HUL-110
Invalid or ill-formed XMP metadata/PDF-HUL-100
Invalid or ill-formed XMP metadata/PDF-HUL-101
Invalid outline dictionary item/PDF-HUL-125
Invalid outline dictionary item/PDF-HUL-126
Invalid outline dictionary item/PDF-HUL-127
Invalid outline dictionary item/PDF-HUL-130
Invalid outline dictionary item/PDF-HUL-131
Invalid structure attribute/PDF-HUL-51
Invalid structure attribute/PDF-HUL-52
Invalid structure attribute/PDF-HUL-53
Lexical error/PDF-HUL-65
Lexical error/PDF-HUL-66
Malformed cross-reference table/PDF-HUL-82
Malformed cross-reference table/PDF-HUL-83
No TIFF header: .+ .+/TIFF-HUL-20
No TIFF header: .+ .+/TIFF-HUL-22
No TIFF magic number: .+/TIFF-HUL-21
No TIFF magic number: .+/TIFF-HUL-23
No document catalog dictionary/PDF-HUL-85
No document catalog dictionary/PDF-HUL-86
Outlines contain recursive references/PDF-HUL-123
Outlines contain recursive references/PDF-HUL-128
Outlines contain recursive references/PDF-HUL-129
Parsing error/HTML-HUL-2
Parsing error/HTML-HUL-3
Premature EOF/TIFF-HUL-1
Premature EOF:/TIFF-HUL-57
Read error for tag .+/TIFF-HUL-19
Read error for tag .+:/TIFF-HUL-13
Read error for tag .+:/TIFF-HUL-16
Read error for tag .+:/TIFF-HUL-18
SaxParseException: .+:/XML-HUL-1
SaxParseException: .+:/XML-HUL-3
Streams may not be embedded in object streams/PDF-HUL-47
Streams may not be embedded in object streams/PDF-HUL-48
Tag illegal in context/HTML-HUL-4
Tag illegal in context/HTML-HUL-5
Unexpected exception .+/PDF-HUL-102
Unexpected exception .+/PDF-HUL-103
Unexpected exception .+/PDF-HUL-94
Unexpected exception .+/PDF-HUL-99
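
A list like the one above can be produced by grouping `text/ID` lines on the text part. The following is a rough sketch, not the exact script used: `group_by_text` is a name chosen for this illustration, and it assumes that the ID is everything after the last `/` and that a trailing colon on the text is only punctuation (which is why "Premature EOF" and "Premature EOF:" end up in the same group).

```python
from collections import defaultdict


def group_by_text(lines):
    """Return {message text: [JHOVE IDs]} for texts shared by more than one ID."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last "/" so that a "/" inside the text does not matter.
        text, sep, jhove_id = line.rpartition("/")
        if not sep:
            continue  # not a "text/ID" line
        # Assumption: a trailing colon is punctuation, not part of the text.
        groups[text.rstrip(":")].append(jhove_id)
    return {text: ids for text, ids in groups.items() if len(ids) > 1}


# A few lines in the same format as the list above. Only one of the two
# "Parsing error" IDs is included here, so that text is not reported.
sample = [
    "Premature EOF/TIFF-HUL-1",
    "Premature EOF:/TIFF-HUL-57",
    "Lexical error/PDF-HUL-65",
    "Lexical error/PDF-HUL-66",
    "Parsing error/HTML-HUL-2",
]
duplicates = group_by_text(sample)
```

Dropping the `rstrip(":")` call would make the comparison strict, in which case the two "Premature EOF" entries would count as different texts.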

@RvanVeenendaal RvanVeenendaal changed the title Rewrite info, warning and error message texts so that they are all different? Rewrite info, warning and error message texts so that each one is unique? Oct 29, 2024