Parsing of local patient and recording identifier fields is incomplete per the EDF+ spec #88

ararslan · 2024-12-06T19:22:25Z

Currently we parse the local patient identifier and local recording identifier fields of the EDF file header into PatientID and RecordingID objects, respectively, that store only the subfields described by the EDF+ specification. When parsing based on the described structure fails, we instead store the local patient and recording identifier header fields as strings. This allows retaining any useful information that may be in those fields but is not formatted in compliance with EDF+ (and it doesn't need to be—not every EDF file is EDF+ compliant). All of this is fine.

What I failed to recognize in initially implementing this is that the EDF+ spec says that the local patient and recording identifier fields start with the described subfields, they do not necessarily consist solely of the described subfields; additional information may be present in the remaining text. This means that we fail to parse into PatientID and RecordingID more often that we should because the types and their parsing are too restrictive. (And again, we aren't losing data currently, we're just making things a bit harder for downstream consumers than it needs to be.)

Some options to resolve this:

Amend FileHeader such that patient and recording are always String and are not automatically parsed into PatientID/RecordingID. Users would then call e.g. parse(PatientID, file.header.patient). A constructor like PatientID(::EDF.File) could be defined for convenience. The downside of this approach is that we have to re-parse the fields on each call, which is probably fine, tbh.
Add a field called extra or whatever to both PatientID and RecordingID that stores any additional unparsed text. For non-EDF+-compliant files, all fields except for extra would be missing. There would likely need to be additional considerations here to ensure you can still distinguish e.g. hello (not EDF+ compliant) from X X X X hello (EDF+ compliant but all defined subfields missing).
Add additional fields called patient_unparsed and recording_unparsed (or whatever) that stores additional unparsed text from the patient and recording fields, respectively. When parsing a PatientID/RecordingID fails, the full strings are stored in the *_unparsed fields and the patient and recording fields are missing. This allows round-tripping and avoids re-parsing.
Probably other things I haven't thought of.

Thoughts?

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parsing of local patient and recording identifier fields is incomplete per the EDF+ spec #88

Parsing of local patient and recording identifier fields is incomplete per the EDF+ spec #88

ararslan commented Dec 6, 2024

Parsing of local patient and recording identifier fields is incomplete per the EDF+ spec #88

Parsing of local patient and recording identifier fields is incomplete per the EDF+ spec #88

Comments

ararslan commented Dec 6, 2024