Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing of local patient and recording identifier fields is incomplete per the EDF+ spec #88

Open
ararslan opened this issue Dec 6, 2024 · 0 comments

Comments

@ararslan
Copy link
Member

ararslan commented Dec 6, 2024

Currently we parse the local patient identifier and local recording identifier fields of the EDF file header into PatientID and RecordingID objects, respectively, that store only the subfields described by the EDF+ specification. When parsing based on the described structure fails, we instead store the local patient and recording identifier header fields as strings. This allows retaining any useful information that may be in those fields but is not formatted in compliance with EDF+ (and it doesn't need to be—not every EDF file is EDF+ compliant). All of this is fine.

What I failed to recognize in initially implementing this is that the EDF+ spec says that the local patient and recording identifier fields start with the described subfields, they do not necessarily consist solely of the described subfields; additional information may be present in the remaining text. This means that we fail to parse into PatientID and RecordingID more often that we should because the types and their parsing are too restrictive. (And again, we aren't losing data currently, we're just making things a bit harder for downstream consumers than it needs to be.)

Some options to resolve this:

  1. Amend FileHeader such that patient and recording are always String and are not automatically parsed into PatientID/RecordingID. Users would then call e.g. parse(PatientID, file.header.patient). A constructor like PatientID(::EDF.File) could be defined for convenience. The downside of this approach is that we have to re-parse the fields on each call, which is probably fine, tbh.
  2. Add a field called extra or whatever to both PatientID and RecordingID that stores any additional unparsed text. For non-EDF+-compliant files, all fields except for extra would be missing. There would likely need to be additional considerations here to ensure you can still distinguish e.g. hello (not EDF+ compliant) from X X X X hello (EDF+ compliant but all defined subfields missing).
  3. Add additional fields called patient_unparsed and recording_unparsed (or whatever) that stores additional unparsed text from the patient and recording fields, respectively. When parsing a PatientID/RecordingID fails, the full strings are stored in the *_unparsed fields and the patient and recording fields are missing. This allows round-tripping and avoids re-parsing.
  4. Probably other things I haven't thought of.

Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant