Currently we parse the local patient identifier and local recording identifier fields of the EDF file header into PatientID and RecordingID objects, respectively, that store only the subfields described by the EDF+ specification. When parsing based on the described structure fails, we instead store the local patient and recording identifier header fields as strings. This allows retaining any useful information that may be in those fields but is not formatted in compliance with EDF+ (and it doesn't need to be—not every EDF file is EDF+ compliant). All of this is fine.
What I failed to recognize when initially implementing this is that the EDF+ spec says the local patient and recording identifier fields start with the described subfields; they do not necessarily consist solely of those subfields, and additional information may be present in the remaining text. This means that we fail to parse into PatientID and RecordingID more often than we should, because the types and their parsing are too restrictive. (And again, we aren't losing data currently, we're just making things a bit harder for downstream consumers than they need to be.)
Some options to resolve this:
Amend FileHeader such that patient and recording are always String and are not automatically parsed into PatientID/RecordingID. Users would then call e.g. parse(PatientID, file.header.patient). A constructor like PatientID(::EDF.File) could be defined for convenience. The downside of this approach is that we have to re-parse the fields on each call, which is probably fine, tbh.
Add a field called extra or whatever to both PatientID and RecordingID that stores any additional unparsed text. For non-EDF+-compliant files, all fields except for extra would be missing. There would likely need to be additional considerations here to ensure you can still distinguish e.g. hello (not EDF+ compliant) from X X X X hello (EDF+ compliant but all defined subfields missing).
Add additional fields called patient_unparsed and recording_unparsed (or whatever) that store any additional unparsed text from the patient and recording fields, respectively. When parsing a PatientID/RecordingID fails, the full strings are stored in the *_unparsed fields and the patient and recording fields are missing. This allows round-tripping and avoids re-parsing.
Probably other things I haven't thought of.
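To make the second option a bit more concrete, here is a rough sketch of prefix-based parsing for the patient field. Everything in it is hypothetical and not existing EDF.jl API: the names `SketchPatientID`, `parse_patient`, `unknown_or`, and `is_birthdate`, the `compliant` flag, and the choice to keep the birthdate as raw text. The `compliant` flag is one possible way to tell `hello` apart from `X X X X hello`; the sample identifier in the usage lines is purely illustrative.

```julia
# Hypothetical sketch only; none of these names are part of EDF.jl.
struct SketchPatientID
    code::Union{String,Missing}
    sex::Union{Char,Missing}
    birthdate::Union{String,Missing}  # kept as raw text here for brevity
    name::Union{String,Missing}
    extra::String                     # text following the four defined subfields
    compliant::Bool                   # true if the EDF+ prefix was recognized
end

# In EDF+, "X" marks an unknown subfield.
unknown_or(s) = s == "X" ? missing : String(s)

# Accept "X" or a date shaped like 02-MAY-1951 (shape check only).
is_birthdate(s) = occursin(r"^(X|\d{2}-[A-Z]{3}-\d{4})$", s)

function parse_patient(raw::AbstractString)
    # Split off at most four leading subfields; the fifth part is the remainder.
    parts = split(strip(raw), ' '; limit=5)
    if length(parts) >= 4 && parts[2] in ("F", "M", "X") && is_birthdate(parts[3])
        sex = parts[2] == "X" ? missing : first(parts[2])
        extra = length(parts) == 5 ? String(parts[5]) : ""
        return SketchPatientID(unknown_or(parts[1]), sex, unknown_or(parts[3]),
                               unknown_or(parts[4]), extra, true)
    end
    # Not EDF+ compliant: keep the whole string so nothing is lost.
    return SketchPatientID(missing, missing, missing, missing, String(raw), false)
end

a = parse_patient("hello")          # non-compliant, everything lands in `extra`
b = parse_patient("X X X X hello")  # compliant, all defined subfields missing
c = parse_patient("MCH-0234567 F 02-MAY-1951 Haagse_Harry")
```

Here `a` and `b` carry the same `extra` text but differ in `compliant`, which addresses the ambiguity mentioned above. The same prefix-then-remainder split would apply to the third option, with the remainder stored on the header instead of on the ID type.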
Thoughts?