-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!(libecalc): Validate datetimes in time series resources #790
base: main
Are you sure you want to change the base?
Conversation
82c3671
to
9a46ad4
Compare
""" | ||
date_patterns = { | ||
# Only year supplied (YYYY e.g. 1996). | ||
"YEAR_ONLY": r"\d{4}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have traditionally in eCalc had support for 3 different date formats, so in theory other formats has not been supported. See run.py
and datetime/utils.py
under date_format_option
. This was set on CLI (cant remember if this was in and/or output actually ...`
We "always" wanted to only accept ISO8601 YYYY-MM-DD HH:MM:SS but we had to accept DD.MM.YYYY as that was the most common format in csv files.
We do have several places where we parse data from csv, or at least we had. Might be a problem as well. Jostein should know...
I think we open up for more challenges now to support more dateformats than we had, but if we at some point always preprocess .csv data into json structures f ex, than it might be handy to support many raw import formats ... Since we now use it directly, I think we should stick to just a few ...
I would like to hear what others think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree to limit the support of the date formats to only the 3 formats already supported and not open for more variations at this point in time as Gary Neville would say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before releasing this change I think we also need to add validation on saving resource files that enforces same format in the same file.
At the same time check if we have resource files with multiple date formats and if so migrate these.
Ensure stricter and faster parsing of datetimes ensuring whole timeseries must be in similar format. Refs: equinor/ecalc-internal#454
763c10c
to
3af7a2b
Compare
Have you remembered and considered?
docs/drafts/next.draft.md
)docs/docs/migration_guides/
)BREAKING:
in footer or!
in header, if breakingECALC-XXXX
)Why is this pull request needed?
Validate dates and datetimes in time series resources. This reduces the inherent ambiguousness of the previous implementation, where you could mix e.g. month first and day first dates. This would "work" in so far that no errors were raised, but it could lead to a month-first date being parsed as day first - If user supplied 01/10/2000 it would be interpreted as 1st of September, even if it actually was the 10th of January!
Added bonus:
Makes the handling of dates in a resource faster (bit outdated but concept stands).
speed increase:
https://perfpy.com/953
with 10 dates, roughly 3x speed.
with 2000 dates, roughly 70x speed!
What does this pull request change?
Replace date-handling method done with pandas for each record in a list with method using one pandas operator for the whole list, once.
Issues related to this change:
https://github.com/equinor/ecalc-internal/issues/454