feat!(libecalc): Validate datetimes in time series resources #790

Open
wants to merge 3 commits into
base: main

@Aleksander-Karlsson Aleksander-Karlsson commented Feb 17, 2025

Have you remembered and considered?

  • I have remembered to update documentation
  • I have remembered to update manual changelog (docs/drafts/next.draft.md)
  • I have remembered to update migration guide (docs/docs/migration_guides/)
  • I have committed with BREAKING: in footer or ! in header, if breaking
  • I have added tests (if not, comment why)
  • I have used conventional commits syntax (if you squash, make sure that conventional commit is used)
  • I have included the Jira issue ID somewhere in the commit body (ECALC-XXXX)

Why is this pull request needed?

Validate dates and datetimes in time series resources. This removes the inherent ambiguity of the previous implementation, where you could mix, for example, month-first and day-first dates in the same resource. This would "work" insofar as no errors were raised, but it could lead to a month-first date being parsed as day first: if the user supplied 01/10/2000, it would be interpreted as the 1st of October, even if it actually was the 10th of January!
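
To make the ambiguity concrete, here is a minimal sketch using only the Python standard library. It is not the eCalc implementation; `guess_per_record` is a hypothetical lenient parser that stands in for any per-record format guessing.

```python
from datetime import datetime

raw = "01/10/2000"

# The same string is a valid date under both conventions.
day_first = datetime.strptime(raw, "%d/%m/%Y")    # 1 October 2000
month_first = datetime.strptime(raw, "%m/%d/%Y")  # 10 January 2000


def guess_per_record(value: str) -> datetime:
    """Hypothetical lenient parser: try month first, then fall back to day first."""
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {value!r}")


# Both rows are meant as day first, but only the second is parsed that way,
# because 13 cannot be a month. No error is raised for the first row.
mixed = [guess_per_record(v) for v in ["01/10/2000", "13/10/2000"]]
```

Here `mixed` silently ends up with one date in January and one in October, which is the kind of result that validating a single format per resource is meant to rule out.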

Added bonus: the handling of dates in a resource becomes faster. The benchmark is a bit outdated, but the concept stands.

Speed increase (https://perfpy.com/953):

  • With 10 dates, roughly 3x faster.
  • With 2000 dates, roughly 70x faster!

What does this pull request change?

Replace the per-record date handling, which called pandas once for every record in a list, with a single pandas operation applied to the whole list, once.
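
As a rough illustration of the idea (not the code in this PR), parsing a whole column in one call with one explicit format could look like the sketch below. `parse_dates` is a hypothetical helper name, and the format string is an assumption chosen for the example.

```python
import pandas as pd


def parse_dates(values: list[str], date_format: str) -> pd.DatetimeIndex:
    """Hypothetical helper: parse a whole column in one vectorized call."""
    # With an explicit format, pandas raises ValueError as soon as any
    # entry does not match, instead of guessing a format per record.
    return pd.to_datetime(values, format=date_format)


dates = parse_dates(["01.10.2000", "15.01.2001"], "%d.%m.%Y")
```

Calling `parse_dates(["01.10.2000", "2001-01-15"], "%d.%m.%Y")` raises a `ValueError`, so a resource with mixed formats fails loudly rather than being parsed inconsistently.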

Issues related to this change:

https://github.com/equinor/ecalc-internal/issues/454

@Aleksander-Karlsson Aleksander-Karlsson self-assigned this Feb 18, 2025
@Aleksander-Karlsson Aleksander-Karlsson marked this pull request as ready for review February 18, 2025 08:17
@Aleksander-Karlsson Aleksander-Karlsson requested a review from a team as a code owner February 18, 2025 08:17
"""
date_patterns = {
    # Only year supplied (YYYY e.g. 1996).
    "YEAR_ONLY": r"\d{4}",

Collaborator

Traditionally, eCalc has supported 3 different date formats, so in theory other formats have not been supported. See run.py and datetime/utils.py under date_format_option. This was set on the CLI (can't remember whether it applied to input and/or output, actually ...).

We "always" wanted to only accept ISO8601 YYYY-MM-DD HH:MM:SS but we had to accept DD.MM.YYYY as that was the most common format in csv files.

We do have several places where we parse data from csv, or at least we had. Might be a problem as well. Jostein should know...

I think we open up for more challenges if we now support more date formats than we had. If we at some point always preprocess .csv data into JSON structures, for example, then it might be handy to support many raw import formats ... But since we now use the data directly, I think we should stick to just a few ...

I would like to hear what others think

Contributor

Agreed: limit support to the 3 date formats already supported, and don't open up for more variations at this point in time, as Gary Neville would say.

Contributor

Before releasing this change, I think we also need to add validation when saving resource files that enforces a single date format within each file.
At the same time, we should check whether we have resource files with multiple date formats and, if so, migrate them.

@Aleksander-Karlsson Aleksander-Karlsson requested a review from a team as a code owner February 28, 2025 09:59
Ensure stricter and faster parsing of datetimes by requiring that a whole time series uses the same format.

Refs: equinor/ecalc-internal#454
@Aleksander-Karlsson Aleksander-Karlsson changed the title perf(libecalc): Increase performance of datetime-parsing feat!(libecalc): Validate Increase performance of datetime-parsing Mar 5, 2025
@Aleksander-Karlsson Aleksander-Karlsson changed the title feat!(libecalc): Validate Increase performance of datetime-parsing feat!(libecalc): Validate datetimes in time series resources Mar 5, 2025