245 fix querying public extraction function #246
…traction-function
Can't find any wrong logic at first sight, but hard to check purely from code. Thanks for adding tests!!
I'm too stupid to understand what happens :)
start_date = row["start_date"]
end_date = row["end_date"]

proposed_valid_date_fwd = valid_date + pd.DateOffset(
Is this correct? Shouldn't it be '-' in the first one and '+' in the second one?
Guess I don't understand what it is used for
Hmm, but this is exactly the case, no? Like here in these lines:
worldcereal-classification/src/worldcereal/utils/refdata.py
Lines 244 to 249 in 44934e8
proposed_valid_date_fwd = valid_date + pd.DateOffset(
    months=row["valid_month_shift_forward"]
)
proposed_valid_date_bwd = valid_date - pd.DateOffset(
    months=row["valid_month_shift_backward"]
)
or do you refer to something else?
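To make the sign convention in the quoted lines concrete, here is a minimal sketch. The column names follow the snippet above; the row values are made up for illustration and do not come from the actual reference data.

```python
import pandas as pd

# Hypothetical sample row: column names mirror the quoted snippet,
# the values are invented for this illustration.
row = {
    "valid_date": pd.Timestamp("2020-06-15"),
    "valid_month_shift_forward": 2,
    "valid_month_shift_backward": 3,
}

valid_date = row["valid_date"]

# Forward proposal: ADD the forward shift (in months) to the valid date.
proposed_valid_date_fwd = valid_date + pd.DateOffset(
    months=row["valid_month_shift_forward"]
)
# Backward proposal: SUBTRACT the backward shift (in months) from the valid date.
proposed_valid_date_bwd = valid_date - pd.DateOffset(
    months=row["valid_month_shift_backward"]
)

print(proposed_valid_date_fwd.date())  # 2020-08-15
print(proposed_valid_date_bwd.date())  # 2020-03-15
```

So "forward" uses '+' and "backward" uses '-', which is what the quoted lines do.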
@@ -284,7 +286,7 @@ def process_parquet(
 pd.DataFrame
 processed dataframe with the necessary columns for training.
 """
-from presto.utils import process_parquet as process_parquet_for_presto
+from presto.utils import process_parquet

 logger.info("Processing selected samples ...")
The way processing_period_middle_ts is defined a few lines below doesn't seem to be future-proof.
What if we move away from 12 month processing periods?
That's a good point. I tried to take a small step towards more generic time series handling here (925ff21).
For the default case that we have (12 monthly timesteps), it replicates the previous implementation. It can also handle other frequencies/lengths.
But we might need to add similar logic to other relevant places as well.
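For readers following along, this is roughly what "generic" could look like. It is only a sketch and not the code from the commit: the helper name `middle_timestamp`, its parameters, and the convention of picking the element at index `num_timesteps // 2` are all assumptions made for illustration.

```python
import pandas as pd


def middle_timestamp(start_date, num_timesteps=12, freq="MS"):
    """Return the middle timestamp of a regular time series.

    Hypothetical helper (not from the PR): builds `num_timesteps` timestamps
    starting at `start_date` with pandas frequency `freq` and returns the
    one at index `num_timesteps // 2`.
    """
    timestamps = pd.date_range(start=start_date, periods=num_timesteps, freq=freq)
    return timestamps[num_timesteps // 2]


# 12 monthly timesteps starting in January 2020.
monthly_middle = middle_timestamp("2020-01-01", num_timesteps=12, freq="MS")

# Same helper with a different frequency and length: 36 steps of 10 days.
dekadal_middle = middle_timestamp("2020-01-01", num_timesteps=36, freq="10D")

print(monthly_middle.date())
print(dekadal_middle.date())
```

Because the middle is derived from the number of timesteps and the frequency, nothing in the helper hard-codes a 12-month processing period.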
Do we anticipate that this will happen within the CCN timeframe? Because if not, we may want to keep this as nice to have (track an issue somewhere) but not get distracted by this future option too much.
Perhaps we could briefly sit together so you can talk me through this once again.
I lost the scheme you once drew which explains what should happen...
No, this is not right. If the extraction range allows, the proposed_valid_date for this case should be 2020-06-01. I also can't seem to reproduce this case with the code... Can you please share the start_date and end_date for the sample (not the user-defined range, but the min and max dates for which actual extractions are available)? I'll try to debug and prepare a better explanation, and let's discuss further during the meeting.
OK, looks better now!
Okay, I found a small mix-up bug that was causing this; fixed in de55b03.
OK, the PR is ready to be merged.
A separate issue is to be created for informing the user, via a different attribute, of how the validity time has been shifted...
Summary of an offline discussion:
Main changes: