-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add time range parameters to sync script #483
base: main
Are you sure you want to change the base?
Changes from 14 commits
8e9c478
786f8ed
ba8f745
a608e4b
0cd668c
11c9414
0f45be5
08615b9
14901f1
2b5fb06
c967b64
819f8cf
58722bc
1f01c20
dbf4301
94a5216
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,13 +1,75 @@ | ||
"""Shared methods between both sync scripts.""" | ||
|
||
from datetime import datetime, timezone | ||
from typing import List, Tuple | ||
|
||
from dateutil.relativedelta import relativedelta | ||
from web3 import Web3 | ||
|
||
from src.logger import set_log | ||
from src.models.block_range import BlockRange | ||
|
||
log = set_log(__name__) | ||
|
||
|
||
def compute_time_range( | ||
start_time: datetime, end_time: datetime | ||
) -> list[tuple[datetime, datetime]]: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Didn't there used to be a type
that can be passed around the project for better readability. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point. I am still not sure how to reasonably merge the two code bases of solver-rewards and dune-sync (v1). Accounting period seems to be tailored to rewards with defaulting to weeks and having pretty printing for dune tables. The accounting period also has some implicit constraints on having a time of 00:00:00 in some parts of the code. Making the concept of accounting periods consistent throughout the code would definitely help. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Technically, I block number ranges would be the most appropriate thing. This was the benefit of the old Anyway, I suppose this is just a side note. Feel free to ignore. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is an ongoing internal debate on what the most natural concept for accounting periods is. Block ranges sound nice when considering how backend data is structured. But they become a bit unnatural if multiple chains are considered. For simplicity, we want to use time based periods everywhere, across chains and projects. |
||
"""Computes (list of) time ranges from input parameters. | ||
If both times are from the same month, only [(start_time, end_time)] is returned. | ||
Otherwise, the range is split into n pieces of the form [(start_time, start_of_month_2), | ||
(start_of_month_2, start_of_month_3),..., (start_of_month_n, end_time)]. | ||
""" | ||
assert start_time < end_time, "start_time must be strictly smaller than end_time" | ||
|
||
# if there is just one month to consider | ||
if end_time <= datetime(start_time.year, start_time.month, 1).replace( | ||
tzinfo=timezone.utc | ||
) + relativedelta(months=1): | ||
return [(start_time, end_time)] | ||
|
||
# if there are multiple month to consider | ||
fhenneke marked this conversation as resolved.
Show resolved
Hide resolved
|
||
next_month_start_time = datetime(start_time.year, start_time.month, 1).replace( | ||
tzinfo=timezone.utc | ||
) + relativedelta(months=1) | ||
time_range_list = [(start_time, next_month_start_time)] | ||
while end_time > next_month_start_time + relativedelta(months=1): | ||
time_range_list.append( | ||
(next_month_start_time, next_month_start_time + relativedelta(months=1)) | ||
) | ||
next_month_start_time = next_month_start_time + relativedelta(months=1) | ||
time_range_list.append((next_month_start_time, end_time)) | ||
|
||
return time_range_list | ||
|
||
|
||
def compute_block_range( | ||
start_time: datetime, end_time: datetime, node: Web3 | ||
) -> BlockRange: | ||
"""Computes a block range from start and end time. | ||
The convention for block ranges is to be inclusive, while the end time is exclusive. | ||
""" | ||
latest_block = node.eth.get_block("finalized") | ||
latest_block_time = datetime.fromtimestamp( | ||
latest_block["timestamp"], tz=timezone.utc | ||
) | ||
|
||
assert ( | ||
start_time < latest_block_time | ||
), "start time must be smaller than latest block time" | ||
|
||
start_block = find_block_with_timestamp(node, start_time.timestamp()) | ||
if latest_block_time < end_time: | ||
log.info( | ||
f"Latest finalized block time {latest_block_time} is smaller than {end_time}." | ||
"Using latest finalized block." | ||
) | ||
end_block = int(latest_block["number"]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we add a log here to let us know if this happens that the script is not actually using the end time because it's still in the future? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you have an opinion on where this processing for time ranges should take place? On the one hand, most of the logging and checking can also happen during initialization in On the other hand, a node would be required to do the checking for latest block. And it would be mixing abstractions. The accounting, on a high level, should be time based. That this is internally split into monthly ranges and blocks is a bit of an implementation detail. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure to be honest. Since this requires actual implementation details and not checking whether or not the args are valid it feels cleaner to leave this here. For the sake of serializing the args any end time > start time would be valid, so it seems to me like this is the right place. But if moving this to the initialization makes it more maintainable then that could be a good argument to do this there instead. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For now I just added a log message. |
||
else: | ||
end_block = find_block_with_timestamp(node, end_time.timestamp()) - 1 | ||
|
||
return BlockRange(block_from=start_block, block_to=end_block) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Although this could happen only if one is really trying to make things crash, in principle you can get There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It can, I guess. The issue would be that some of the queries returns an empty dataframe which would be missing columns. A following run to upsert data would then try to add data to that dataframe but columns do not match. One solution would be to drop the old data completely whenever the old result was empty. That would recover the current behavior. |
||
|
||
|
||
def find_block_with_timestamp(node: Web3, time_stamp: float) -> int: | ||
""" | ||
This implements binary search and returns the smallest block number | ||
|
@@ -40,77 +102,3 @@ def find_block_with_timestamp(node: Web3, time_stamp: float) -> int: | |
# fallback in case correct block number hasn't been found | ||
# in that case, we will include some more blocks than necessary | ||
return mid_block_number + 200 | ||
|
||
|
||
def compute_block_and_month_range( # pylint: disable=too-many-locals | ||
node: Web3, recompute_previous_month: bool | ||
) -> Tuple[List[Tuple[int, int]], List[str]]: | ||
""" | ||
This determines the block range and the relevant months | ||
for which we will compute and upload data on Dune. | ||
""" | ||
# The function first a list of block ranges, followed by a list of | ||
# # months. Block ranges are stored as (start_block, end_block) pairs, | ||
# and are meant to be interpreted as closed intervals. | ||
# Moreover, we assume that the job runs at least once every 24h | ||
# Because of that, if it is the first day of month, we also | ||
# compute the previous month's table just to be on the safe side | ||
|
||
latest_finalized_block = node.eth.get_block("finalized") | ||
|
||
current_month_end_block = int(latest_finalized_block["number"]) | ||
current_month_end_timestamp = latest_finalized_block["timestamp"] | ||
|
||
current_month_end_datetime = datetime.fromtimestamp( | ||
current_month_end_timestamp, tz=timezone.utc | ||
) | ||
current_month_start_datetime = datetime( | ||
current_month_end_datetime.year, current_month_end_datetime.month, 1, 00, 00 | ||
) | ||
current_month_start_timestamp = current_month_start_datetime.replace( | ||
tzinfo=timezone.utc | ||
).timestamp() | ||
|
||
current_month_start_block = find_block_with_timestamp( | ||
node, current_month_start_timestamp | ||
) | ||
|
||
current_month = ( | ||
f"{current_month_end_datetime.year}_{current_month_end_datetime.month}" | ||
) | ||
## in case the month is 1-9, we add a "0" prefix, so that we have a fixed-length representation | ||
## e.g., 2024-12, 2024-01 | ||
if len(current_month) < 7: | ||
current_month = current_month[:5] + "0" + current_month[5] | ||
months_list = [current_month] | ||
block_range = [(current_month_start_block, current_month_end_block)] | ||
if current_month_end_datetime.day == 1 or recompute_previous_month: | ||
if current_month_end_datetime.month == 1: | ||
previous_month = f"{current_month_end_datetime.year - 1}_12" | ||
previous_month_start_datetime = datetime( | ||
current_month_end_datetime.year - 1, 12, 1, 00, 00 | ||
) | ||
else: | ||
previous_month = f"""{current_month_end_datetime.year}_ | ||
{current_month_end_datetime.month - 1} | ||
""" | ||
if len(previous_month) < 7: | ||
previous_month = previous_month[:5] + "0" + previous_month[5] | ||
previous_month_start_datetime = datetime( | ||
current_month_end_datetime.year, | ||
current_month_end_datetime.month - 1, | ||
1, | ||
00, | ||
00, | ||
) | ||
months_list.append(previous_month) | ||
previous_month_start_timestamp = previous_month_start_datetime.replace( | ||
tzinfo=timezone.utc | ||
).timestamp() | ||
previous_month_start_block = find_block_with_timestamp( | ||
node, previous_month_start_timestamp | ||
) | ||
previous_month_end_block = current_month_start_block - 1 | ||
block_range.append((previous_month_start_block, previous_month_end_block)) | ||
|
||
return block_range, months_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name of function is a bit confusing as its input is actually the time range, but tbh i don't have any good suggestion for the name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed to
partition_time_range
and added a comment at the call site to make it a bit clearer.Overall, this splitting of time ranges is a bit confusing since it is leaking implementation details of chunking into the rest of the code. The current main has that problem too, though.