actions = (event.data for event in events)

index = 0
async for success, item in helpers.async_streaming_bulk(client, actions, **kwargs):  # type: ignore
    if index >= len(events):
        break
async for success, item in helpers.async_streaming_bulk(client, actions, **kwargs):
    event = events[index]
    event.state.current_state = EventStateType.STORING_IN_OUTPUT
    # This should not be possible!
    assert index < len(events)
bulk_id = uuid.uuid4()
actions = ({**event.data, "_id": f"{bulk_id}_{index}"} for index, event in enumerate(events))
index = 0
async for success, item in helpers.async_streaming_bulk(client, actions, **kwargs):
    # This should not be possible!
    assert index < len(events)
    assert index == int(item["create"]["_id"][37:])
This proves that helpers.async_streaming_bulk yields its results in the same order as the actions passed to it.
This could stay in the code, but I don't like generating a UUID here and setting it as the document ID when OpenSearch probably does that more efficiently.
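For anyone who wants to reproduce the ordering check without a cluster, here is a minimal sketch of the same idea. `fake_streaming_bulk` is a hypothetical stand-in for `helpers.async_streaming_bulk`, not the real helper, so it only illustrates the tagging technique: embed the action index in `_id` and compare it with the iteration index.

```python
import asyncio
import uuid


async def fake_streaming_bulk(actions):
    """Hypothetical stand-in for helpers.async_streaming_bulk (no cluster needed).

    Yields one (success, item) pair per action, in input order. That is the
    behaviour the check below is meant to confirm for the real helper.
    """
    for action in actions:
        await asyncio.sleep(0)  # hand control back to the event loop once per action
        yield True, {"create": {"_id": action["_id"]}}


async def check_order(documents):
    bulk_id = str(uuid.uuid4())  # a UUID string is 36 characters long
    prefix_len = len(bulk_id) + 1  # plus the "_" separator, i.e. 37
    actions = (
        {**doc, "_id": f"{bulk_id}_{index}"} for index, doc in enumerate(documents)
    )
    seen = []
    index = 0
    async for success, item in fake_streaming_bulk(actions):
        assert index < len(documents)
        returned_index = int(item["create"]["_id"][prefix_len:])
        # The index encoded in the ID must match the iteration index.
        assert index == returned_index
        seen.append(returned_index)
        index += 1
    return seen


result = asyncio.run(
    check_order([{"message": "a"}, {"message": "b"}, {"message": "c"}])
)
```

Computing the slice offset from `len(bulk_id)` avoids hard-coding `37`, which is the only change from the snippet above.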
mhoff
left a comment
Many thanks for your work. Here are the few comments we already discussed.
async for success, item in helpers.async_streaming_bulk(client, actions, **kwargs):  # type: ignore
    if index >= len(events):
        break
async for success, item in helpers.async_streaming_bulk(client, actions, **kwargs):
Please add a follow-up ticket for us: we might want to send the chunks concurrently in the future, depending on where we identify actual performance bottlenecks.
Here is a WIP implementation: https://github.com/fkie-cad/Logprep/tree/wip-async-output-no-helper. It is only around 50 eps faster, though, so we should look into this.
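As a starting point for the follow-up ticket, concurrent chunk sending could look roughly like the sketch below. This is not the WIP branch's implementation: `send_chunk`, `chunked`, and the `chunk_size` and `max_in_flight` parameters are hypothetical names introduced here, and `send_chunk` only simulates a bulk request with a short sleep.

```python
import asyncio


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def send_chunk(chunk):
    """Hypothetical placeholder for one bulk request; simulates network latency."""
    await asyncio.sleep(0.01)
    return [(True, doc) for doc in chunk]


async def send_concurrently(documents, chunk_size=2, max_in_flight=4):
    # Bound the number of requests in flight so the cluster is not flooded.
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(chunk):
        async with semaphore:
            return await send_chunk(chunk)

    # asyncio.gather returns results in the order the awaitables were passed,
    # so per-document results still line up with the input order.
    per_chunk = await asyncio.gather(
        *(guarded(chunk) for chunk in chunked(documents, chunk_size))
    )
    return [result for chunk_results in per_chunk for result in chunk_results]


docs = [{"n": n} for n in range(5)]
results = asyncio.run(send_concurrently(docs))
```

Whether this pays off depends on where the bottleneck actually is. If the single stream is already limited by the cluster, more requests in flight may not help much, which would be consistent with the small gain seen so far.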
897ce9f to 5b82bb8
…eys are in list to not panic on indexing
6f7b485 to dd30914
Description
Clean up and optimize the OpenSearch async output
Assignee
Documentation
Code Quality
How did you verify that the changes work in practice?
Reviewer
The rendered docs for this PR can be found here.