I went through this issue, and after going through the code below, this is what I suppose is to be implemented:
```python
def populate_nodes_executor(self):
    """Loop on content nodes to create zim entries from kolibri DB"""

    def schedule_node(item):
        future = self.nodes_executor.submit(self.add_node, item=item)
        self.nodes_futures.add(future)

    # schedule root-id
    schedule_node((self.db.root["id"], self.db.root["kind"]))

    # fill queue with (node_id, kind) tuples for all root node's descendants
    for node in self.db.get_node_descendants(self.root_id):
        if self.node_ids is None or node["id"] in self.node_ids:
            schedule_node((node["id"], node["kind"]))
```
I suppose I should track `self.nodes_futures`.
Let me know if I am on the right track.
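Something like this minimal sketch is what I have in mind, assuming progress can be reported as the ratio of completed futures (the method name `nodes_progress` is hypothetical):

```python
def nodes_progress(self):
    """Hypothetical helper: fraction of scheduled node futures completed.

    Assumes self.nodes_futures is the set populated by schedule_node()
    above; Future.done() is part of concurrent.futures.
    """
    total = len(self.nodes_futures)
    if total == 0:
        return 0.0
    done = sum(1 for future in self.nodes_futures if future.done())
    return done / total
```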
Yes, plus the `videos_futures`; the `videos_futures` are particularly important since they are populated when a video needs reencoding, and quite often that takes far longer than nodes processing.
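For illustration, a rough sketch of how completed video futures could be tallied via a done-callback rather than by polling; the names `videos_executor`, `reencode_video`, `videos_done`, and `progress_lock` are assumptions for this sketch, not the scraper's actual API:

```python
def schedule_video(self, item):
    """Sketch: track a video future and count its completion.

    Assumes self.videos_executor exists and self.reencode_video is the
    long-running reencoding task (both names are assumptions).
    """
    future = self.videos_executor.submit(self.reencode_video, item=item)
    self.videos_futures.add(future)
    future.add_done_callback(self._on_video_done)

def _on_video_done(self, future):
    # Callbacks may run on worker threads, so guard the shared counter
    # (self.progress_lock is assumed to be a threading.Lock).
    with self.progress_lock:
        self.videos_done += 1
```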
However, I suspect this multiprocessing code is significantly broken; see #106.
I suspect we will not use these methods anymore in the future, or at least we will most probably consolidate the multiprocessing logic in a shared module.
I don't know whether it is really worthwhile to implement this scraper progress feature now, given that it might be difficult to debug due to the other issue, and given that the functions might change in the future.
Add support to generate the `task_progress.json` file, so that it can be reported by Zimfarm workers and displayed in the Zimfarm UI.
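As a minimal sketch of what generating that file could look like, assuming a flat JSON object with `done` and `total` counts (the schema, path, and helper name are assumptions here, not a confirmed Zimfarm contract):

```python
import json
import pathlib


def write_task_progress(progress_path, done, total):
    """Write a progress snapshot that an external worker can poll.

    Assumes a flat {"done": int, "total": int} schema; writes to a
    temporary file first so a reader never sees a half-written file.
    """
    path = pathlib.Path(progress_path)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"done": done, "total": total}))
    tmp.replace(path)  # atomic rename on POSIX filesystems


# Example: report 42 of 1000 items processed
write_task_progress("task_progress.json", done=42, total=1000)
```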