Report scraper progress #52

Open
benoit74 opened this issue Oct 10, 2023 · 2 comments
benoit74 (Collaborator) commented Oct 10, 2023

Add support for generating the task_progress.json file, so that it can be reported by Zimfarm workers and displayed in the Zimfarm UI.
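
For illustration, a minimal sketch of writing such a file, assuming a simple {"done": N, "total": M} shape (the exact schema expected by Zimfarm workers is an assumption here and would need to be confirmed):

    import json
    from pathlib import Path

    def write_task_progress(path: Path, done: int, total: int) -> None:
        """Write progress atomically so a worker polling the file never
        reads partial JSON. The schema is an assumption, not confirmed."""
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps({"done": done, "total": total}))
        tmp.replace(path)  # atomic rename on POSIX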

benoit74 added the enhancement label Oct 10, 2023
benoit74 added this to the 1.2.0 milestone Oct 10, 2023
benoit74 modified the milestones: 1.2.0, 1.3.0 Feb 14, 2024
benoit74 modified the milestones: 1.3.0, 2.0.0 Mar 27, 2024
githyuvi (Contributor) commented
I went through this issue. This is what I suppose needs to be implemented, after going through this code:

    def populate_nodes_executor(self):
        """Loop on content nodes to create zim entries from kolibri DB"""

        def schedule_node(item):
            future = self.nodes_executor.submit(self.add_node, item=item)
            self.nodes_futures.add(future)

        # schedule root-id
        schedule_node((self.db.root["id"], self.db.root["kind"]))

        # fill queue with (node_id, kind) tuples for all root node's descendants
        for node in self.db.get_node_descendants(self.root_id):
            if self.node_ids is None or node["id"] in self.node_ids:
                schedule_node((node["id"], node["kind"]))

I suppose I should track self.nodes_futures.
Let me know if I am on the right track.
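
For illustration, tracking those futures could look roughly like the sketch below; report_progress is a hypothetical method and write_task_progress is the hypothetical helper sketched above, neither is an existing scraper API:

    def schedule_node(self, item):
        future = self.nodes_executor.submit(self.add_node, item=item)
        # hypothetical: refresh the progress file whenever a node future completes
        future.add_done_callback(lambda _: self.report_progress())
        self.nodes_futures.add(future)

    def report_progress(self):
        # total keeps growing while scheduling is still ongoing, so the ratio
        # can temporarily decrease; callbacks run on worker threads, so a real
        # implementation would likely guard this with a lock
        total = len(self.nodes_futures)
        done = sum(1 for f in self.nodes_futures if f.done())
        write_task_progress(Path("task_progress.json"), done, total)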

benoit74 (Collaborator, Author) commented
Yes, plus the videos_futures; those are particularly important since they are populated when a video needs reencoding, which quite often takes way longer than node processing.
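
For illustration only, the hypothetical report_progress sketched above would then need to aggregate both sets, e.g.:

    def report_progress(self):
        # hypothetical aggregation over both futures sets; video futures can
        # dominate wall-clock time when reencoding is needed, even though
        # each future counts equally here
        futures = self.nodes_futures | self.videos_futures
        done = sum(1 for f in futures if f.done())
        write_task_progress(Path("task_progress.json"), done, len(futures))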

However, I suspect this multiprocessing code is significantly broken; see #106.

I suspect we will not use these methods anymore in the future, or at least we will most probably consolidate the multiprocessing logic in a shared module.

I don't know if it is really worthwhile to implement this scraper progress feature now, given that it might be difficult to debug due to the other issue, and the functions might change in the future.
