Monitor mode #2680

Open
ImBIOS opened this issue Sep 26, 2024 · 3 comments · May be fixed by #2692
Labels
feature Issues that represent new features or improvements to existing features. hacktoberfest help wanted Extra attention is needed. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@ImBIOS

ImBIOS commented Sep 26, 2024

Which package is the feature request for? If unsure which one to select, leave blank

crawlee

Because I'm actively using PuppeteerCrawler from crawlee, I'll focus on testing with that first.

Feature

I migrated from puppeteer-cluster to crawlee, and I miss its monitor feature for local development.

Motivation

It's handy for tracking progress and the estimated time remaining.

Ideal solution or implementation, and any additional constraints

  • Consume and reuse the existing statistics on completed tasks, and add only what is missing for the monitor. I don't currently know which file holds them, but I'm sure the RequestQueue and concurrency features have this data.

  • Imagined CLI UI:

Start: START_TIME
Now: CURRENT_TIME (running for CONSUMED_TIME)
Progress: FINISHED / TOTAL_TASK (FINISHED_PERCENTAGE), failed: FAILED (FAILED_PERCENTAGE)
Remaining: ESTIMATED_TIME (SPEED)
Sys. load: CPU_LOAD / MEM_LOAD
Concurrencies: CONCURRENCY_INFO
CONCURRENCY_LIST
  • Add a new Monitor class in packages/core/src/monitor.ts to handle the display of the monitor UI. It will contain the logic to write into the output and logic to gather and calculate the monitor data.

  • Integrate the Monitor class into the BasicCrawler class in packages/basic-crawler/src/internals/basic-crawler.ts

  • The Monitor class tracks and displays time estimation and concurrency status in the CLI output at regular intervals, following the proposed UI template.

  • Update the run function in packages/basic-crawler/src/internals/basic-crawler.ts to initialize and start the Monitor class.
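As a rough illustration of the "gather and calculate" part, here is a minimal sketch of how the first four lines of the template could be derived from a few counters. The `MonitorInput` shape, `formatDuration`, and `renderMonitor` are hypothetical names made up for this example and are not existing crawlee APIs. It also assumes that failed requests count as processed when estimating speed, and it leaves out the system load and concurrency lines.

```typescript
// Hypothetical input shape: the real crawler statistics may use different names.
interface MonitorInput {
  startedAt: number; // epoch milliseconds
  now: number;       // epoch milliseconds
  finished: number;  // requests that succeeded
  failed: number;    // requests that failed
  total: number;     // requests known so far
}

// Format a millisecond duration as "Hh Mm Ss".
function formatDuration(ms: number): string {
  const totalSeconds = Math.round(ms / 1000);
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  return `${h}h ${m}m ${s}s`;
}

// Build the first four lines of the proposed template.
function renderMonitor(input: MonitorInput): string[] {
  const elapsedMs = Math.max(0, input.now - input.startedAt);
  // Assumption: failed requests also count as processed for the speed estimate.
  const processed = input.finished + input.failed;
  const pct = (n: number): string =>
    (input.total > 0 ? (n / input.total) * 100 : 0).toFixed(2) + '%';
  const perSecond = elapsedMs > 0 ? processed / (elapsedMs / 1000) : 0;
  const remaining = Math.max(0, input.total - processed);
  // Without any measured speed there is nothing to extrapolate from.
  const eta = perSecond > 0 ? formatDuration((remaining / perSecond) * 1000) : 'unknown';
  return [
    `Start: ${new Date(input.startedAt).toISOString()}`,
    `Now: ${new Date(input.now).toISOString()} (running for ${formatDuration(elapsedMs)})`,
    `Progress: ${input.finished} / ${input.total} (${pct(input.finished)}), failed: ${input.failed} (${pct(input.failed)})`,
    `Remaining: ${eta} (${perSecond.toFixed(2)} requests/s)`,
  ];
}
```

Because a crawler can enqueue new requests while it runs, `total` may grow over time, so the remaining-time line is only an estimate based on the requests known at that moment.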

Alternative solutions or implementations

No response

Other context

  • crawlee already uses its built-in log, so to make sure the monitor output does not overwrite the log, we should find out how to write monitor and log output on separate lines.
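One possible way to keep the two outputs apart, sketched under assumptions: erase the monitor block before each log line is written, then redraw it underneath. The `MonitorRenderer` class and the `Write` callback are hypothetical and not part of crawlee. The sketch relies on the standard ANSI sequences "cursor up N lines" (`ESC[NA`) and "erase from cursor to end of screen" (`ESC[0J`), and it assumes that no monitor line is wide enough to wrap in the terminal.

```typescript
// Hypothetical sink; in a real CLI this could wrap the process's standard output.
type Write = (chunk: string) => void;

class MonitorRenderer {
  private write: Write;
  private drawnLines = 0;          // how many monitor lines are currently on screen
  private lastLines: string[] = []; // last monitor frame, kept for redrawing

  constructor(write: Write) {
    this.write = write;
  }

  // Move the cursor up over the monitor block and erase everything below it.
  private clear(): void {
    if (this.drawnLines > 0) {
      this.write(`\x1b[${this.drawnLines}A\x1b[0J`);
      this.drawnLines = 0;
    }
  }

  // Replace the monitor block with a new frame.
  draw(lines: string[]): void {
    this.clear();
    this.lastLines = lines;
    if (lines.length > 0) this.write(lines.join('\n') + '\n');
    this.drawnLines = lines.length;
  }

  // Print a log line above the monitor block, then redraw the block below it.
  log(message: string): void {
    this.clear();
    this.write(message + '\n');
    this.draw(this.lastLines);
  }
}
```

With this approach, log lines scroll normally while the monitor block always stays at the bottom. It only works if every log write goes through the same renderer, which is an assumption about how the logger could be wired up.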
@ImBIOS ImBIOS added the feature Issues that represent new features or improvements to existing features. label Sep 26, 2024
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Sep 26, 2024
@B4nan B4nan added help wanted Extra attention is needed. hacktoberfest labels Sep 26, 2024
@janbuchar
Contributor

Hello! Could you please elaborate on what the monitor feature does, or provide a link?

@B4nan
Member

B4nan commented Sep 26, 2024

There is a gif in their readme https://github.com/thomasdondorf/puppeteer-cluster

@ImBIOS
Author

ImBIOS commented Sep 26, 2024

Oh sorry, I forgot to insert a link to the source. Thanks for the help, here's the gif.

ImBIOS added a commit to ImBIOS/crawlee that referenced this issue Oct 2, 2024
Fixes apify#2680

Add a new Monitor class to track and display time estimation and concurrency status in the CLI output at regular intervals.

* **Monitor Class**:
  - Add `Monitor` class in `packages/core/src/monitor.ts`.
  - Include logic to write into the output and gather and calculate the monitor data.
* **BasicCrawler Integration**:
  - Import `Monitor` class in `packages/basic-crawler/src/internals/basic-crawler.ts`.
  - Initialize and start the `Monitor` class in the `run` function.
  - Ensure monitor output and `log` output are written on separate lines.
  - Add `monitor` option to `BasicCrawlerOptions` interface.
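
The integration described above could look roughly like the following sketch. Only the `monitor` option name comes from the commit message; `monitorIntervalMs`, `startMonitor`, and `runWithMonitor` are hypothetical and stand in for whatever the linked pull request actually implements.

```typescript
// Hypothetical options: only the `monitor` flag is named in the commit message.
interface MonitorOptions {
  monitor?: boolean;
  monitorIntervalMs?: number; // hypothetical refresh interval
}

// Start a periodic tick when the option is on and return a stop function.
function startMonitor(options: MonitorOptions, tick: () => void): () => void {
  if (!options.monitor) return () => {};
  tick(); // draw the first frame right away
  const timer = setInterval(tick, options.monitorIntervalMs ?? 1000);
  return () => clearInterval(timer);
}

// Hypothetical stand-in for the crawler's run function.
async function runWithMonitor(
  options: MonitorOptions,
  tick: () => void,
  run: () => Promise<void>,
): Promise<void> {
  const stop = startMonitor(options, tick);
  try {
    await run();
  } finally {
    stop(); // always stop the timer, even if the run throws
  }
}
```

Stopping the timer in `finally` keeps a failed run from leaving the interval alive and the process hanging.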

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/apify/crawlee/issues/2680?shareId=XXXX-XXXX-XXXX-XXXX).
@ImBIOS ImBIOS linked a pull request Oct 2, 2024 that will close this issue
3 participants