Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added generator logic #10

Merged
merged 1 commit into from
Sep 12, 2024
Merged

Added generator logic #10

merged 1 commit into from
Sep 12, 2024

Conversation

josephlewis42
Copy link
Collaborator

@josephlewis42 josephlewis42 commented Aug 10, 2024

Adds an MVP version of the generator: #1. With this, we get pretty nice support for most DevDocs items.

Known limitations:

  • third_party assets are checked in and will need to be manually refreshed. Sadly, this info is only in the GitHub repo, but not the generated content on devdocs.io so there's a small window where it might become out of sync.
  • Generation is very slow for resources like ansible which have very large number of entries in their navbars which causes Jinja to choke when instantiating and escaping tons of objects.
  • Generating content with the default libzim compression is also slow: User-defined compression level libzim#544, we need some compression because images are base64 encoded and the navbars are repetitive, but doing so at the highest level is spending a bunch of time trying to further compress things with already high information density (e.g. JPEGs).

Once merged, good proof of concepts for this would be:

Copy link

codecov bot commented Aug 10, 2024

Codecov Report

Attention: Patch coverage is 96.18321% with 5 lines in your changes missing coverage. Please review.

Project coverage is 92.56%. Comparing base (64a1c39) to head (353798e).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/devdocs2zim/generator.py 94.11% 3 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #10       +/-   ##
===========================================
+ Coverage   82.32%   92.56%   +10.24%     
===========================================
  Files           4        4               
  Lines         198      323      +125     
  Branches       19       43       +24     
===========================================
+ Hits          163      299      +136     
+ Misses         35       22       -13     
- Partials        0        2        +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/devdocs2zim/assets/README.md Outdated Show resolved Hide resolved
src/devdocs2zim/client.py Show resolved Hide resolved
src/devdocs2zim/client.py Show resolved Hide resolved
src/devdocs2zim/client.py Show resolved Hide resolved
src/devdocs2zim/client.py Show resolved Hide resolved
src/devdocs2zim/templates/page.html Outdated Show resolved Hide resolved
src/devdocs2zim/templates/license.html Outdated Show resolved Hide resolved
src/devdocs2zim/templates/page.html Outdated Show resolved Hide resolved
src/devdocs2zim/generator.py Show resolved Hide resolved
src/devdocs2zim/generator.py Show resolved Hide resolved
src/devdocs2zim/generator.py Show resolved Hide resolved
src/devdocs2zim/assets/COPYRIGHT Outdated Show resolved Hide resolved
@benoit74
Copy link
Contributor

benoit74 commented Sep 9, 2024

Hi @josephlewis42, thank you very much, I will review this tomorrow, I missed the notification in the avalanche I got when coming back from holidays, sorry for that!

Copy link
Contributor

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See few inline comments

I've created a ZIM of lua~5.4 and cpp. I will open separate issues to discuss what I've found there (nothing which is a blocker to merge this code, more stuff to be discussed / fixed before releasing a 0.1.0 version)

Regarding the time it takes to complete, I'm not that surprised. cpp tooks 15 minutes to complete, which is OK. We are used to scraper taking hours or even sometime days to complete, so this is not a concern from my PoV. But still good to mention and try to enhance, the faster the scraper can be the better.

Regarding known limitations, I suggest we open one issue per limitation so that we can discuss separately how to address them (or even decide to not address them ^^). I agree we do not need to fix them before the MVP.

src/devdocs2zim/client.py Show resolved Hide resolved
src/devdocs2zim/generator.py Outdated Show resolved Hide resolved
src/devdocs2zim/generator.py Outdated Show resolved Hide resolved
src/devdocs2zim/generator.py Show resolved Hide resolved
@benoit74 benoit74 merged commit 1e5b6b6 into main Sep 12, 2024
8 checks passed
@benoit74 benoit74 deleted the generator branch September 12, 2024 06:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants