Rearchitecture MWOffliner HTML/CSS/JS scraping (part #1) #1839

Merged: kelson42 merged 31 commits into main from feature/1830-rearchitecture on Aug 24, 2023

Contributor

@DonAlexandro DonAlexandro commented May 11, 2023

GitHub issue

Fixes partly #1830

Overview

16.05.2023

  1. Moved article treatment methods from saveArticle.ts to article.treatment class
  2. Refactored the code with the new class
  3. Fixed unit tests for the new class
  4. Moved media treatment methods from saveArticle.ts to media.treatment class
  5. Refactored the code with the media.treatment class
  6. Fixed unit tests for the media.treatment class

15.05.2023

  1. Completely finished refactoring pieces of code dedicated to specific API endpoints
  2. Fixed all existing unit tests
  3. Fixed and added a couple of new ones for directors

12.05.2023

  1. Finished unit test
  2. Almost finished replacing old code with new directors

11.05.2023

  1. Created directors for all URLs
  2. Updated an initial implementation of URL builder
  3. Studied how to run unit tests on the project and started writing some for builder and its directors
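
For readers unfamiliar with the builder/director split described in this entry, here is a minimal sketch of the idea. Everything in it is an assumption made for illustration: `UrlBuilder`, `ApiDirector`, `addQueryParam`, and the `w/api.php` path are hypothetical names and values, not the classes added in this PR.

```typescript
// Hypothetical sketch of a URL builder plus one director.
// None of these names are taken from the PR itself.
class UrlBuilder {
  private domain = ''
  private path = ''
  private params: Array<[string, string]> = []

  setDomain(domain: string): this {
    this.domain = domain
    return this
  }

  setPath(path: string): this {
    this.path = path
    return this
  }

  addQueryParam(key: string, value: string): this {
    this.params.push([key, value])
    return this
  }

  // Join domain and path with exactly one slash, then append the encoded query.
  build(): string {
    const base = `${this.domain.replace(/\/+$/, '')}/${this.path.replace(/^\/+/, '')}`
    const query = this.params
      .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
      .join('&')
    return query ? `${base}?${query}` : base
  }
}

// A director captures how URLs for one endpoint family are assembled,
// so callers never concatenate strings themselves.
class ApiDirector {
  constructor(private readonly baseDomain: string) {}

  buildSiteInfoUrl(): string {
    return new UrlBuilder()
      .setDomain(this.baseDomain)
      .setPath('w/api.php')
      .addQueryParam('action', 'query')
      .addQueryParam('meta', 'siteinfo')
      .addQueryParam('format', 'json')
      .build()
  }
}

const siteInfoUrl = new ApiDirector('https://en.wikipedia.org/').buildSiteInfoUrl()
```

The point of the split is that the builder stays generic and easy to unit test, while each director holds the knowledge about one endpoint in a single place.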

21.08.2023

  1. Remove MW capabilities for desktop and visual rendering in favor of the respective lazy methods in the MediaWiki class (fixes Remove MWCapabilities #1357).
  2. Convert the MediaWiki class into a singleton.
  3. Create a renderer builder class to handle rendering across different kinds of API (such as Wikimedia Desktop, VisualEditor, and a placeholder for WikimediaMobile).
  4. Refactor unit tests to handle renderers and the MediaWiki singleton.
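
The three ideas in this last entry (a singleton, lazy capability checks, and a renderer builder) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `MediaWikiSketch`, `hasWikimediaDesktopApi`, `createRenderer`, and both renderer classes are hypothetical, and the capability probe is a stub rather than a real HTTP request.

```typescript
// Hypothetical sketch; class and method names are assumptions, not the PR's code.
interface Renderer {
  readonly name: string
  render(articleId: string): Promise<string>
}

class MediaWikiSketch {
  private static instance: MediaWikiSketch | undefined
  private desktopCheck: Promise<boolean> | undefined
  probeCount = 0

  // Private constructor: the only way to get an instance is getInstance().
  private constructor() {}

  static getInstance(): MediaWikiSketch {
    if (!MediaWikiSketch.instance) {
      MediaWikiSketch.instance = new MediaWikiSketch()
    }
    return MediaWikiSketch.instance
  }

  // Lazy capability check: the probe runs on first use and its promise is
  // cached, so concurrent and later callers share one result.
  hasWikimediaDesktopApi(): Promise<boolean> {
    if (!this.desktopCheck) {
      this.desktopCheck = this.probeDesktopApi()
    }
    return this.desktopCheck
  }

  // Stand-in for a real HTTP probe against the wiki.
  private async probeDesktopApi(): Promise<boolean> {
    this.probeCount += 1
    return true
  }
}

class WikimediaDesktopRendererSketch implements Renderer {
  readonly name = 'WikimediaDesktop'
  async render(articleId: string): Promise<string> {
    return `<!-- desktop --><h1>${articleId}</h1>`
  }
}

class VisualEditorRendererSketch implements Renderer {
  readonly name = 'VisualEditor'
  async render(articleId: string): Promise<string> {
    return `<!-- visual editor --><h1>${articleId}</h1>`
  }
}

// Builder: picks a renderer from what the wiki supports, so callers
// depend only on the Renderer interface.
async function createRenderer(mw: MediaWikiSketch): Promise<Renderer> {
  return (await mw.hasWikimediaDesktopApi())
    ? new WikimediaDesktopRendererSketch()
    : new VisualEditorRendererSketch()
}
```

Caching the promise rather than the boolean is a deliberate choice in this sketch: two callers that ask before the first probe has finished still trigger only one probe.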

@kelson42 kelson42 marked this pull request as draft May 11, 2023 11:56

codecov bot commented May 11, 2023

Codecov Report

Patch coverage: 80.35% and project coverage change: +0.29% 🎉

Comparison is base (eb9fc5c) 71.24% compared to head (d949fd8) 71.53%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1839      +/-   ##
==========================================
+ Coverage   71.24%   71.53%   +0.29%     
==========================================
  Files          30       34       +4     
  Lines        2671     2737      +66     
  Branches      595      606      +11     
==========================================
+ Hits         1903     1958      +55     
- Misses        659      667       +8     
- Partials      109      112       +3     
Files Changed                                      Coverage Δ
src/util/builders/url/basic.director.ts            100.00% <ø> (ø)
src/util/builders/url/url.builder.ts               97.43% <ø> (ø)
src/util/const.ts                                  100.00% <ø> (ø)
src/util/misc.ts                                   73.22% <ø> (ø)
src/util/renderers/renderer.builder.ts             22.50% <22.50%> (ø)
src/util/renderers/visual-editor.renderer.ts       62.85% <62.85%> (ø)
src/Downloader.ts                                  55.71% <64.00%> (-6.91%) ⬇️
src/util/saveArticles.ts                           78.31% <66.66%> (-3.81%) ⬇️
src/util/rewriteUrls.ts                            83.10% <72.72%> (ø)
src/util/renderers/wikimedia-desktop.renderer.ts   83.33% <83.33%> (ø)
... and 8 more

... and 1 file with indirect coverage changes


@kelson42 kelson42 force-pushed the feature/1830-rearchitecture branch from f22dea4 to c5e4ae7 Compare May 24, 2023 14:58
Collaborator

@kelson42 kelson42 left a comment

@DonAlexandro I have had very little time to look at your work so far. I'm pretty convinced by your approach to URLs and I like many things in that part, so I think we could merge your URL work pretty quickly. The rest is more complex, and I'll have a look tomorrow morning. Could you please isolate all the URL stuff in a dedicated PR, so we can polish and merge it?

src/Downloader.ts (resolved)
src/util/builders/url/api.director.ts (outdated, resolved)
src/Downloader.ts (outdated, resolved)
src/Downloader.ts (resolved)
src/MediaWiki.ts (outdated, resolved)
@DonAlexandro DonAlexandro marked this pull request as ready for review June 19, 2023 12:40
@DonAlexandro DonAlexandro marked this pull request as draft June 19, 2023 12:40
@VadimKovalenkoSNF VadimKovalenkoSNF force-pushed the feature/1830-rearchitecture branch 2 times, most recently from a31595d to 4be0b61 Compare July 31, 2023 14:43
Collaborator

@VadimKovalenkoSNF VadimKovalenkoSNF left a comment

Overall this looks OK. I applied some fixes related to the MCS decommission, plus unit test tweaks, on top of PR #1854.

src/util/renderers/article.renderer.ts (outdated, resolved)
src/util/renderers/article.renderer.ts (outdated, resolved)
src/util/renderers/article.renderer.ts (outdated, resolved)
test/unit/builders/url/mobile.director.test.ts (outdated, resolved)
src/util/saveArticles.ts (resolved)
test/unit/downloader.test.ts (outdated, resolved)
src/Downloader.ts (resolved)
@VadimKovalenkoSNF VadimKovalenkoSNF marked this pull request as ready for review August 1, 2023 04:40
Collaborator

@kelson42 kelson42 left a comment

At first look, this PR does not tackle #1830:

  • I see almost no architectural changes besides the creation of the URL helper. A lot of code has been moved around, so I might be missing something.
  • Point "No clear separation between the pieces of code dedicated to specific API end-points" is not fixed
  • Point "No common interface to use in the same way a module dealing with end-point number 1 or end-point number 2" is not implemented
  • Subticket Remove MWCapabilities #1357 is not done at all

src/Downloader.ts (outdated, resolved)
src/util/renderers/article.renderer.ts (outdated, resolved)
src/util/treatments/article.treatment.ts (resolved)
Collaborator

@kelson42 kelson42 left a comment

My two cents

src/util/renderers/article.renderer.ts (outdated, resolved)
src/util/renderers/article.renderer.ts (outdated, resolved)
src/util/renderers/article.renderer.ts (outdated, resolved)
src/util/renderers/article.renderer.ts (outdated, resolved)
src/util/renderers/parsoidHtmlRestApi.renderer.ts (outdated, resolved)
@kelson42 kelson42 changed the title from "[Feature] Rearchitecture MWOffliner HTML/CSS/JS scraping part" to "Rearchitecture MWOffliner HTML/CSS/JS scraping (part #1)" Aug 11, 2023
@@ -0,0 +1,3 @@
export abstract class Renderer {
abstract render(renderOpts: any): Promise<any>
Collaborator

We should have a strict interface here; renderOpts: any is really too error-prone.
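
As a rough illustration of what a stricter signature could look like, here is a minimal sketch. The field names in `RenderOpts` and `RenderOutput`, and the `EchoRenderer` class, are assumptions invented for this example; they are not the interface that was eventually committed.

```typescript
// Hypothetical option and result shapes; field names are illustrative only.
interface RenderOpts {
  articleId: string
  data: unknown
  isMainPage?: boolean
}

interface RenderOutput {
  articleId: string
  html: string
}

abstract class Renderer {
  // Typed input and output: a misspelled or missing field is now a compile error.
  abstract render(renderOpts: RenderOpts): Promise<RenderOutput>
}

// Toy implementation that treats the payload as ready-made HTML.
class EchoRenderer extends Renderer {
  async render(renderOpts: RenderOpts): Promise<RenderOutput> {
    // `unknown` forces an explicit check before the payload is used.
    if (typeof renderOpts.data !== 'string') {
      throw new Error(`Unexpected payload for ${renderOpts.articleId}`)
    }
    return { articleId: renderOpts.articleId, html: renderOpts.data }
  }
}
```

With `any`, a call such as `render({ articleID: 'Foo' })` would compile and fail only at runtime; with the typed version the compiler rejects it.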

Collaborator

Fixed

src/util/saveArticles.ts (outdated, resolved)
src/util/saveArticles.ts (outdated, resolved)
Collaborator

@kelson42 kelson42 left a comment

There are still many oddities which look strange to me. We definitely need to take a moment together.

src/Downloader.ts (resolved)
src/MediaWiki.ts (outdated, resolved)
src/MediaWiki.ts (outdated, resolved)
src/MediaWiki.ts (outdated, resolved)
src/mwoffliner.lib.ts (outdated, resolved)
src/util/renderers/renderer.builder.ts (outdated, resolved)
src/util/renderers/visual-editor.renderer.ts (resolved)
Collaborator

@kelson42 kelson42 left a comment

Do we have open tickets that this PR fully fixes? If yes, the description should be updated with Fixes #xyz.

src/Downloader.ts (outdated, resolved)
src/Downloader.ts (outdated, resolved)
src/Downloader.ts (resolved)
src/MediaWiki.ts (resolved)
src/MediaWiki.ts (resolved)
src/util/renderers/renderer.builder.ts (outdated, resolved)
src/util/renderers/renderer.builder.ts (outdated, resolved)
src/util/renderers/renderer.builder.ts (outdated, resolved)
src/util/renderers/renderer.builder.ts (outdated, resolved)
src/util/renderers/renderer.builder.ts (outdated, resolved)
Collaborator

@kelson42 kelson42 left a comment

Finally LGTM

@kelson42 kelson42 merged commit 77a064a into main Aug 24, 2023
5 of 6 checks passed
@kelson42 kelson42 deleted the feature/1830-rearchitecture branch August 24, 2023 08:35
Development

Successfully merging this pull request may close these issues.

Remove MWCapabilities
3 participants