Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove MWCapabilities object (partial impl) #1878

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
59 changes: 12 additions & 47 deletions src/Downloader.ts
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
normalizeMwResponse,
DB_ERROR,
WEAK_ETAG_REGEX,
renderArticle,

Check warning on line 22 in src/Downloader.ts

View workflow job for this annotation

GitHub Actions / build (18.x)

'renderArticle' is defined but never used
stripHttpFromUrl,
isBitmapImageMimeType,
isImageUrl,
Expand All @@ -31,8 +31,6 @@
import * as logger from './Logger.js'
import MediaWiki from './MediaWiki.js'
import ApiURLDirector from './util/builders/url/api.director.js'
import DesktopURLDirector from './util/builders/url/desktop.director.js'
import VisualEditorURLDirector from './util/builders/url/visual-editor.director.js'
import basicURLDirector from './util/builders/url/basic.director.js'

const imageminOptions = new Map()
Expand Down Expand Up @@ -73,13 +71,6 @@
backoffHandler: (number: number, delay: number, error?: any) => void
}

export interface MWCapabilities {
apiAvailable: boolean
veApiAvailable: boolean
coordinatesAvailable: boolean
desktopRestApiAvailable: boolean
}

export const defaultStreamRequestOptions: AxiosRequestConfig = {
headers: {
accept: 'application/octet-stream',
Expand Down Expand Up @@ -114,8 +105,7 @@
private readonly urlPartCache: KVS<string> = {}
private readonly backoffOptions: BackoffOptions
private readonly optimisationCacheUrl: string
private s3: S3
private mwCapabilities: MWCapabilities // todo move to MW
private s3: S3 // todo move to MW
private apiUrlDirector: ApiURLDirector

constructor({ mw, uaString, speed, reqTimeout, optimisationCacheUrl, s3, webp, backoffOptions }: DownloaderOpts) {
Expand All @@ -128,12 +118,6 @@
this.optimisationCacheUrl = optimisationCacheUrl
this.webp = webp
this.s3 = s3
this.mwCapabilities = {
apiAvailable: false,
veApiAvailable: false,
coordinatesAvailable: true,
desktopRestApiAvailable: false,
}
this.apiUrlDirector = new ApiURLDirector(mw.apiUrl.href)

this.backoffOptions = {
Expand Down Expand Up @@ -222,14 +206,14 @@
public async setBaseUrls() {
//* Objects order in array matters!
this.baseUrl = basicURLDirector.buildDownloaderBaseUrl([
{ condition: this.mwCapabilities.desktopRestApiAvailable, value: this.mw.desktopRestApiUrl.href },
{ condition: this.mwCapabilities.veApiAvailable, value: this.mw.veApiUrl.href },
{ condition: this.mw.hasDesktopRestApi, value: this.mw.desktopRestApiUrl.href },

Check failure on line 209 in src/Downloader.ts

View workflow job for this annotation

GitHub Actions / build (18.x)

Type '(loginCookie?: string, testArticleId?: string) => Promise<any>' is not assignable to type 'boolean'.
{ condition: this.mw.hasVeApi, value: this.mw.veapiUrl.href },

Check failure on line 210 in src/Downloader.ts

View workflow job for this annotation

GitHub Actions / build (18.x)

Type '(loginCookie?: string, testArticleId?: string) => Promise<any>' is not assignable to type 'boolean'.
])

//* Objects order in array matters!
this.baseUrlForMainPage = basicURLDirector.buildDownloaderBaseUrl([
{ condition: this.mwCapabilities.desktopRestApiAvailable, value: this.mw.desktopRestApiUrl.href },
{ condition: this.mwCapabilities.veApiAvailable, value: this.mw.veApiUrl.href },
{ condition: this.mw.hasDesktopRestApi, value: this.mw.desktopRestApiUrl.href },

Check failure on line 215 in src/Downloader.ts

View workflow job for this annotation

GitHub Actions / build (18.x)

Type '(loginCookie?: string, testArticleId?: string) => Promise<any>' is not assignable to type 'boolean'.
{ condition: this.mw.hasVeApi, value: this.mw.veapiUrl.href },

Check failure on line 216 in src/Downloader.ts

View workflow job for this annotation

GitHub Actions / build (18.x)

Type '(loginCookie?: string, testArticleId?: string) => Promise<any>' is not assignable to type 'boolean'.
])

logger.log('Base Url: ', this.baseUrl)
Expand All @@ -248,29 +232,7 @@
}
}

public async checkCapabilities(testArticleId = 'MediaWiki:Sidebar'): Promise<void> {
const desktopUrlDirector = new DesktopURLDirector(this.mw.desktopRestApiUrl.href)
const visualEditorURLDirector = new VisualEditorURLDirector(this.mw.veApiUrl.href)

// By default check all API's responses and set the capabilities
// accordingly. We need to set a default page (always there because
// installed per default) to request the REST API, otherwise it would
// fail the check.
this.mwCapabilities.desktopRestApiAvailable = await this.checkApiAvailabilty(desktopUrlDirector.buildArticleURL(testArticleId))
this.mwCapabilities.veApiAvailable = await this.checkApiAvailabilty(visualEditorURLDirector.buildArticleURL(testArticleId))
this.mwCapabilities.apiAvailable = await this.checkApiAvailabilty(this.mw.apiUrl.href)

// Coordinate fetching
const reqOpts = this.getArticleQueryOpts()

const resp = await this.getJSON<MwApiResponse>(this.apiUrlDirector.buildQueryURL(reqOpts))

const isCoordinateWarning = resp.warnings && resp.warnings.query && (resp.warnings.query['*'] || '').includes('coordinates')
if (isCoordinateWarning) {
logger.info('Coordinates not available on this wiki')
this.mwCapabilities.coordinatesAvailable = false
}
}
// TODO: Update usage of public async checkCapabilities

public removeEtagWeakPrefix(etag: string): string {
return etag && etag.replace(WEAK_ETAG_REGEX, '')
Expand All @@ -288,7 +250,7 @@
const queryOpts: KVS<any> = {
...this.getArticleQueryOpts(shouldGetThumbnail, true),
titles: articleIds.join('|'),
...(this.mwCapabilities.coordinatesAvailable ? { colimit: 'max' } : {}),
...(this.mw.hasCoordinatesApi ? { colimit: 'max' } : {}),
...(this.mw.getCategories
? {
cllimit: 'max',
Expand Down Expand Up @@ -328,7 +290,7 @@
while (true) {
const queryOpts: KVS<any> = {
...this.getArticleQueryOpts(),
...(this.mwCapabilities.coordinatesAvailable ? { colimit: 'max' } : {}),
...(this.mw.hasCoordinatesApi ? { colimit: 'max' } : {}),
...(this.mw.getCategories
? {
cllimit: 'max',
Expand Down Expand Up @@ -387,11 +349,13 @@

logger.info(`Getting article [${articleId}] from ${articleApiUrl}`)

// Can retrieve not only json but html from page/html endpoint
const json = await this.getJSON<any>(articleApiUrl)
if (json.error) {
throw json.error
}
return renderArticle(json, articleId, dump, articleDetailXId, this.mwCapabilities, articleDetail)
// TODO: this.mwCapabilities should be refactored
return articleRenderer.renderArticle(json, articleId, dump, articleDetailXId, this.mwCapabilities, articleDetail)

Check failure on line 358 in src/Downloader.ts

View workflow job for this annotation

GitHub Actions / build (18.x)

Cannot find name 'articleRenderer'.

Check failure on line 358 in src/Downloader.ts

View workflow job for this annotation

GitHub Actions / build (18.x)

Property 'mwCapabilities' does not exist on type 'Downloader'.
}

public async getJSON<T>(_url: string): Promise<T> {
Expand Down Expand Up @@ -467,7 +431,8 @@
return {
action: 'query',
format: 'json',
// TODO: this.mwCapabilities should be refactored
prop: `redirects|revisions${includePageimages ? '|pageimages' : ''}${this.mwCapabilities.coordinatesAvailable ? '|coordinates' : ''}${

Check failure on line 435 in src/Downloader.ts

View workflow job for this annotation

GitHub Actions / build (18.x)

Property 'mwCapabilities' does not exist on type 'Downloader'.
this.mw.getCategories ? '|categories' : ''
}`,
rdlimit: 'max',
Expand Down
151 changes: 131 additions & 20 deletions src/MediaWiki.ts
Original file line number Diff line number Diff line change
Expand Up @@ -11,26 +11,126 @@ import basicURLDirector from './util/builders/url/basic.director.js'
import BaseURLDirector from './util/builders/url/base.director.js'
import ApiURLDirector from './util/builders/url/api.director.js'

interface MWCapabilities {
apiAvailable: boolean
veApiAvailable: boolean
coordinatesAvailable: boolean
desktopRestApiAvailable: boolean
}
class MediaWiki {
public metaData: MWMetaData
public readonly baseUrl: URL
public readonly modulePath: string
public readonly webUrl: URL
public readonly apiUrl: URL
public readonly veApiUrl: URL
public readonly restApiUrl: URL
public readonly mobileRestApiUrl: URL
public readonly desktopRestApiUrl: URL
public readonly modulePathConfig
public readonly getCategories: boolean
public readonly namespaces: MWNamespaces = {}
public readonly namespacesToMirror: string[] = []

private readonly wikiPath: string
private readonly restApiPath: string
private readonly username: string
private readonly password: string
private readonly apiPath: string
private readonly domain: string
private apiUrlDirector: ApiURLDirector
private baseUrlDirector: BaseURLDirector

private _veapiUrl: URL
private _restApiUrl: URL
private _apiUrl: URL
private _modulePath: string
private _webUrl: URL
private _desktopRestApiUrl: URL
// TODO: Mobile builder was removed since there is no /mobile-sections endpoint

// Set default MW capabilities
private readonly mwCapabilities: MWCapabilities

/**
* veApiUrl based on top of 'new ApiURLDirecto'
*/
public get veapiUrl(): URL {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

veApiUrl(), restApiUrl(), etc. are shortcuts and offer lazyness to the URL directors. I see little value to have them. I suggest to just something like this.apiUrlDirector.getVisualEditorURL()` which works in a lazy manner.

If you you think this.apiUrlDirector.getVisualEditorURL() is really too cumbersome, then OK to just keep the redirection part, but I see no reason why the lazyness is not built in the Director itself considering that the configuration of it happens at construction time.

if (!this._veapiUrl) {
// TODO: This depend on baseUrlDirector.buildURL(this.apiPath) and looks like a weak solution
this._veapiUrl = this.apiUrlDirector.buildVisualEditorURL()
}
return this._veapiUrl
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not easy, considering that you have so many different styles and developers behind the codebase... but think about:

  • Url vs URL
  • VE vs VisualEditor.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like that we have a dedicated ticket about naming conventions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For VE, the answer is in the validation of https://github.com/openzim/mwoffliner/wiki/API-end%E2%80%90points. For URL, I clearly prefer to keep full uppercase for constants but up to you.

Copy link
Collaborator Author

@VadimKovalenkoSNF VadimKovalenkoSNF Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed ve to visualEditor, but regarding Url, I'd rather keep it as is because there are about ~580 entries across mwoffliner that use the same capitalization. Check here.

}

/**
* restApiUrl, apiUrl, modulePath, webUrl and desktopRestApiUr are based on top of 'new BaseURLDirector'
*/
public get restApiUrl(): URL {
if (!this._restApiUrl) {
this._restApiUrl = this.baseUrlDirector.buildRestApiURL(this.restApiPath)
}
// TODO: define usage of this property
return this._restApiUrl
}

public get apiUrl(): URL {
if (!this._apiUrl) {
this._apiUrl = this.baseUrlDirector.buildURL(this.apiPath)
}
return this._apiUrl
}

public get modulePath() {
if (!this._modulePath) {
this._modulePath = this.baseUrlDirector.buildModuleURL(this.modulePathConfig)
}
return this._modulePath
}

public get webUrl(): URL {
if (!this._webUrl) {
this._webUrl = this.baseUrlDirector.buildURL(this.wikiPath)
}
return this._webUrl
}

public get desktopRestApiUrl(): URL {
if (!this._desktopRestApiUrl) {
this._desktopRestApiUrl = this.baseUrlDirector.buildDesktopRestApiURL(this.restApiPath)
}
return this._desktopRestApiUrl
}

public hasDesktopRestApi = async function (loginCookie?: string, testArticleId?: string): Promise<any> {
const desktopRestApiAvailable = await this.checkApiAvailabilty(this.getDesktopRestApiArticleUrl(testArticleId), loginCookie)
this.hasDesktopRestApi = async function (): Promise<boolean> {
return desktopRestApiAvailable
}
}

public hasVeApi = async function (loginCookie?: string, testArticleId?: string): Promise<any> {
const veRestApiAvailable = await this.checkApiAvailabilty(this.getVeApiArticleUrl(testArticleId), loginCookie)
this.hasVeApi = async function (): Promise<boolean> {
return veRestApiAvailable
}
}

public hasCoordinatesApi = async function (downloader?: Downloader): Promise<any> {
const validNamespaceIds = this.namespacesToMirror.map((ns) => this.namespaces[ns].num)
const reqOpts = {
action: 'query',
format: 'json',
// TODO: Do we need this.mwCapabilities.coordinatesAvailable here?
prop: `redirects|revisions${this.mwCapabilities.coordinatesAvailable ? '|coordinates' : ''}${this.getCategories ? '|categories' : ''}`,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here there is a chicken-egg problem, not sure how this happened, but you clearly need to put |coordinates to then see if this is supported.

rdlimit: 'max',
rdnamespace: validNamespaceIds.join('|'),
}

// TODO: replace|rename|refactor getJSON later
if (downloader) {
const resp = await downloader.getJSON<MwApiResponse>(this.apiUrlDirector.buildQueryURL(reqOpts))
const isCoordinateWarning = resp.warnings && resp.warnings.query && (resp.warnings.query['*'] || '').includes('coordinates')
if (isCoordinateWarning) {
logger.info('Coordinates not available on this wiki')
return false
}
}
return true
}

constructor(config: MWConfig) {
this.domain = config.domain || ''
Expand All @@ -39,23 +139,23 @@ class MediaWiki {
this.getCategories = config.getCategories

this.baseUrl = basicURLDirector.buildMediawikiBaseURL(config.base)

this.apiPath = config.apiPath ?? 'w/api.php'
this.wikiPath = config.wikiPath ?? DEFAULT_WIKI_PATH
this.restApiPath = config.restApiPath
this.modulePathConfig = config.modulePath

const baseUrlDirector = new BaseURLDirector(this.baseUrl.href)

this.webUrl = baseUrlDirector.buildURL(this.wikiPath)
this.apiUrl = baseUrlDirector.buildURL(this.apiPath)

// Instantiate Url Directors
this.baseUrlDirector = new BaseURLDirector(this.baseUrl.href)
this.apiUrlDirector = new ApiURLDirector(this.apiUrl.href)

this.veApiUrl = this.apiUrlDirector.buildVisualEditorURL()

this.restApiUrl = baseUrlDirector.buildRestApiURL(config.restApiPath)
this.desktopRestApiUrl = baseUrlDirector.buildDesktopRestApiURL(config.restApiPath)

this.modulePath = baseUrlDirector.buildModuleURL(config.modulePath)
// Default capabilities
// TODO: check whether to remove this object
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you are almost done and ready to get rid of this mwCapabilites.

this.mwCapabilities = {
apiAvailable: false,
veApiAvailable: false,
coordinatesAvailable: true,
desktopRestApiAvailable: false,
}
}

public async login(downloader: Downloader) {
Expand Down Expand Up @@ -85,12 +185,17 @@ class MediaWiki {
},
method: 'POST',
})
.then((resp) => {
.then(async (resp) => {
if (resp.data.login.result !== 'Success') {
throw new Error('Login Failed')
}

/*
TODO: Cookie is shared between Downloader and Mediawiki, probably antipattern. Use as interim solution for now.
Also, double-check possible race condition - cookies should be set before checking capabilities.
*/
downloader.loginCookie = resp.headers['set-cookie'].join(';')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indee... here would help to clearly define what is the duty of the Mediawiki and Downloader class. Once this is clear, easier to refactor.

await this.checkCapabilities(resp.headers['set-cookie'].join(';'))
})
.catch((err) => {
throw err
Expand Down Expand Up @@ -296,6 +401,12 @@ class MediaWiki {

return mwMetaData
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments have been lost here, maybe on purpose?! Just try to keep things commented when needed.

private async checkCapabilities(loginCookie?: string, testArticleId = 'MediaWiki:Sidebar'): Promise<void> {
await this.hasDesktopRestApi(loginCookie, testArticleId)
await this.hasVeApi(loginCookie, testArticleId)
await this.hasCoordinatesApi()
}
}

export default MediaWiki
4 changes: 3 additions & 1 deletion src/mwoffliner.lib.ts
Original file line number Diff line number Diff line change
Expand Up @@ -209,7 +209,9 @@ async function execute(argv: any) {
}
}

await downloader.checkCapabilities(mwMetaData.mainPage)
// TODO: checkCapabilities is now in Mediawiki, do we need it here?
// await downloader.checkCapabilities(mwMetaData.mainPage)

await downloader.setBaseUrls()

const redisStore = new RedisStore(argv.redis || config.defaults.redisPath)
Expand Down
10 changes: 10 additions & 0 deletions src/util/mw-api.ts
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ import deepmerge from 'deepmerge'
import * as logger from '../Logger.js'
import Downloader from '../Downloader.js'
import Timer from './Timer.js'
import axios from 'axios'

export async function getArticlesByIds(articleIds: string[], downloader: Downloader, redisStore: RS, log = true): Promise<void> {
let from = 0
Expand Down Expand Up @@ -253,3 +254,12 @@ export function mwRetToArticleDetail(obj: QueryMwRet): KVS<ArticleDetail> {
}
return ret
}

Copy link
Collaborator

@kelson42 kelson42 Aug 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a comment about what the purpose of this? At a first look, I can not remember... or is this totally new? Sounds a bit strange to have it here... but here again, not sure about the really purpose of mw-api.ts.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced checkApiAvailabilty() method from Downloader because it is more like a utility function.

export async function checkApiAvailabilty(url: string, loginCookie = ''): Promise<boolean> {
try {
const resp = await axios.get(url, { maxRedirects: 0, headers: { cookie: loginCookie } })
return resp.status === 200 && !resp.headers['mediawiki-api-error']
} catch (err) {
return false
}
}
Loading
Loading