Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run transferbot in a Kubernetes Job #830

Merged
merged 15 commits into from
Jul 30, 2024
Merged

Conversation

m90
Copy link
Contributor

@m90 m90 commented Jul 3, 2024

@m90 m90 force-pushed the fr/spawn-transferbot branch 2 times, most recently from 2bc1ccc to 5442ba3 Compare July 3, 2024 15:06
@m90 m90 force-pushed the feature/data-reuse-tool branch from bb9285e to b1711ef Compare July 16, 2024 08:03
@m90 m90 force-pushed the fr/spawn-transferbot branch from 5442ba3 to 600e9c2 Compare July 16, 2024 08:05
@m90 m90 force-pushed the fr/spawn-transferbot branch from 007400b to 8cbfe9d Compare July 16, 2024 11:52
@m90 m90 marked this pull request as ready for review July 22, 2024 09:50
$import = WikiEntityImport::findOrFail($this->importId);
$creds = $this->acquireCredentials($wiki->domain);

$this->targetWikiUrl = str_contains($wiki->domain, "localhost")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ugly. I'd be happy to learn about a cleaner solution for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only suggestion I have is we revisit the behaviour of the TransferBot image again and do something like the sidecar proxy thing and always talk to that over http.

I agree this is a nasty leaking of knowing the details about the network environment of the job container into the api.

We could at least tighten it by parsing the url and checking it's the TLD of the FQDN. That would save us having some mystery errors in the future for localhostingofdinnerparties.wikibase.cloud which this would inadvertently catch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think parsing the domain and then checking its TLD is a good compromise.

'consumerName' => 'WikiEntityImportJob',
'ownerOnly' => '1',
'consumerVersion' => '1',
'grants' => 'basic|highvolume|import|editpage|editprotected|createeditmovepage|uploadfile|uploadeditmovefile|rollback|delete|mergehistory',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure these could be reduced even further, but I am totally lost when debugging and was happy I found a working combo that is not "every permission ever".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised most of these are needed. I'm trying to think if there is a quick way for us to test these

app/Jobs/WikiEntityImportJob.php Outdated Show resolved Hide resolved
'containers' => [
0 => [
'hostNetwork' => true,
'name' => 'run-qs-updater',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably change this from the copy paste.

Indeed inf we are looking for a name I'd be keen on EntityImport and moving away from transferbot (some random chat thread we had one month ago: https://mattermost.wikimedia.de/swe/pl/4kao4tr49inobd4i45iuz8cqqo)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the typo, my reasoning for using transferbot wording here was that this is the name of the image we're running. But maybe ditching this in favor of entity-import is better, yes.

@m90 m90 force-pushed the fr/spawn-transferbot branch from 5497376 to f6328a1 Compare July 23, 2024 14:23
@m90 m90 merged commit c3d0fb1 into feature/data-reuse-tool Jul 30, 2024
5 checks passed
@m90 m90 deleted the fr/spawn-transferbot branch July 30, 2024 05:38
m90 added a commit that referenced this pull request Jul 30, 2024
* feat: scaffold entity import job

* feat: fetch credentials from mediawiki

* feat: scaffold control flow for supervising job

* feat: add parameterized spec

* chore: connect proper job to http layer

* fix: adjust kubernetes job spec to pass

* feat: add callback handling

* refactor: move enum to dedicated file

* fix: adjust oauth grants so they work with platform user

* fix: tighten validation for domain parameter

* test: add test case for import job

* chore: change job configuration values

* chore: lock transferbot image version

* refactor: address review feedback

* chore: bump transferbot image version
m90 added a commit that referenced this pull request Jul 30, 2024
* Add API endpoints for handling entity imports (#823)

* feat: scaffold model for storing data about transferbot runs

* feat: scaffold controller logic

* fix: wiki access needs to be scoped down to users

* fix: use consistent plural form for table name

* feat: dispatch mock job on creation

* test: scaffold test setup

* test: cover further functionality

* fix: use proper controller reference

* fix: store entire data

* fix: only update pending imports

* chore: import is not used

* chore: rename dummy job

* feat: add input validation

* refactor: handle enum validation through laravel features

* feat: add job for marking stalled entity imports as stalled (#833)

* feat: tighten validation on entity parameters (#838)

* Run `transferbot` in a Kubernetes Job (#830)

* feat: scaffold entity import job

* feat: fetch credentials from mediawiki

* feat: scaffold control flow for supervising job

* feat: add parameterized spec

* chore: connect proper job to http layer

* fix: adjust kubernetes job spec to pass

* feat: add callback handling

* refactor: move enum to dedicated file

* fix: adjust oauth grants so they work with platform user

* fix: tighten validation for domain parameter

* test: add test case for import job

* chore: change job configuration values

* chore: lock transferbot image version

* refactor: address review feedback

* chore: bump transferbot image version

* docs: add CHANGELOG
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants