-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Job Scheduler: Performance Improvements #5204
Conversation
internal/db/schema/migrations/oss/postgres/93/01_job_run_clean.up.sql
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good left a couple comments
Mike merged a fairly extensive PR related to jobs so this is now also conflicting |
Yep - I've resolved the conflicts locally and will update this PR once everyone has reviewed the fixup changes |
9028a1e
to
cbcd514
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One style comment otherwise this looks great
Before this change, upon the completion of a job run, the scheduler would update a job_run entry with the completed status as well as the number of objects affected by the run. Additionally, we then had a cleaner job whose scope was deleting these completed entries from job_run. This commit changes this design to drop the cleaner job the and rolls its functionality directly into the scheduler. Now, when any given job run is complete, the scheduler will automatically delete it from the job_run table itself. In addition, the completed status has been removed as a valid status as it is not set anymore. This change follows a decision that given the data in these job_run entries is already being deleted by the cleaner job, updating its state when a job completes just to have it deleted is not efficient. Additionally, no use-case for this data was surfaced.
77bec14
to
27c432a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @tmessi should be back tomorrow so might be good to wait for him to 👍 since he did have some comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good, can you move it out of draft so the schema-diff workflow runs? It would be nice to review the schema diff just to be sure.
Yep, done! I will also add the complete query change to this PR, likely later today |
This comment has been minimized.
This comment has been minimized.
This commit changes the logic around selecting jobs to run. Before this, Boundary used a view "job_jobs_to_run" which was a collation of all jobs that were not running and scheduled to run. When jobs were requested to run, Boundary would look at this view, grab a certain amount of jobs and insert them into the "job_run" table. Importantly, while the amount of jobs to be scheduled at once defaulted to 1 in the scheduler package, this was entirely overriden in the controller package to -1 (no limit, run all jobs that the query would return). With this commit, Boundary now looks at the "job" table directly for jobs to run, regardless of job run status. This was done to improve DB performance as the view (which queried all "job_run" rows) was causing significant DB contention. Additionally, the option to limit job runs is now removed since it was already hardcoded to -1 and not user-configurable anyways. Should the need arise to implement this limit again, this will have to be done in a different way since the new query does not account for running jobs and limiting the rows returned after jobs are already running would yield a no-op (the new query inserts nothing into "job_run" if any given job is already running).
@tmessi @louisruch Pushed up the changes to the job scheduling logic: f227020 |
Database schema diff between To understand how these diffs are generated and some limitations see the FunctionsUnchanged Tablesdiff --git a/.schema-diff/tables_62c41191ffce6ab68fbed810559dd857fa8a2528/job_run.sql b/.schema-diff/tables_ad03e96532d85de63832a78057630a11189b99d2/job_run.sql
index 75158503f..17356053f 100644
--- a/.schema-diff/tables_62c41191ffce6ab68fbed810559dd857fa8a2528/job_run.sql
+++ b/.schema-diff/tables_ad03e96532d85de63832a78057630a11189b99d2/job_run.sql
@@ -47,7 +47,7 @@ create table public.job_run (
-- name: table job_run; type: comment; schema: public; owner: -
--
-comment on table public.job_run is 'job_run is a table where each row represents an instance of a job run that is either actively running or has already completed.';
+comment on table public.job_run is 'job_run is a table where each row represents an instance of a job run that is either actively running or has failed in some way.';
--
diff --git a/.schema-diff/tables_62c41191ffce6ab68fbed810559dd857fa8a2528/job_run_status_enm.sql b/.schema-diff/tables_ad03e96532d85de63832a78057630a11189b99d2/job_run_status_enm.sql
index 613a5e666..49bc07ece 100644
--- a/.schema-diff/tables_62c41191ffce6ab68fbed810559dd857fa8a2528/job_run_status_enm.sql
+++ b/.schema-diff/tables_ad03e96532d85de63832a78057630a11189b99d2/job_run_status_enm.sql
@@ -26,7 +26,7 @@ set default_table_access_method = heap;
create table public.job_run_status_enm (
name text not null,
- constraint only_predefined_job_status_allowed check ((name = any (array['running'::text, 'completed'::text, 'failed'::text, 'interrupted'::text])))
+ constraint only_predefined_job_status_allowed check ((name = any (array['running'::text, 'failed'::text, 'interrupted'::text])))
);
Viewsdiff --git a/.schema-diff/views_62c41191ffce6ab68fbed810559dd857fa8a2528/job_jobs_to_run.sql b/.schema-diff/views_62c41191ffce6ab68fbed810559dd857fa8a2528/job_jobs_to_run.sql
deleted file mode 100644
index 56f6ed5e6..000000000
--- a/.schema-diff/views_62c41191ffce6ab68fbed810559dd857fa8a2528/job_jobs_to_run.sql
+++ /dev/null
@@ -1,47 +0,0 @@
---
--- postgresql database dump
---
-
--- dumped from database version 13.16
--- dumped by pg_dump version 14.13 (ubuntu 14.13-1.pgdg22.04+1)
-
-set statement_timeout = 0;
-set lock_timeout = 0;
-set idle_in_transaction_session_timeout = 0;
-set client_encoding = 'utf8';
-set standard_conforming_strings = on;
-select pg_catalog.set_config('search_path', '', false);
-set check_function_bodies = false;
-set xmloption = content;
-set client_min_messages = warning;
-set row_security = off;
-
---
--- name: job_jobs_to_run; type: view; schema: public; owner: -
---
-
-create view public.job_jobs_to_run as
- with running_jobs(job_plugin_id, job_name) as (
- select job_run.job_plugin_id,
- job_run.job_name
- from public.job_run
- where (job_run.status = 'running'::text)
- ), final(job_plugin_id, job_name, next_scheduled_run) as (
- select j.plugin_id,
- j.name,
- j.next_scheduled_run
- from public.job j
- where ((j.next_scheduled_run <= current_timestamp) and (not (exists ( select
- from running_jobs
- where (((running_jobs.job_plugin_id)::text = (j.plugin_id)::text) and ((running_jobs.job_name)::text = (j.name)::text))))))
- )
- select final.job_plugin_id,
- final.job_name,
- final.next_scheduled_run
- from final;
-
-
---
--- postgresql database dump complete
---
- TriggersUnchanged Indexesdiff --git a/.schema-diff/indexes_62c41191ffce6ab68fbed810559dd857fa8a2528/job_run_status_ix.sql b/.schema-diff/indexes_ad03e96532d85de63832a78057630a11189b99d2/job_run_status_ix.sql
index 09ab0c10d..0ae4fc847 100644
--- a/.schema-diff/indexes_62c41191ffce6ab68fbed810559dd857fa8a2528/job_run_status_ix.sql
+++ b/.schema-diff/indexes_ad03e96532d85de63832a78057630a11189b99d2/job_run_status_ix.sql
@@ -29,7 +29,7 @@ create index job_run_status_ix on public.job_run using btree (status);
-- name: index job_run_status_ix; type: comment; schema: public; owner: -
--
-comment on index public.job_run_status_ix is 'the job_run_status_ix is used by the job run cleaner job';
+comment on index public.job_run_status_ix is 'the job_run_status_ix indexes the commonly-used status field';
-- ConstraintsUnchanged Foreign Key ConstraintsUnchanged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR contains work pertaining to improving database operation performance on the job scheduler:
RunJobs
query to not use thejob_jobs_to_run
view as its definition was causing database contention in thejob_run
table