
Postgres: why varchar instead of enum for the job states ? #134

Open
npf opened this issue Mar 31, 2017 · 4 comments

npf (Contributor) commented Mar 31, 2017

Cf: https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=7970

Quoting @lnussbaum:

When running:
curl -kn https://api.grid5000.fr/3.0/sites/luxembourg/status

The API only issues two queries:
(A) SELECT resource_id, cluster, network_address, core, state, available_upto, comment FROM "resources" WHERE "resources"."cluster" IN ('granduc', 'petitprince');

(B) SELECT jobs.*, moldable_job_descriptions.moldable_walltime AS walltime, gantt_jobs_predictions.start_time AS predicted_start_time,  moldable_job_descriptions.moldable_id
 FROM "jobs"
 LEFT OUTER JOIN moldable_job_descriptions ON jobs.job_id = moldable_job_descriptions.moldable_job_id
 LEFT OUTER JOIN gantt_jobs_predictions ON gantt_jobs_predictions.moldable_job_id = moldable_job_descriptions.moldable_id
 WHERE (state NOT IN ('Terminated', 'Error'))

In Luxembourg:

(A) is simple and short. The execution plan is:
oar2=# explain analyze SELECT resource_id, cluster, network_address, core, state, available_upto, comment FROM "resources" WHERE "resources"."cluster" IN ('granduc', 'petitprince')
;
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Seq Scan on resources  (cost=0.00..486.59 rows=468 width=78) (actual time=0.034..0.835 rows=368 loops=1)
   Filter: ((cluster)::text = ANY ('{granduc,petitprince}'::text[]))
 Total runtime: 1.133 ms
(3 rows)


It is a seq scan, but the table is short. We could possibly add an index on cluster, but I am not even sure it would be used.
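
(A sketch of how that could be tested, not part of the original report; the index name ix_resources_cluster is made up here:)

-- hypothetical index on resources.cluster, just to see whether the planner picks it up
CREATE INDEX ix_resources_cluster ON resources (cluster);
-- re-run the plan; on a table this small the planner may well keep the seq scan
EXPLAIN ANALYZE SELECT resource_id, cluster, network_address, core, state, available_upto, comment
 FROM resources WHERE cluster IN ('granduc', 'petitprince');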


(B) is also fairly simple and short.
                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=2.06..13608.04 rows=24188 width=555) (actual time=53.522..53.522 rows=0 loops=1)
   Hash Cond: (moldable_job_descriptions.moldable_id = gantt_jobs_predictions.moldable_job_id)
   ->  Merge Left Join  (cost=0.03..13515.31 rows=24188 width=551) (actual time=53.520..53.520 rows=0 loops=1)
         Merge Cond: (jobs.job_id = moldable_job_descriptions.moldable_job_id)
         ->  Index Scan using jobs_pkey on jobs  (cost=0.00..9824.67 rows=24183 width=543) (actual time=53.516..53.516 rows=0 loops=1)
               Filter: ((state)::text <> ALL ('{Terminated,Error}'::text[]))
         ->  Index Scan using ix_moldable_job_descriptions_moldable_job_id on moldable_job_descriptions  (cost=0.00..3143.89 rows=97764 width=12) (never executed)
   ->  Hash  (cost=2.01..2.01 rows=1 width=8) (never executed)
         ->  Seq Scan on gantt_jobs_predictions  (cost=0.00..2.01 rows=1 width=8) (never executed)
 Total runtime: 53.601 ms
(10 rows)


In Nancy:
(A) is just as short.

(B) is significantly longer (2.6 s of execution time) because there are pending jobs. The query plan:

 Hash Left Join  (cost=184439.15..255459.81 rows=282761 width=948) (actual time=2642.475..2686.547 rows=79 loops=1)
   Hash Cond: (moldable_job_descriptions.moldable_id = gantt_jobs_predictions.moldable_job_id)
   ->  Hash Right Join  (cost=184434.35..254394.46 rows=282761 width=944) (actual time=2642.278..2686.157 rows=79 loops=1)
         Hash Cond: (moldable_job_descriptions.moldable_job_id = jobs.job_id)
         ->  Seq Scan on moldable_job_descriptions  (cost=0.00..18262.38 rows=1165638 width=12) (actual time=0.006..1009.600 rows=1189757 loops=1)
         ->  Hash  (cost=147762.84..147762.84 rows=282761 width=936) (actual time=480.263..480.263 rows=79 loops=1)
               Buckets: 1024  Batches: 512  Memory Usage: 4kB
               ->  Seq Scan on jobs  (cost=0.00..147762.84 rows=282761 width=936) (actual time=203.384..478.162 rows=79 loops=1)
                     Filter: ((state)::text <> ALL ('{Terminated,Error}'::text[]))
   ->  Hash  (cost=3.80..3.80 rows=80 width=8) (actual time=0.153..0.153 rows=80 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 4kB
         ->  Seq Scan on gantt_jobs_predictions  (cost=0.00..3.80 rows=80 width=8) (actual time=0.008..0.078 rows=80 loops=1)
 Total runtime: 2687.080 ms


The problem is that in OAR, state is declared as:
  state varchar(16) check (state in ('Waiting','Hold','toLaunch','toError','toAckReservation','Launching','Running','Suspended','Resuming','Finishing','Terminated','Error')) NOT NULL default 'Waiting',

but postgresql does not use the constraint that restricts the possible values to resolve
"(state NOT IN ('Terminated', 'Error'))".

If it were a bit smarter, it would reason:
"we do not want 'Terminated' or 'Error', so we necessarily want all the other values, so I can use my index to fetch only those other values, because thanks to my statistics on the index contents I know that this will be much more efficient (the jobs table mostly contains Terminated/Error jobs)."

But since state is declared as a varchar, it does not do that.
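
(A quick way to check that assumption about the data distribution, not from the original report; it simply counts jobs per state:)

-- if Terminated/Error dominate, an index scan restricted to the other states is a big win
SELECT state, count(*) FROM jobs GROUP BY state ORDER BY count(*) DESC;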

The problem can be solved:
- on the OAR side, by changing the type of state so that it becomes an ENUM (cf https://www.postgresql.org/docs/9.1/static/datatype-enum.html ); a rough DDL sketch is shown after the query plans below
- on the API side, by changing the query to enumerate all the allowed states:

SELECT jobs.*, moldable_job_descriptions.moldable_walltime AS walltime, gantt_jobs_predictions.start_time AS predicted_start_time,  moldable_job_descriptions.moldable_id
 FROM "jobs"
 LEFT OUTER JOIN moldable_job_descriptions ON jobs.job_id = moldable_job_descriptions.moldable_job_id
 LEFT OUTER JOIN gantt_jobs_predictions ON gantt_jobs_predictions.moldable_job_id = moldable_job_descriptions.moldable_id
 WHERE (state IN ('Waiting','Hold','toLaunch','toError','toAckReservation','Launching','Running','Suspended','Resuming','Finishing'));

With this query, I get the following query plan:

                                                                                      QUERY PLAN                                                                                       
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=59.66..7043.78 rows=588 width=947) (actual time=0.743..4.012 rows=237 loops=1)
   Hash Cond: (moldable_job_descriptions.moldable_id = gantt_jobs_predictions.moldable_job_id)
   ->  Nested Loop Left Join  (cost=47.35..7029.25 rows=588 width=943) (actual time=0.312..3.132 rows=237 loops=1)
         ->  Bitmap Heap Scan on jobs  (cost=47.35..2291.62 rows=588 width=935) (actual time=0.297..1.279 rows=237 loops=1)
               Recheck Cond: ((state)::text = ANY ('{Waiting,Hold,toLaunch,toError,toAckReservation,Launching,Running,Suspended,Resuming,Finishing}'::text[]))
               ->  Bitmap Index Scan on ix_jobs_state  (cost=0.00..47.20 rows=588 width=0) (actual time=0.183..0.183 rows=873 loops=1)
                     Index Cond: ((state)::text = ANY ('{Waiting,Hold,toLaunch,toError,toAckReservation,Launching,Running,Suspended,Resuming,Finishing}'::text[]))
         ->  Index Scan using ix_moldable_job_descriptions_moldable_job_id on moldable_job_descriptions  (cost=0.00..8.04 rows=1 width=12) (actual time=0.003..0.004 rows=1 loops=237)
               Index Cond: (jobs.job_id = moldable_job_id)
   ->  Hash  (cost=9.36..9.36 rows=236 width=8) (actual time=0.416..0.416 rows=234 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 10kB
         ->  Seq Scan on gantt_jobs_predictions  (cost=0.00..9.36 rows=236 width=8) (actual time=0.005..0.218 rows=234 loops=1)
 Total runtime: 4.258 ms
(13 rows)

That brings the runtime from 2724.273 ms down to 4.278 ms, which is a nice speedup ;)
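
(For reference, a rough sketch of what the OAR-side ENUM change could look like. This is not from the original report: the type name job_state and the constraint name jobs_state_check are guesses, and, as noted in the comments below, the ENUM change alone does not fix the planner behaviour.)

-- enum type with the same values as the current CHECK constraint
CREATE TYPE job_state AS ENUM ('Waiting','Hold','toLaunch','toError','toAckReservation','Launching','Running','Suspended','Resuming','Finishing','Terminated','Error');
-- drop the old varchar CHECK constraint and default before switching the column type
ALTER TABLE jobs DROP CONSTRAINT IF EXISTS jobs_state_check;
ALTER TABLE jobs ALTER COLUMN state DROP DEFAULT;
ALTER TABLE jobs ALTER COLUMN state TYPE job_state USING state::text::job_state;
ALTER TABLE jobs ALTER COLUMN state SET DEFAULT 'Waiting';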
npf changed the title from "Postgres: why varchar instead of enum for the job stats ?" to "Postgres: why varchar instead of enum for the job states ?" on Mar 31, 2017
lnussbaum commented:

Actually, I thought that switching to an ENUM would be enough to fix the problem, but no. I will dig into postgresql to find out why it behaves badly in this case.

So, switching to ENUMs is more of a cosmetic change.

capitn (Contributor) commented Mar 31, 2017 via email

npf (Contributor, Author) commented Apr 2, 2017

Maybe the enum type only appeared in recent versions of Postgresql?

npf (Contributor, Author) commented Apr 2, 2017

Introduced in 8.3: https://www.postgresql.org/docs/8.3/static/release-8-3.html

-> in Debian Lenny in March 2009 (https://tracker.debian.org/news/643822), so that is indeed the explanation!
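
(A quick way to check which version a given server is actually running, added here only for completeness:)

-- either statement shows the running PostgreSQL server version
SELECT version();
SHOW server_version;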
