OAR CHANGELOG
=============
version 2.5.10:
--------------
- [oarstat] Allow to see other users' initial_request
- [oarsub] ensure the inline job key is in a valid format, by adding \n
- [oarsub] remove ssh inline keys from initial_request
- [scheduler] add the SCHEDULER_RESOURCE_ORDER_ADV_RESERVATIONS_THRESHOLD
option: advance reservations in the near future can use the same resource
order as batch jobs (taking into account the standby and besteffort jobs)
- [oarstat] Add a machine-parseable output for the -p option
- [oarstat] Add a machine-parseable output for the -e (events) option
- [oarstat] Exit with an error if -p and --sql options are used together
- [oarnodes] Factorize SQL query used to get resources
- [oarnodes] Factorize SQL query to get nodes' events
- [oarnodes] Optimize OAR's resources querying (option -r)
- [oarnodes] Optimize nodes' states querying
- [oarnodes] Update usage and man page
- [oarsub] add a reservation end time possibility to -r option
- [oarsub] factorize the reservation parsing in interactive condition
- [oarsub] be more flexible about the reservation dates parsing
- [oarsub] add the 'now' keyword support to reservation request
- [man] rework oarsub man page
- [oarsub] update usage
- [oarnodesetting] allow for unsetting the value of resource properties
- [oarsub] rename OARSUB_NODE_EXEC_FILE to INTERACTIVE_JOB_HOOK_EXEC_FILE
- [oarexec] add PASSIVE_JOB_HOOK_EXEC_FILE
- [NodeChangeState] fix DB transaction error when resubmitting a job
- [NodeChangeState] do not resubmit deploy/cosystem jobs, nor jobs that failed
with a server_prologue error
- [NodeChangeState] on prologue/epilogue error, only set the first node to
suspected
- [NodeChangeState] Do not suspect nodes for prologue/epilogue error of
deploy/cosystem jobs
- [oarsub] add the advance reservation validation hook feature
- [oarnode] fix oarnode --sql when there are no resources to show
- [Hulot] Add debug info regarding timeouts
- [sarko] fix debug message for job frag
- [accounting] change the behavior of accounting to be more efficient
- [leon] optimize SQL query of get_to_kill_jobs subroutine
- [MetaSched] Optimize querying of the last wake up for a list of nodes
- [IO.pm] Add constants to store often used OAR's job states
- [Hulot] Fix unknown method oar_warning
- [job_resource_manager] add support for AMD gpus
- [oarsh] add support of AMD gpus
- [oardodo] parametrize the OOM killer setup of oardodo
- [NodeChangeState] factorize the resources state changing
- [NodeChangeState] correctly lock and make transaction with PostgreSQL
- [IO.pm] factorize jobs fragging made by NodeChangeState
- [IO.pm] Add a lock_table_exclusive subroutine, for PostgreSQL
- [oarnodes] fix machine-parseable output by returning an empty set
- [oarnodes] fix uninitialized value with -r option
- [oarnodes] fix issue when using non existing nodes in arguments
version 2.5.9:
--------------
- [scheduler] add the SCHEDULER_RESOURCE_ORDER_ADV_RESERVATIONS option, so
that the scheduling of advance reservations is not impacted by the current
state of the resources (e.g. nodes in standby, current besteffort jobs)
- [admission rules] add an admission rule to restrict advance reservation
inner jobs to use container jobs that are advance reservations as well
- [schedulers] fix issues with the scheduling of an inner job before its
container
- [schedulers] waiting inner jobs whose container vanished are set to error
- [oarexec] add an option to have inner jobs killed along with their container
- [oarexec] do not run inner jobs before their container is running
- [oarwalltime] make walltime change respect a possibly defined job deadline
- [oarwalltime] add an option to disable the walltime reduction
- [oarwalltime] fix the per-queue configuration of oarwalltime
- [oarsub/scheduler] fix a bug with the recent Perl max recursion depth limit
- [drawgantt] show the timezone in the dates
- [oarexec] fix oarsub shell termination when the job is killed
- [database] add an index to the resource_log table
- [oar_resource_add] add support for repairing the resource properties
- [oarexec] add support to disable the auto-repair of suspected nodes
- [job_resource_manager/oarsh] add the COMPUTE_THREAD_SIBLINGS option, to let
OAR automatically set the HT thread siblings if not set in the resources
hierarchy with a thread resource, or in the resource cpuset field
- [job_resource_manager] rework code, support more cgroup subsystems
- [oarsh] add support to let oarsh create a sub cgroup with either a subset of
the cpuset or of the devices in the shell opened on the node. See an
example of usage with GNU Parallel in the website documentation
- [oarsub] add the OARSUB_NODE_EXEC_FILE configuration to run a custom command
on the head node of the job before the job shell
- [oarsub] make oarsub accept the submission of a noop job with no script
- [oarstat] fix JSON/YAML/XML output when no job to display
- [oarstat] oarstat -j can now use the OAR_JOB_ID environment variable
- [oarstat] fix YAML display with the YAML::Syck library
version 2.5.8:
--------------
- [job_resource_manager] manage nvidia gpu with the cgroup devices
- [oarwalltime] add functionality to allow changing the walltime of a
running job. See the oarwalltime command and oar.conf
- [scheduler] fix the besteffort + deploy VS adv. reservation case
- [scheduler] add the state=permissive job type, allowing jobs to be scheduled
and run (only if they are noop or cosystem as well) regardless of the
aliveness of resources
- [oarsub/scheduler] fix warning "Use of uninitialized value $resource_value"
- [oarsub] fix unknown error message in case of job termination + typos
- [oarnodesetting] do not kill noop jobs using resources changed to
Dead or Absent
- [finaud] fix: make pingchecker run only on resources of type default
- [oar-database] fix the privileges of oar's read-only user in PostgreSQL
in new installations. For an existing database, the following command applies
the fix: `oar-database --fix-ro-user-priv ...`
- [api] some improvement in the Apache configuration and tests
- [api] added POST /media/force to overwrite a file
- [finaud] bugfix: make pingchecker run only on resources of type default
- [api] hardening on the syntax of the URIs (should not impact good URIs!)
- [drawgantt-svg] add a mark next to the label of the resources pointed by
the mouse
- [drawgantt-svg] fix possible SQL injection with the filters
- [drawgantt-svg] improve the label_display_regex text replacement mechanism
- [drawgantt-svg/oarstat] fix past and current moldable jobs display
- [drawgantt-svg] fix drain display
- [drawgantt-svg] fix nav_filter with only one option
- [oar.conf] update SSH options to the one of OpenSSH 7.6p1
- [oar-database] support --db-is-local (UNIX socket) for MySQL (MariaDB)
- [oar-node] fix warnings with OAR's sshd configuration
- [oar-resource-add] fix the auto-offset option
- [oar-resource-add] add support for creating GPU resources
- [oar-resource-add] add support to handle the CPU and GPU topologies
version 2.5.7:
--------------
This version mainly brings a security fix for the oarsh command. It is highly
recommended to upgrade (server, frontend(s) and nodes), since all previous
versions of OAR are affected.
- [oarsh] fix a security hole when passing option to OpenSSH. See oar.conf to
adapt settings to your setup, if required (OARSH_* variables)
- [oarsh] dropped the mechanism to select whether to use oarsh or fall back
to ssh, given a list of hostname patterns
- [oarsub] fix the job-key information of the manual page
- [oarsub] handle cases where trailing spaces were breaking oarsub script directives
- [api] added an example of Apache configuration for the authentication
- [documentation] improve the SSH keys setup explanations for OAR installation
version 2.5.6:
--------------
- [oar.conf] add the SCHEDULER_MIN_TIME_BETWEEN_2_CALLS option
- [metascheduler] fix a bug with advance reservations when predicted resources
must be recomputed
- [metascheduler] fix a bug with advance reservations with standby start job
types (noop/cosystem/deploy=standby)
- [oar-node init] create /var/run/sshd if needed
- [oarsub] fix several bugs with the array job submission
- [oarstat] allow using Perl's YAML::Syck for a quicker YAML output
- [oarstat] improve performance and information for the --gantt option
- [oarstat] prettier print of job events
- [oarnodesetting] optimize grouped operations on resources and add a lock
around property changes
- [oaradmissionrules] fix bug: changing a rule priority does not enable it
- [oar_resources_init] fix node read from standard input
- [oarnodecheck] use /var/lib/oar instead of /etc/oar for working files
- [logs] several cosmetic fixes
- [api] add colmet extraction function
- [api] proposed apache configuration now uses a virtual host on port 6668
- [drawgantt-svg] fix the possibly very long delay when zooming
- [drawgantt-svg] add forecast buttons + relative start/stop url arguments
- [drawgantt-svg] rework configuration for the default display
- [drawgantt-svg] allow displaying resources of type != default
- [drawgantt-svg] improve support for use as a widget in custom HTML pages
(multisite, etc)
- [monika] fix bugs with recent Perl/Perl CGI versions
- [monika] fix harmless bug in configuration
- [visualization] remove overlib.js (license issue), this breaks the legacy
drawgantt (which is not supported anymore)
- [misc] remove some old development code from sources
- [misc] fix inconsistent copyrights and licenses
- [doc] update the installation documentation
version 2.5.5:
--------------
- [iolib] fix deadlock with TRUNCATE in postgresql
- [almighty] add SCHEDULER_MIN_TIME_BETWEEN_2_CALLS:
the scheduler is launched at most once every t seconds (t=5 by default);
this prevents the scheduler from starving the other modules
- [scheduler] fix some memory leaks.
- [scheduler] add a cache to the resources tree computation: improve
the scheduler speed by reducing the number of SQL queries.
- [scheduler] backport the expire/postpone/deadline job types.
- [scheduler] rename the placeholder job types: placeholder/allowed.
- [scheduler] fix timesharing (adv reservation and *_placeholder schedulers).
- [scheduler] allows noop/cosystem/deploy jobs to start on resources in
standby, no wake-up is triggered (requires activating energy saving).
- [oarsub] use jobkey (-k) if the OAR_JOB_KEY_FILE env variable is set.
- [oarstat] fix accounting display
- [oar_resources_init] fix HyperThreading bug + improve CLI
- [oar_resources_add] make HyperThreading optional + fix long options + make
nicer warning outputs for auto-offset
- [admission rules] rewrite the job type check rule
- [admission rules] fix oaradmissionrules bug with MySQL when modifying a rule
- [oar-node] fix pid in init script.
- [api] some optimizations + rework authentication configuration (apache).
- [api][drawgantt-svg][monika] fix apache config (apache 2.4).
- [drawgantt-svg] new version with aggregation of resources and more.
- [monika] add thread to the hidden properties.
- [api] fastcgi config now using suexec
- [api] now using apache environment variables when headers are not available
- [api] optimization of /jobs query response time (especially efficient for
mysql based installations)
- [api] security fix: HTML outputs which did not break on errors
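
Illustration (not part of OAR): the SCHEDULER_MIN_TIME_BETWEEN_2_CALLS entry
above describes a minimum delay between two scheduler launches. The following
is a minimal Python sketch of that throttling idea only. The class name
`SchedulerThrottle` and the explicit `now` argument are hypothetical; OAR's
actual implementation lives in the Almighty module and is not reproduced here.

```python
class SchedulerThrottle:
    """Allow a scheduler launch at most once every `min_interval` seconds."""

    def __init__(self, min_interval=5.0):
        # 5 seconds mirrors the default quoted in the changelog entry.
        self.min_interval = min_interval
        self.last_run = None  # time of the last accepted launch

    def try_run(self, now):
        """Return True if a launch is allowed at time `now` (in seconds)."""
        if self.last_run is not None and now - self.last_run < self.min_interval:
            return False  # too soon: leave time for the other modules
        self.last_run = now
        return True


throttle = SchedulerThrottle(min_interval=5.0)
# Launch requests arriving at these (hypothetical) timestamps.
decisions = [throttle.try_run(t) for t in (0.0, 2.0, 4.9, 5.0, 7.0, 10.0)]
print(decisions)
```

Requests that arrive less than `min_interval` seconds after the last accepted
launch are simply refused in this sketch; a real module could instead delay
them.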
version 2.5.4:
--------------
- [api] Implemented GET /resources/<property>/jobs to get jobs running on
resources, grouped by a given property.
- [api] Implemented HTTP_X_API_PATH_PREFIX header variable to prefix all
returned URIs.
- [api] Added GET /jobs/<id>/details support.
- [api] Implemented the ability to get a set of jobs at once with
GET /jobs?ids=<id1>:<id2>:<id3>:...
- [api] BUGFIX: stderr and stdout were reversed.
- [api] BUGFIX: memory leak in the API when used with FastCGI.
- [api] Rewritten/commented apache config file.
- [kamelot] BUGFIX: fix hierarchies manipulation (remove toplevel resource).
- [accounting] Fixed a memory leak and a rare case of bad consumption count.
- [oar.conf] Replace the MAX_CONCURRENT_JOB_TERMINATIONS option by
MAX_CONCURRENT_JOBS_STARTING_OR_TERMINATING
- [almighty] Rewrote the handling of starting and finishing jobs: limit
bipbip processes to MAX_CONCURRENT_JOBS_STARTING_OR_TERMINATING
to avoid overloading the server.
- [oarexec] Introduced BASH_ENV=~oar/.batch_job_bashrc for batch jobs
Batch jobs using the bash shell had difficulties sourcing the right bash
scripts at launch time.
BASH_ENV=~oar/.batch_job_bashrc is now set before launching the user bash
process, so that the script to be sourced can be controlled.
By default ~/.bashrc is sourced.
- [commands] Exit immediately on wrong arguments.
- [oarsh] Propagate OAR shell environment variables:
The users have access to the same OAR environment variables when
connecting on all the job nodes with oarsh
- [job_uid] Removed job uid feature (not used).
- [job_resource_manager] Use fstrim (for SSD) when cleaning files.
- [deploy] Do not check the nodes when ACTIVATE_PINGCHECKER_AT_JOB_END is on
and the job is of the deploy type (bug #17128).
- [judas] Disabled sending log by email on errors as this could generate too
many mails.
- [noop] Added the 'noop' job type. If specified, nothing is done on computing
nodes. The job just reserves the resources for the specified
walltime.
- [quotas] Added the possibility to make quotas on:
- the number of used resources
- the number of running jobs
- the result of resources X hours of running jobs
- [runner] Added runner bipbip processes in the bipbip_laucher in Almighty.
- [database] Replaced field "maintenance" by "drain".
The administrator can disable resources without killing
current jobs by::
    oarnodesetting -h n12 -p drain=YES
or::
    oarnodesetting --drain -h n12
:WARNING: any admission rule using the "maintenance" keyword
must be adapted to use the "drain" keyword.
- [oar_resources_init] Added support for SMT (hyperthreading)
- [cpuset] The cpuset resources field is now a varchar.
It is now possible to specify several cpu ids in the cpuset field,
as needed in some cases where SMT is enabled on nodes, e.g.::
    1+4+8
- [oarsub] Added a filter for notifications
It is now possible to specify which TAGs must trigger notifications::
    oarsub --notify "[END,ERROR]mail:name@domain.com" -I
- [admission rules] Added a priority to rules, which makes it easier to manage
the rules execution order.
- [admission rules] Added an enable/disable flag to rules to allow activating
or deactivating rules without having to comment out the code.
- [oaradmin] The oaradmin rules command is now disabled since it does not
handle priority and enable flags.
- [oaradmin] The oaradmin conf command is disabled.
- [oar_resources_add] Added the oar_resources_add command to help adding
resources and replace the oaradmin resources command.
- [oaradmissionrules] oaradmissionrules is a new command to manage the
admission rules.
- [oarnodesetting] Removed dependency on oarnodes.
- [drawgantt-svg] Various bugfixes and improvements
- [metasched] If a besteffort job has a checkpoint duration defined
(oarsub --checkpoint) then OAR tries to checkpoint it before killing it.
It is possible to define a limit of the checkpoint duration with an
admission rule ($checkpoint variable).
- [drawgantt] Drawgantt is now deprecated (and not shipped with packages)
- [misc] OAR packaged components do not require Ruby anymore.
- [oaraccounting] Fix bug reported in Debian tracker #678976
- [sources] Clean up some unused or irrelevant files/code
- [scheduler] change default schedulers to quota
The default scheduler of the queues default, admin and besteffort is
now oar_sched_gantt_with_timesharing_and_fairsharing_and_quotas.
The configuration file /etc/oar/scheduler_quotas.conf contains no quota
enforcement so the behaviour remains the same as before.
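
Illustration (not part of OAR): the [oarsub] notification filter entry above
shows the specification `[END,ERROR]mail:name@domain.com`. The Python sketch
below shows one way such a string could be split into its tag list, method and
target. The helper names `parse_notify` and `should_notify`, the exact grammar
(an optional bracketed tag list followed by `mail:` or `exec:`), the rule that
an empty tag list matches every event, and the `RUNNING` tag used in the demo
are all assumptions made for this example, not OAR's parser.

```python
import re

# Assumed shape: optional "[TAG1,TAG2]" prefix, then "mail:" or "exec:" and a target.
NOTIFY_RE = re.compile(
    r"^(?:\[(?P<tags>[^\]]*)\])?(?P<method>mail|exec):(?P<target>.+)$"
)


def parse_notify(spec):
    """Split a notify specification into (tags, method, target)."""
    match = NOTIFY_RE.match(spec)
    if match is None:
        raise ValueError(f"unrecognized notify specification: {spec!r}")
    raw_tags = match.group("tags") or ""
    tags = [tag.strip() for tag in raw_tags.split(",") if tag.strip()]
    return tags, match.group("method"), match.group("target")


def should_notify(tags, event):
    """Assumption: an empty tag list means every event triggers a notification."""
    return not tags or event in tags


tags, method, target = parse_notify("[END,ERROR]mail:name@domain.com")
print(tags, method, target)
print(should_notify(tags, "END"), should_notify(tags, "RUNNING"))
```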
version 2.5.3:
--------------
- Add the "Name" field on the main Monika page. This makes it easier for users
to find their jobs.
- Add MAX_CONCURRENT_JOB_TERMINATIONS into the oar.conf of the master. This
limits the number of concurrent processes launched by the Almighty when
the jobs finish.
- Bug fix in ssh key feature in oarsub.
- Added --compact, -c option to oarstat (compact view of array jobs).
- Improvements of the API: media upload from html forms, listing of files,
security fixes, addition of new configuration options, listing of the scheduled
nodes into jobs, fixed bad reinitialization of the limit parameter,
stress_factor, accounting...
See OAR-DOCUMENTATION-API-USER for more information.
- CGROUP: handle cgroup hierarchy already mounted by the OS like in Fedora 18
(by systemd in /sys/fs/cgroup) in job_resource_manager_cgroups.pl.
- Bug fix oar-database: fix the reset function for mysql.
- SVG version of drawgantt: all features are now implemented to replace the
legacy drawgantt. Both can be installed.
- Bug fix schedulers: rewrite schedulers with placeholders.
- Rework default admission rules.
- Add support to the oar_resource_init command to generate resources with
a "thread" property (useful if HyperThreading is activated/used on nodes).
- Fix stdout/stderr bug: check the allowed characters in the path given by
the users.
- Fix: the user shell (bash) didn't source /etc/bash.bashrc in batch jobs.
- Add quota which limits the number of used resources at a time depending on
the job attributes: queue, project, types, user
(available with the scheduler
"oar_sched_gantt_with_timesharing_and_fairsharing_and_quotas").
- Add comments in user job STDERR files to know if a job was killed or
checkpointed.
- Add the variable $jobproperties_applied_after_validation. It can be used in
an admission rule to add a constraint after the validation of the job. Ex:
$jobproperties_applied_after_validation = "maintenance='off'";
So, even if all the resources have "maintenance='on'", the new jobs will be
accepted but not scheduled now.
- Add the oardel option --force-terminate-finishing-job: to use when a job is
stuck in the Finishing state.
- Bug #15911: Energy saving now waits SCHEDULER_NODE_MANAGER_IDLE_TIME for
nodes that have been woken up, even if they didn't run any job.
- Simplify job dependencies: do not check the exit code of the jobs in
dependencies.
- Admission rules: add the "estimate_job_nb_resources" function that is
useful to know the number of resources that will be used by a job.
- oarstat: add another output format that can be used by using "--format 2"
or by setting "OARSTAT_DEFAULT_OUTPUT_FORMAT=2" in oar.conf.
- oarsub: Add the capability to use the tag %jobname% in the STDOUT (-O)
and/or STDERR (-E) filenames (like %jobid%).
- bug #14935: fix timesharing jobs within a container issue
- add schedulers with the placeholder feature.
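
Illustration (not part of OAR): the oarsub entry above mentions the %jobname%
and %jobid% tags in STDOUT/STDERR filenames. A minimal Python sketch of such a
tag expansion follows. The function name `expand_tags` and the sample filename
template are hypothetical; the sketch only shows the substitution idea, not
how oarsub implements it.

```python
import re


def expand_tags(template, job_id, job_name):
    """Replace %jobid% and %jobname% in a filename template (single pass)."""
    values = {"jobid": str(job_id), "jobname": job_name}
    # A single regex pass avoids re-expanding tags that appear in the values.
    return re.sub(r"%(jobid|jobname)%", lambda m: values[m.group(1)], template)


stdout_name = expand_tags("%jobname%.%jobid%.stdout", 42, "simu")
print(stdout_name)
```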
version 2.5.2:
--------------
- Bugfix: /var/lib/oar/.bash_oar was empty due to an error in the common
setup script.
- Bugfix: the PINGCHECKER_COMMAND in oar.conf depends now on %%OARDIR%%.
- Bug #13939: the job_resource_manager.pl and job_resource_manager_cgroups.pl
now deletes the user files in /tmp, /var/tmp and /dev/shm at
the end of the jobs.
- Bugfix: in oardodo.c, the preprocessed variables were not defined correctly.
- Finaud: fix race condition when there was a PINGCHECKER error just before
another problem. The node became Alive again when the PINGCHECKER said OK
BUT there was another error to resolve.
- Bugfix: The feature CHECK_NODES_WITH_RUNNING_JOB=yes never worked before.
- Speedup monika (X5).
- Monika: Add the conf max_cores_per_line to have several lines if the number
of cores is too big.
- Minor changes into API:
- added cmd_output into POST /jobs.
- API: Added GET /select_all?query=<query> (read only mode).
- Add the field "array_index" into the jobs table, so that a resubmitted job
from an array has the right array_index environment variable.
- oarstat: order the output by job_id.
- Speedup oarnodes.
- Fix a spelling error in the oaradmin manpage.
- Bugfix #14122 : the oar-node init.d script wasn't executing
start_oar_node/stop_oar_node during the 'restart' action.
- Allow the dash character into the --notify "exec:..." oarsub option.
- Remove some old stuff from the tarball:
- visualization_interfaces/{tgoar,accounting,poar};
- scheduler/moldable;
- pbs-oar-lib.
- Fix some licence issues.
version 2.5.1:
--------------
- Sources directories reorganized
- New "Phoenix" tool to try to reboot automatically broken nodes
(to setup into /etc/oar/oar_phoenix.pl)
- New (experimental!) scheduler written in Ocaml
- Cpusets are activated by default
- Bugfix #11065: oar_resource_init fix (add a space)
- Bug #10999: memory leak in Hulot when used with postgresql. The leak has
been minimized, but it is still there (DBD::Pg bug)
- Almighty cleans ipcs used by oar on exit
- Bugfix #10641 and #10999 : Hulot is automatically and periodically restarted
- Feature request #10565: add the possibility to check the aliveness of the
nodes of a job at the end of this one (pingchecker)
- REST API heavily updated: new data structures with paginated results,
desktop computing functions, rspec tests, oaradmin resources management,
admission rules edition, relative/absolutes uris fixed
- New ruby desktop computing agent using REST API (experimental)
- Experimental testsuite
- Poar: web portal using the REST API (experimental)
- Oaradmin YAML export support for resources creation (for the REST API)
- Bugfix #10567: enabling to bypass window mechanism of hulot.
- Bugfix #10568: Wake up timeout changing with the number of nodes
- Add in oar.conf the tag "RUNNER_SLIDING_WINDOW_SIZE": it allows the runner
to use a sliding window to launch the bipbip processes if
"DETACH_JOB_FROM_SERVER=1". This feature avoids the overload of the server
if plenty of jobs have to be launched at the same time.
- Fix problem when deleting a job in the Suspended state (oarexec was stopped
by a SIGSTOP so it was not able to handle the delete operation)
- Make the USER_SIGNAL feature of oardel multi job independent and remove the
temporary file at the end of the job
- Monika: display if the job is of timesharing type or not
add in the job listing the initial_request (is there a reason to
not display it?)
- IoLib: update scheduler_priority resources property for timesharing jobs,
so that the scheduler can avoid launching all timesharing jobs on the same
resources (they can be dispatched)
- OAREXEC: unmask SIGHUP and SIGPIPE for user script
- node_change_state: do not Suspect the first node of a job which was
EXTERMINATED by Leon if the cpuset feature is configured (let the cpuset do
the job)
- OAREXEC: ESRF detected that oarexec sometimes thinks that it notified the
Almighty of its exit code although nothing was seen on the server. So
oarexec now tries to resend the exit code until it is killed.
- oar_Tools: add in notify_almighty a check on the print and on the close of
the socket connected to Almighty.
- oaraccounting: --sql is now possible in an "oarstat --accounting" query
- Add more logs to the command "oarnodes -e host" when a node turns into
Suspected
- Execute user commands with /proc/self/oom_adj set to 15, so that the first
processes killed when no more memory is available are the user's ones.
Hence the system will remain up and running and the user job will be
finished.
Drawback: this file can be changed manually by the user so if someone knows
a method to do the same thing but only managed by root, we take???
- Bugfix API: quotes were badly escaped in job submission
- Add the possibility to automatically resubmit idempotent job which ends
with an exit code of 99: oarsub -t idempotent "sleep 5; exit 99"
- Bugfix API: Some information was missing from jobs/details, especially
the scheduled resources.
- API: added support of "param_file" value for array job submissions. This value
is a string representing the content of a parameters file. Sample submission::
    {"resource":"/cpu=1", "command":"sleep", "param_file":"60\n90\n30"}
This submits 3 sleep jobs with different sleep values.
- Remove any reference to gridlibs and gridapi as these components are obsolete
- Add stdout and stderr files of each job in oarstat output.
- API now supports fastcgi (big performance raise!)
- Add "-f" option to oarnodesetting to read hostnames from a file.
- API can get/upload files (GET or POST /media/<file_path>)
- Make "X11 forwarding" working even if the user XAUTHORITY environment
variable does not contain ~/.Xauthority (GDM issue).
- Add job_resource_manager_cgroups which handles cpuset + other cgroup
features like network packet tagging, IO disk shares, ...
- Bugfix #13351: now oar_psql_db_init is executed with root privileges
- Bugfix #13434: reservation were not handled correctly with the energy
saving feature
- Add cgroups FREEZER feature to the suspend/resume script (better than kill
SIGSTOP/SIGCONT).
This is doable thanks to the new job_resource_manager_cgroups.
- Implement a new script 'oar-database' to manage the oar database.
oar_mysql_init & oar_psql_init are dropped.
- Huge code reorganisation to allow a better packaging and system integration
- Drop the oarsub/oarstat 2.3 version that was kept for compatibility issues
during the 2.4.x branch.
- By default the oar scheduler is now
'oar_sched_gantt_with_timesharing_and_fairsharing' and the following values
have been set in oar.conf: SCHEDULER_TIMEOUT to 30, SCHEDULER_NB_PROCESSES to 4
and SCHEDULER_FAIRSHARING_MAX_JOB_PER_USER to 30
- Add a limitation on the number of concurrent bipbip processes on the server
(for detached jobs).
- Add IPC cleaning to the job_resource_manager* when there is no other job of
the same user on the nodes.
- make better scheduling behaviour for dependency jobs
- API: added missing stop_time into /jobs/details
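
Illustration (not part of OAR): the "param_file" entry above gives a sample
array submission whose parameter string holds one line per sub-job. The Python
sketch below expands that sample payload into the three command lines it
stands for. The helper name `expand_param_file` is hypothetical, and the
assumption that each non-empty line is appended to the command as its
arguments is an interpretation of the changelog text, not the API's code.

```python
import json


def expand_param_file(submission):
    """Return one command line per non-empty line of the param_file string."""
    command = submission["command"]
    lines = submission.get("param_file", "").split("\n")
    # Assumption: each parameter line becomes the arguments of one sub-job.
    return [f"{command} {line.strip()}" for line in lines if line.strip()]


# Sample payload quoted in the changelog entry (raw string keeps the JSON \n escapes).
payload = json.loads(
    r'{"resource":"/cpu=1", "command":"sleep", "param_file":"60\n90\n30"}'
)
commands = expand_param_file(payload)
print(commands)
```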
version 2.4.4:
--------------
- oar_resource_init: bad awk delimiter. There's a space and if the property
is the first one then there is not a ','.
- job suspend: oardo does not exist anymore (long long time ago). Replace it
with oardodo.
- oarsub: when an admission rule died, micheline returned an integer and not
an array ref. Now oarsub ends nicely.
- Monika: add a link on each jobid on the node display area.
- sshd_config: with nodes with a lot of cores, 10 parallel connections could
be too few
version 2.4.3:
--------------
- Hulot module now has customizable keepalive feature
- Added a hook to launch a healing command when nodes are suspected
(activate the SUSPECTED_HEALING_EXEC_FILE variable)
- Bugfix #9995: oaraccounting script doesn't freeze anymore when db is unreachable.
- Bugfix #9990: prevent from inserting jobs with invalid username (like an empty username)
- Oarnodecheck improvements: node is not checked if a job is already running
- New oaradmin option: --auto-offset
- Feature request #10565: add the possibility to check the aliveness of the
nodes of a job at the end of this one (pingchecker)
version 2.4.2:
--------------
- New "Hulot" module for intelligent and configurable energy saving
- Bug #9906: fix bad optimization in the gantt lib (so bad scheduling
version 2.4.1:
--------------
- Bug #9038: Security flaw in oarsub --notify option
- Bug #9601: Cosystem jobs are no longer killed when a resource is set to Absent
- Fixed some packaging bugs
- API bug fixes in job submission parsing
- Added standby info into `oarnodes -s` and available_upto info into
/resources uri of the API
- Bug Grid'5000 #2687 Fix possible crashes of the scheduler.
- Bug fix: with MySQL DB Finaud suspected resources which are not of the
"default" type.
- Signed debian packages (install oar-keyring package)
version 2.4.0:
--------------
- Bug #8791: added CHECK_NODES_WITH_RUNNING_JOB=no to prevent from checking
occupied nodes
- Fix bug in oarnodesetting command generated by oar_resources_init (detect_resources)
- Added a --state option to oarstat to only get the status of specified jobs
(optimized query, to allow scripting)
- Added a REST API for OAR and OARGRID
- Added JSON support into oarnodes, oarstat and oarsub
- New Makefile adapted to build packages as non-root user
- add the command "oar_resources_init" to easily detect and initialize the
whole resources of a cluster.
- "oaradmin version" : now retrieve the most recent database schema number
- Fix rights on the "schema" table in postgresql.
- Bug #7509: fix bug in add_micheline_subjob for array jobs + jobtypes
- Ctrl-C was not working anymore in oarsub.
It seems that the signal handler does not handle the previous syntax
($SIG = 'qdel')
- Fix bug in oarsh with the "-l" option
- Bug #7487: bad initialisation of the gantt for the container jobs.
- Scheduler: move the "delete_unnecessary_subtrees" directly into
"find_first_hole". Thus it is possible to query a job like::
oarsub -I -l nodes=1/core=1+nodes=4/core=2
(no hard separation between each group)
For the same behaviour as before, you can query:
oarsub -I -l {prop=1}/nodes=1/core=1+{prop=2}/nodes=4/core=2
- Bug #7634: test if the resource property value is actually defined,
otherwise print a ''
- Optional script to take into account cpu/core topology of the nodes at boot
time (to activate inside oarnodesetting_ssh)
- Bug #7174: Removed "./" from the default PATH in oardodo
- Bug #7674: remove the computation of the scheduler_priority field for
besteffort jobs from the asynchronous OAR part. Now the value is set when
the jobs are turned into toLaunch state and in Error/Terminated.
- Bug #7691: add --array and --array-param-file options parsing into the
submitted script. Fix also some parsing errors.
- Bug #7962: enable resource property "cm_availability" to be manipulated by
the oarnodesetting command
- Added the (standby) information to a node state in oarnodes when its state
is Absent and cm_availability != 0
- Changed the name of cm_availability to available_upto which is more relevant
- add a --maintenance option to oarnodesetting that sets the state of a resource
to Absent and its available_upto to 0 if maintenance is on and resets previous
values if maintenance is off.
- added a --signal option to oardel that allows a user to send a signal to one of
his jobs
- added a name field in the schema table that will refer to the OAR version name
- added a table containing scheduler name, script and description
- Bug #8559: Almighty: Moved OAREXEC_XXXX management code out of the queue for
immediate action, to prevent potential problems in case of scheduler timeouts.
- oarnodes, oarstat and the REST API no longer retry connections to the
database in case of failure, but exit with an error instead. The retry behavior
is left for daemons.
- improved packaging (try to install files in more standard places)
- improved init script for Almighty (into deb and rpm packages)
- fixed performance issue on oarstat (array_id index missing)
- fixed performance issue (job_id index missing in event_log table)
- fixed a performance issue at job submission (optimized a query and added an
index on challenges table)
version 2.3.5:
--------------
- Bug #8139: Drawgantt nil error (Add condition to test the presence of nil
value in resources table.)
- Bug #8416: When the automatic halt/wakeup feature was enabled, there
was a problem determining idle nodes.
- Debug a mis-initialization of the Gantt with running jobs in the
metascheduler (concurrency access to PG database)
version 2.3.4:
--------------
- add the command "oar_resources_init" to easily detect and initialize the
whole resources of a cluster.
- "oaradmin version" : now retrieve the most recent database schema number
- Fix rights on the "schema" table in postgresql.
- Bug #7509: fix bug in add_micheline_subjob for array jobs + jobtypes
- Ctrl-C was not working anymore in oarsub.
It seems that the signal handler does not handle the previous syntax
($SIG = 'qdel')
- Bug #7487: bad initialisation of the gantt for the container jobs.
- Fix bug in oarsh with the "-l" option
- Bug #7634: test if the resource property value is actually defined,
otherwise print a ''
- Bug #7674: remove the computation of the scheduler_priority field for
besteffort jobs from the asynchronous OAR part. Now the value is set when
the jobs are turned into toLaunch state and in Error/Terminated.
- Bug #7691: add --array and --array-param-file options parsing into the
submitted script. Fix also some parsing errors.
- Bug #7962: enable resource property "cm_availability" to be manipulated by
the oarnodesetting command
version 2.3.3:
--------------
- Fix default admission rules: case insensitive check for properties used in
oarsub
- Add new oaradmin subcommand : oaradmin conf. Useful to edit conf files and
keep changes in a Subversion repository.
- Kill correctly each taktuk command children in case of a timeout.
- New feature: array jobs (option --array) (on oarsub, oarstat oardel,
oarhold and oarresume) and file-based parametric array jobs
(oarsub --array-param-file)
/!\ in this version the DB scheme has changed. If you want to upgrade your
installation from a previous 2.3 release then you have to execute in your
database one of these SQL scripts (stop OAR before)::
mysql:
DB/mysql_structure_upgrade_2.3.1-2.3.3.sql
postgres:
DB/pg_structure_upgrade_2.3.1-2.3.3.sql
version 2.3.2:
--------------
- Change scheduler timeout implementation to schedule the maximum of jobs.
- Bug #5879: do not show initial_request in oarstat when it is not a job of
the user who launched the oarstat command (oar or root).
- Add a --event option to oarnodes and oarstat to display events recorded for
a job or node
- Display reserved resources for a validated waiting reservation, with a hint
in their state
- Fix oarproperty: property names are lowercase
- Fix OAR_JOB_PROPERTIES_FILE: do not display system properties
- Add a new user command: oarprint, which allows pretty-printing the resource
properties of a job
- Debug temporary job UID feature
- Add 'kill -9' on subprocesses that reached a timeout (avoid Perl to
wait something)
- desktop computing feature is now available again. (ex: oarsub -t
desktop_computing date)
- Add versioning feature for admission rules with Subversion
version 2.3.1:
--------------
- Add new oarmonitor command. This allows monitoring OAR jobs on compute
nodes.
- Remove sudo dependency and replace it by the commands "oardo" and
"oardodo".
- Add possibility to create a temporary user for each job on compute nodes.
So you can perform very strong restrictions for each job (ex: bandwidth
restrictions with iptables, memory management, ... everything that can be
handled with a user id)
- Debian packaging: Run OAR specific sshd with root privileges (under heavy
load, kernel may be more responsive for root processes...)
- Remove ALLOWED_NETWORKS tag in oar.conf (it added more complexity than it
solved problems)
- /!\ change database scheme for the field *exit_code* in the table *jobs*.
Now *oarstat* *exit_code* line reflects the right exit code of the user
passive job (before, even when the user script was not launched the
*exit_code* was 0 which was BAD)
- /!\ add DB field *initial_request* in the table *jobs* that stores the
oarsub line of the user
- Feature Request #4868: Add a parameter to specify what the "nodes" resource
is a synonym for. Network_address must be seen as internal data and not
used.
- Scheduler: add timeout for each job == 1/4 of the remaining scheduler
timeout.
- Bug #4866: now the whole node is Suspected instead of just the part where
there is no job. So it is possible to have a job on Suspected nodes.
- Add job walltime (in seconds) in parameter of prologue and epilogue on
compute nodes.
- oarnodes does not show system properties anymore.
- New feature: container job type now allows submitting inner jobs to be
scheduled within the container job
- Monika refactoring and now in the oar packaging.
- Added a table schema in the db with the field version, representing the
version of the db schema.
- Added a field DB_PORT in the oar config file.
- Bug #5518: add right initialization of the job user name.
- Add new oaradmin command. This allows creating resources and
managing admission rules more easily.
- Bug #5692: change source code into a right Perl 5.10 syntax.
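
As an illustration of the per-job scheduler timeout entry above (each job gets
1/4 of the remaining scheduler timeout), here is a minimal Python sketch. It is
not OAR's code: the function name is hypothetical, and it assumes that every
job uses up its whole budget.

```python
def per_job_timeouts(total_timeout, n_jobs):
    """Return the time budget granted to each job in turn.

    Assumption (illustration only): every job consumes its full budget,
    so the remaining scheduler timeout shrinks by that amount each time.
    """
    remaining = float(total_timeout)
    budgets = []
    for _ in range(n_jobs):
        budget = remaining / 4  # rule from the entry: 1/4 of what is left
        budgets.append(budget)
        remaining -= budget
    return budgets


print(per_job_timeouts(32, 3))  # [8.0, 6.0, 4.5]
```

With this rule no single job can exhaust the scheduler timeout, since each
budget is strictly smaller than the time still available.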
version 2.2.12:
---------------
- Bug #5239: fix the bug if there are spaces into job name or project
- Fix the bug in Iolib if DEAD_SWITCH_TIME >0
- Fix a bug in bipbip when calling the cpuset_manager to clean jobs in error
- Bug #5469: fix the bug with reservations and Dead resources
- Bug #5535: checks for reservations made at the same time were wrong.
- New feature: local checks on nodes can be plugged in the oarnodecheck
mechanism. Results can be asynchronously checked from the server (taktuk
ping checker)
- Add 2 new tables to keep track of the scheduling decisions
(gantt_jobs_predictions_log and gantt_jobs_resources_log). This will help
debugging scheduling troubles (see SCHEDULER_LOG_DECISIONS in oar.conf)
- Now reservations are scheduled only once (at submission time). Resources
allocated to a reservation are definitively set once the validation is
done and won't change in the scheduler's next passes.
- Fix DrawGantt to not display besteffort jobs in the future which is
meaningless.
version 2.2.11:
---------------
- Fix Debian package dependency on a CGI web server.
- Fix little bug: remove notification (scheduled start time) for Interactive
reservation.
- Fix bug in reservation: take care of the SCHEDULER_JOB_SECURITY_TIME for
reservations to check.
- Fix bug: add a lock around the section which creates and feeds the OAR
cpuset.
- Taktuk command line API has changed (we need taktuk >= 3.6).
- Fix extra ' in the name of output files when using a job name.
- Bug #4740: open the file in oarsub with user privileges (-S option)
- Bug #4787: check if the remote socket is defined (problem of timing with
nmap)
- Feature Request #4874: check system names when renaming properties
- DrawGantt can export charts to be reused to build a global multi-OAR view
(e.g. DrawGridGantt).
- Bug #4990: DrawGantt now uses the database localtime as its time reference.
version 2.2.10:
---------------
- Job dependencies: if the required jobs are not in the state Terminated with
an exit code == 0 then the schedulers refuse to schedule this job.
- Add the possibility to disable the halt command on nodes with
cm_availability value.
- Enhance oarsub "-S" option (more #OAR parsed).
- Add the possibility to use oarsh without configuring the CPUSETs (can be
useful for users that don't want to configure their ssh keys)
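
The job dependency rule above can be sketched as follows. This is only one
reading of the entry, written in Python rather than OAR's Perl; the function
name and the dictionary keys ("state", "exit_code") are hypothetical.

```python
def dependencies_satisfied(required_jobs):
    """Return True only if every required job ended successfully.

    Each required job is a dict with the hypothetical keys 'state' and
    'exit_code'. A job without dependencies is always schedulable.
    """
    return all(
        job["state"] == "Terminated" and job["exit_code"] == 0
        for job in required_jobs
    )


ok = [{"state": "Terminated", "exit_code": 0}]
failed = [{"state": "Terminated", "exit_code": 1}]
print(dependencies_satisfied(ok))      # True
print(dependencies_satisfied(failed))  # False
```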
version 2.2.9:
--------------
- Bug 4225: Dump only 1 data structure when using -X or -Y or -D.
- Bug fix in Finishing sequence (Suspect right nodes).
version 2.2.8:
--------------
- Bug 4159: remove unneeded Dump print from oarstat.
- Bug 4158: replace XML::Simple module by XML::Dumper one.
- Bug fix for reservation (recalculate the right walltime).
- Print job dependencies in oarstat.
version 2.2.7:
--------------
- Bug 4106: fix oarsh and oarcp issue with some options (erroneous leading
space).
- Bug 4125: remove exit_code data when it is not relevant.
- Fix potential bug when changing asynchronously the state of the jobs into
"Terminated" or "Error".
version 2.2.6:
--------------
- Bug fix: job types were not sent to cpuset manager script anymore.
(side effect of the bug 4069 resolution)
version 2.2.5:
--------------
- Bug fix: remove user command when oar executes the epilogue script on the
nodes.
- Clean debug and mail messages format.
- Remove bad oarsub syntax from oarsub doc.
- Debug xauth path.
- bug 3995: set project correctly when resubmitting a job
- debug 'bash -c' on Fedora
- bug 4069: reservations with CPUSET_ERROR (remove bad hosts and continue
with a right integrity in the database)
- bug 4044: fix free resources query for reservation (get the nearest hole
from the beginning of the reservation)
- bug 4013: now Dead, Suspected and Absent resources have different colors in
drawgantt with a popup on them.
version 2.2.4:
--------------
- Redirect third party commands into oar.log (easier to debug).
- Add user info into drawgantt interface.
- Some bug fixes.
version 2.2.3:
--------------
- Debug prologue and epilogue when oarexec receives a signal.
version 2.2.2:
--------------
- Switch nice value of the user processes into 0 in oarsh_shell (in case of
sshd was launched with a different priority).
- debug taktuk zombies in pingchecker and oar_Tools
version 2.2.1:
--------------
- install the "allow_classic_ssh" feature by default
- debug DB installer
version 2.2:
------------
- oar_server_proepilogue.pl: can be used for server prologue and epilogue to
authorize users to access nodes that are completely allocated by OAR. If
the whole node is assigned then it kills all jobs from the user if all cpus
are assigned.
- the same thing can be done with cpuset_manager_PAM.pl as the script used to
configure the cpuset. More efficient if cpusets are configured.
- debug cm_availability feature to switch on and off nodes automatically
depending on waiting jobs.
- reservations now take care of cm_availability field
version 2.1.0:
--------------
- add "oarcp" command to help the users to copy files using oarsh.
- add sudo configuration to deal with bash. Now oarsub and oarsh have the
same behaviour as ssh (the bash configuration files are loaded correctly)
- bug fix in drawgantt (lost jobs after submission of a moldable one)
- add SCHEDULER_RESOURCES_ALWAYS_ASSIGNED_TYPE into oar.conf. Thus admin can
add some resources for each job (like the frontend node)
- add possibility to use taktuk to check the aliveness of the nodes
- %jobid% is now replaced in stdout and stderr file names by the effective
job id
- change interface to shut down or wake up nodes automatically (now the node
list is read on STDIN)
- add OARSUB_FORCE_JOB_KEY in oar.conf. It says to create a job ssh key by
default for each job.
- %jobid% is now replaced in the ssh job key name (oarsub -k ...).
- add NODE_FILE_DB_FIELD_DISTINCT_VALUES in oar.conf that enables the admin
to configure the generated content of the OAR_NODE_FILE
- change ssh job key oarsub options behaviour
- add options "--reinitialize" and "--delete-before" to the oaraccounting
command
- cpuset are now stored in /dev/cpuset/oar
- debian packaging: configure and launch a specific sshd for the user oar
- use a file descriptor to send the node list --> able to handle a very large
amount of nodes
- all config files are now in /etc/oar/
- oardel can add a besteffort type to jobs and vice versa
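
The %jobid% replacement mentioned above (stdout/stderr file names and ssh job
key name) amounts to a plain string substitution. A minimal Python sketch,
assuming a hypothetical helper name and a made-up file name pattern; OAR's own
implementation may differ:

```python
def expand_jobid(file_name, job_id):
    """Replace every %jobid% placeholder by the effective job id."""
    return file_name.replace("%jobid%", str(job_id))


# "run.%jobid%.out" is an example pattern, not an OAR default.
print(expand_jobid("run.%jobid%.out", 1234))  # run.1234.out
```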
version 2.0.2:
--------------
- add warnings and exit code to oarnodesetting when there is a bad node name
or resource number
- change package version
- change default behaviour for the cpuset_manager.pl (more portable)
- enable a user to use the same ssh key for several jobs (at his own risk!)
- add node hostnames in oarstat -f
- add --accounting and -u options in oarstat
- bug fix on index fields in the database (syncro): bug 2020
- bug fix about server pro/epilogue: bug 2022
- change the default output of oarstat. Now it is usable: bug 1875
- remove keys in authorized_keys of oar (on the nodes) that do not
correspond to an active cpuset (clean after a reboot)
- reread oar.conf after each database connection attempt
- add support for X11 forwarding in oarsub -I and -C
- debug mysql initialization script in debian package
- add a variable in oarsh for the default options of ssh to use
(more useful to change if the ssh version installed does not
handle one of these options)
- read oar.conf in oarsh (so admin can more easily change options in this
script)
- add support for X11 forwarding via oarsh
- change variable for oarsh: OARSH_JOB_ID --> OAR_JOB_ID
version 2.0.0:
--------------
- Now, with the ability to declare any type of resources like licences,
VLAN, IP range, computing resources must have the type *default* and a
network_address not null.
- Possibility to declare associated resources like licences, IP ranges, ...
and to reserve them like others.
- Now you can connect to your jobs (not only for reservations).
- Add "cosystem" job type (execute and do nothing for these jobs).
- New scheduler : "oar_sched_gantt_with_timesharing". You can specify jobs
with the type "timesharing" that indicates that this scheduler can launch
more than 1 job on a resource at a time. It is possible to restrict this
feature with the words "user" and "name". For example, '-t
timesharing=user,name' indicates that only a job from the same user with
the same name can be launched at the same time as it.
- Add PostgreSQL support. So there is a choice to make between MySQL and
PostgreSQL.
- New approach for the scheduling : administrators have to insert into the
database descriptions of resources and not nodes. Resources have a
network address (physical node) and properties. For example, if you have a
dual-processor node, then you can create 2 different resources with the same
network address but with 2 different processor names.
- The scheduler can now handle resource properties in a hierarchical
manner. Thus, for example, you can do "oarsub -l /switch=1/cpu=5" which
submits a job on 5 processors on the same switch.
- Add a signal handler in oarexec and propagate this signal to the user
process.
- Support '#OAR -p ...' options in user script.
- Add in oar.conf:
* DB_BASE_PASSWD_RO : for security reasons, it is possible to execute
requests with parts specified by users with a read-only account (like the
"-p" option).
* OARSUB_DEFAULT_RESOURCES : when nothing is specified with the oarsub
command then OAR takes this default resource description.
* OAREXEC_DEBUG_MODE : turn on or off debug mode in oarexec (create
/tmp/oar/oar.log on nodes).
* FINAUD_FREQUENCY : indicates the frequency at which OAR launches Finaud
(search dead nodes).
* SCHEDULER_TIMEOUT : indicates to the scheduler the amount of time
after which it must end itself.
* SCHEDULER_JOB_SECURITY_TIME : time between each job.
* DEAD_SWITCH_TIME : after this time Absent and Suspected resources are
turned on the Dead state.
* PROLOGUE_EPILOGUE_TIMEOUT : the possibility to specify a different
timeout for prologue and epilogue.
* PROLOGUE_EXEC_FILE : you can specify the path of the prologue script
executed on nodes.
* EPILOGUE_EXEC_FILE : you can specify the path of the epilogue script
executed on nodes.
* GENERIC_COMMAND : a specific script may be used instead of ping to
check aliveness of nodes. The script must return bad nodes on STDERR
(1 line for a bad node and it must have exactly the same name that