forked from ComputeCanada/magic_castle
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCHANGELOG
748 lines (603 loc) · 37.4 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
## [11.10.0] UNRELEASED
### Changed
- Updated Terraform minimum version to 1.2.1 from 1.1.0
## [11.9.3] 2022-05-09
### Changed
- [puppet] Bumped puppet-jupyterhub to v4.3.1 to enable code-server and openrefine launchers in JupyterLab.
## [11.9.2] 2022-05-04
### Changed
- Set AWS Terraform provider version to v4.10.0 (issue #212)
## [11.9.1] 2022-05-02
### Changed
- Replaced direct call to keystone.py and key2fp.py by shell wrapped script call.
This fixes an issue with Terraform Cloud or system that do not provide `python3`.
The wrapper script test the existence for different name for python cli and choose
start the script with the first one available.
- Made key2fp.py compatible with Python 2 to ensure compatibility with TF Cloud.
## [11.9.0] 2022-04-22
### Added
- Added support for OS_CLIENT_CONFIG_FILE in keystone.py
- Added support for ed25519 hostkeys and SSH keys (issue #210)
- Add acme_key_pem optional variable to DNS modules (issue #205)
- [puppet] sssd can now be configured with additional LDAP domains (issue [#179](https://github.com/ComputeCanada/puppet-magic_castle/issues/179))
- [puppet] Added class to support cephfs mount
### Changed
- Replaced python by python3 in external data source (issue #208)
- [puppet] SELinux mode is now configurable through hieradata.
- [puppet] SELinux default mode is now `permissive` instead of `enforcing`.
- [puppet] Moved munge_socket selinux::module in slurm::base (issue [#178](https://github.com/ComputeCanada/puppet-magic_castle/issues/178))
- [puppet] Set MemSpecLimit in slurm.conf and allow configuration using hieradata (issue [#181](https://github.com/ComputeCanada/puppet-magic_castle/issues/181))
- [puppet] Hardened SSHD config for CentOS 8 (issue [#182](https://github.com/ComputeCanada/puppet-magic_castle/issues/182))
- [puppet] Fixed issue with cat of file inside interactive Slurm jobs (issue [#183](https://github.com/ComputeCanada/puppet-magic_castle/issues/183))
- [puppet] NFS devices can now be an empty string (issue [#185](https://github.com/ComputeCanada/puppet-magic_castle/issues/185))
- [puppet] Fixed notebook and terminals link in slurmformspawner hieradata (issue [#189](https://github.com/ComputeCanada/puppet-magic_castle/issues/189))
- [puppet] Replaced workshop class by an account parameter that adds folders to /etc/skel based on archives fetched from URL provided in hieradata of profile::account (issue [#191](https://github.com/ComputeCanada/puppet-magic_castle/issues/191))
- [puppet] Replaced Apache reverse proxy by Caddy (issue [#195](https://github.com/ComputeCanada/puppet-magic_castle/issues/195))
- [puppet] Dropped support for Globus v4 and replaced most of profile::globus code by `treydock/globus` module to add support for Globus v5.4.
- [puppet] Local users are now required to be created for cloud-init suoders to be removed
### Removed
- [DNS] Removed the `dtn` entry for the Globus endpoint.
- [puppet] Removed class `profile::workshop`
## [11.8] 2022-02-16
### Added
- Added `image` optional instance attribute (PR #203)
- [Azure] Added the optional `plan` variable to allow usage of images that require plan information (issue #201)
- [puppet] Added missing rpm for nvidia gpu passthrough setup in CentOS 8 (issue [#159](https://github.com/ComputeCanada/puppet-magic_castle/issues/159))
- [puppet] Added confinement of users in Slurm's jobs when using Slurm >= 21.08 (PR [#164](https://github.com/ComputeCanada/puppet-magic_castle/issues/164))
- [puppet] Added configuration of NetworkManager DNS config for ipa-server (issue [#169](https://github.com/ComputeCanada/puppet-magic_castle/issues/169))
- [puppet] Added ability to create local users with hieradata (PR [#174](https://github.com/ComputeCanada/puppet-magic_castle/issues/174))
- [puppet] Added ability to create LDAP users with hieradata (PR [#175](https://github.com/ComputeCanada/puppet-magic_castle/issues/175))
- [puppet] Added ssh public keys to ipa_create_user.py
### Changed
- Updated Terraform minimum version to 1.1.0 from 0.14.2
- Updated image in all examples to use either Rocky or AlmaLinux 8
- [AWS] Ensure AWS spot instances can be persistent (PR #193)
- [Azure] Use spot specific values only when using spot instances (Issue #199)
- [puppet] Updated Compute Canada software stack to version 2020 for CentOS 7 and 8
- [puppet] Fixed regex for EESSI repositories (issue [#161](https://github.com/ComputeCanada/puppet-magic_castle/issues/161))
- [puppet] Fixed compatibility with Slurm 21.08 (issue [#162](https://github.com/ComputeCanada/puppet-magic_castle/issues/162))
- [puppet] Fixed sss cache invalidation (issue [#165](https://github.com/ComputeCanada/puppet-magic_castle/issues/165))
- [puppet] Put lmod_default_modules in computecanada.yaml (PR [#172](https://github.com/ComputeCanada/puppet-magic_castle/issues/172))
- [puppet] Changed default slurm version to 21.08
- [puppet] Refactored ipa_create_user.py arguments (PR [#173](https://github.com/ComputeCanada/puppet-magic_castle/issues/173))
- [puppet] Updated puppet-jupyterhub version to v4.1.0 (PR [#176](https://github.com/ComputeCanada/puppet-magic_castle/issues/176))
### Removed
- [puppet] Remove dhclient removal in centos 8 (PR [#170](https://github.com/ComputeCanada/puppet-magic_castle/issues/170))
## [11.7] 2021-11-02
### Changed
- [OpenStack] Replaced `var.os_int_subnet` by `var.subnet_id` as subnet can have the same name across networks.
- [puppet] Update puppet module `consul_template` to 2.3.3.
### Removed
- [OpenStack] Removed `var.os_int_network`.
## [11.6] 2021-10-12
### Changed
- [puppet] Replaced resolv.conf configuration with `file_line` by config of NetworkManager.
- [puppet] Updated puppet-mysql version to fix encoding issue.
- [puppet] Defined the seltype of ipa-rewrite.conf.
## [11.5] 2021-10-06
### Changed
- [puppet] Fixed Slurm email sending (issue #144)
- [puppet] Fixed freeipa server idstart value (PR #146)
- [puppet] Deactivated expiration date of admin password (issue #147, PR #149)
- [puppet] Fixed location of scratch symlink in mkhome.sh (issue #148, PR #150)
## [11.4] 2021-09-22
### Added
- Added Elastic Fabric Adapter support for AWS
- Generalized NFS volumes exportation to support more than home, project and scratch
- [puppet] Added epel as requirement for gpu passthrough packages
- [puppet] Added Slurm 21.08 to the list of versions that can be installed
### Changed
- Fixed puppet launch event in `cluster_config`
- Updated `cloudflare_record` usage to support CloudFlare 3.0 Terraform module.
- Users have to use a valid project name when submitting jobs.
- `centos` can no longer submitted jobs.
- [puppet] Disabled resource limit propagation in Slurm
- [puppet] Configured ipa admin account passwd to never expire
## [11.3] 2021-07-22
### Added
- Added documentation on how to use Terraform Cloud
- Added a trigger when uploading SSL certificates
- Added documentation on how to regenerate SSL certificates
- Added support for Rocky Linux and AlmaLinux
- [puppet] Added X11 forwarding enabling in Slurm config
### Changed
- Improved various sections of the documentation
- Updated puppet-agent to 6.16.0
- Updated puppet-server to 6.23.0
- [puppet] Updated puppet-jupyterhub to v3.8.8
- [cloud-init] Replaced deprecated ssh config algorithm selection parameter
- [puppet] Replaced network-scripts by NetworkManager
- [puppet] Fixed fail2ban config when not using CentOS
- [puppet] Fixed powertools enabling when not using CentOS
### Removed
- [puppet] Removed dhclient package when os major release is 8
## [11.2] 2021-06-02
### Added
- Added a Consul event that can trigger the reload of Puppet agent configuration
- Added the triggering of the consul puppet event when hieradata or terraform_data.yaml
is changed after a `terraform apply`.
- [puppet] Added the user `sudoer_username` and its authorized_keys to profile::base.
### Changed
- SSH public keys can now be added to `public_keys` after the cluster is built,
without rebuilding all instances.
- [cloud-init] Changes to cloud-init file are ignored once the instances are created
- [cloud-init] Moved installation of puppet-agent rpm before puppet-server
- [cloud-init] Moved bootstrapping of consul server in `puppet-magic_castle`'s bootstrap.sh
- [puppet] Bumped puppet-jupyterhub to v3.8.6
- [puppet] Fixed workshop class for guest accounts
### Removed
- [puppet] Globus Connect Server V5 class
## [11.1] 2021-05-18
### Added
- Support for spot instances in AWS, Azure and Google Cloud
- GitHub actions to lint documentation
- Advanced example demonstrating how to create an ELK cluster
- Advanced example demonstrating how to create a kubernetes cluster
- Advanced example demonstrating how to create an Apache Spark cluster
- [cloud-init] Check for the presence of `bootstrap.sh` in the puppet environment
### Changed
- Fixed issue with Google Cloud DNS referring to CloudFlare (PR #167 @consideratio)
- Fixed documentation (PR #169 @consideratio)
- Fixed var.volumes validation - allowing the variable to be an empty map
- Fixed terraform_facts.yaml format for cloud provider and region
- Reintroduced nouveau driver blacklisting in cloud-init
- Improved provider specific READMEs
- Reverted flavors and image in openstack example to Arbutus instead of Beluga Cloud
- [AWS] Updated AMI value in the example
- [puppet] Fixed init of /etc/cvmfs/default.local with consul-template
## [11.0] 2021-04-19
Refer to the [migration guide](docs/migration.md) to know how to deal with
the changes introduced in this release.
### Added
- Add `volumes` variable structure to define volumes that will be attached to instances with matching tags.
- Reference design documentation
- Added straightforward password replacement procedur (PR #152 @plstonge)
- Added advice on user management when not creating users (PR #153 @ocaisa)
- Added freeipa admin username and password to the output (PR #154 @ocaisa)
- Added a basic puppet cluster example
- Added a basic lustre cluster setup example
### Changed
- Changed `instances` variable structure to add tags
- Updated developer documentation
- Updated main documentation
- Merged puppet agent and puppet server cloud-init YAML file in a single file
- Combined some of the DNS module inputs to in a single variable
- Defined a default value for `nb_users` (0)
- Changed format of `terraform_data.yaml` to no longer depends on puppet-magic_castle
- Moved resources related to networking in `network.tf` for each cloud provider
- Normalized most resource names across providers
- Changed `os_floating_ips` input type from list to map
- [puppet] Updated EESSI CVMFS config version
- [puppet] Changed how devices for NFS server are identified using facts instead of glob
- [puppet] Bumped puppet-jupyterhub version to v3.8.2
- [puppet] Fixed gpu profile for pci passthrough providers
- [puppet] Replaced site.pp roles based on hostnames by roles based on tags
### Removed
- Removed fetching of `terraform_data.yaml.tmpl` from `config_git_url` repo
- Removed `storage` variable, replaced by `volumes`.
- Removed Azure variable `managed_disk_type`
- Removed common variable `root_disk_size`
## [10.2] 2021-03-10
### Added
- Added GitHub actions workflows for validation and release
- [puppet] Added GitHub actions workflows for validation
### Changed
- [puppet] Fixed the logic in mkproject.sh to avoid empty GID
### Removed
- Removed Travis CI yaml
- [puppet] Removed Travis CI yaml
## [10.1] 2021-03-08
### Added
- [puppet] Added log rotation rules for Slurm daemons
- [puppet] Added configuration of Prometheus retention time and storage
### Changed
- Updated documentation
- [puppet] Changed /localscratch SELinux label from `default_t` to `tmp_t`
- [puppet] Bumped puppet-jupyterhub version to 3.7.3
- [puppet] Refactored mkproject daemon to allow removal of users from Slurm account when
removing them from the associated POSIX group in FreeIPA.
### Removed
- Removed possibility of having underscores in `cluster_name`
## [10.0] 2021-02-24
### Added
- Added `config_git_url` and `config_version` to all examples
- Added a DNS module that produce a text file that can be imported in DNS zone
for providers that are currently not supported by MC.
- [puppet] Added cloud provider and region to puppet facts
- [puppet] Added failmode option to Duo
- [puppet] Added fallback to nvidia-smi for GPU driver version
### Changed
- Bumped Terraform minimum version to 0.14.5
- Fixed Acme provider name (issue #)
- Renamed `puppetenv_git` to `config_git_url`
- Renamed `puppetenv_rev` to `config_version`
- Made `config_git_url` and `config_version` mandatory.
- Improved documentation
- [puppet] Reverted "Remove export of security labels in nfs"
- [puppet] Fixed label issues with home, project and scratch root folder ([puppet-magic_castle issue #96](https://github.com/ComputeCanada/puppet-magic_castle/issues/96))
- [puppet] Generalized VGPU driver installation to support other clouds
- [puppet] Changed default Slurm version from 19.05 to 20.11
- [puppet] Fix guest account username zero-padding when nb_accounts < 10
## [9.3] 2021-01-26
### Added
- Added configuration of NFS server running versions with nfs.conf
### Changed
- Excluded slurm from EPEL yum repo
## [9.2] 2021-01-13
### Added
- Added support for multiple software stacks including Compute Canada and EESSI (PR #124)
- [azure] Added support for configuration of image using image_id
- [puppet] Added automatic Slurm node weight computation through consul2slurm
- [puppet] Added a wrapper for ipa-client-install to make nsupdate failure fatal ([puppet-magic_castle issue #79](https://github.com/ComputeCanada/puppet-magic_castle/issues/79))
- [puppet] Added definition of bind_addr parameter in consul config ([puppet-magic_castle issue #83](https://github.com/ComputeCanada/puppet-magic_castle/issues/83))
### Changed
- Fixed `cluster_name` validation and documentation
- Simplified version locking in release.sh
- [puppet] Changed `'ensure'` of package globus-repo to `'latest'`
- [puppet] Changed z-00-rsnt_arch consul-template to produce a single value
- [puppet] Stopped installation of CUDA when there is no nvidia device
- [puppet] Fixed PowerTools repo filename for CentOS Linux 8
- [puppet] Bumped puppet-jupyterhub version to 3.7.2
### Removed
- Removed version from the provider config block
- [azure] Remove version set from azure provider
## [9.1] 2020-11-19
### Added
- Added Terraform variable verification for `cluster_name` and `guest_passwd`
- Added `mokey` subdomain to the DNS record generator list
- Added documentation on creating account with Mokey
- [puppet] Integrated UBCCR's project [Mokey](https://github.com/ubccr/mokey) to allow users to create and manage their account on the cluster using a web portal
- [puppet] Added two daemons `mkhome` and `mkproject` that watch slapd log output for new users and groups and create
home, scratch and project directory plus Slurm account automatically
- [puppet] Added Magic Castle custom login template to JupyterHub, which include a new `Create account` link when user signup is allowed
- [puppet] Added classes `profile::accounts` and `profile::accounts::guests` to handle account creation
### Changed
- [puppet] Updated puppet-jupyterhub version to 3.7.1
- [puppet] Updated `kinit_wrapper` logic to avoid issue with multiple process using the same Kerberos cached credentials
### Removed
- [puppet] Remove class `profile::freeipa::guest_accounts` replaced by `profile::accounts::guests`
## [9.0] 2020-11-17
### Changed
- Upgrade to Terraform 0.13, with a new minimum requirement of 0.13.4
## [8.5] 2020-11-16
### Added
- [puppet] Added Duo MFA classes and on/off switches for management, login and compute nodes.
### Changed
- [puppet] Refactored logic identifying the presence of an NVIDIA GPU to consider memory instead of instance name.
- [puppet] Updated the version of most modules installed with Puppetfile
- [puppet] Fixed permissions of nvidia-modprobe when installing Arbutus' `nvidia-vgpu-tools` rpm.
## [8.4] 2020-11-10
### Changed
- [puppet] Updated puppet-jupyterhub version to v3.5.1
## [8.3] 2020-10-21
### Changed
- Fixed puppetenv_rev default value when creating Magic Castle release (Commit bf30e13036)
- [puppet] Bump puppet-jupyterhub version to v3.4.2
- [puppet] Fixed freeipa issue when an ip was already recorded in the DNS and a new instance was joining the realm with the same ip ([puppet-magic_castle issue #69](https://github.com/ComputeCanada/puppet-magic_castle/issues/69))
## [8.2] 2020-10-14
### Added
- [puppet] Added Cloudflare load balancer cvmfs_acl_regex (issue #64)
- [puppet] Added SELinux policy to allow fail2ban to ban using route (issue #65)
### Changed
- Fixed AWS, Azure, GCP and OVH examples that were incorrectly referring to openstack module
- [cloud-init] Bumped puppetserver to 6.13.0 and puppetagent to 6.18.0
- [puppet] Replaced homemade template of squid.conf by usage of puppet-squid module
- [puppet] Bumped puppet-jupyterhub to v3.4.1
- [puppet] Fixed slurmctld dependency to cluster registration in slurmdbd
- [puppet] Fixed ipa_create_user password configuration
## [8.1] 2020-07-29
### Added
- Added ability to generate an ssh keypair to upload files with Terraform file provisioner.
- [puppet] Added options in hieradata to configure CVMFS repos
### Changed
- Activated VPC DNS support in AWS (issue #108)
- Fixed documentation in multiple sections (PR #92, #93, #97, #98, #101, #106)
- Fixed DNS section of AWS example (PR #108)
- [puppet] Replaced homemade template of squid.conf by usage of puppet-squid module
## [8.0] 2020-06-19
Following release of CentOS 8 2004, AWS now provides an official CentOS 8 image that
has been tested and is functional with Magic Castle 8.0.
### Added
- Added the login node ids as output of the main Magic Castle Terraform module.
- Added a trigger to DNS module deploy_certs based on login node ids. If there is a modification to one of the login node state,
the certificates will be uploaded to the corresponding login node, without having to taint the `deploy_certs` resource manually
(PR #88).
- Added try function around access to index 0 of resource array to limit errors when destroying resources.
- [puppet] Added a resource in `profile::base` to remove terraform `local-exec` leftover empty scripts in /tmp.
### Changed
- [puppet] Id of the accounts created in FreeIPA now start at UID_MAX defined `/etc/login.defs`. (commonly 60000 instead of 50000)
- [puppet] fail2ban configuration is now done with puppet-fail2ban module. The `sshd` jail is now named `ssh-route`.
- [cloud-init] Bumped puppetserver to 6.12.0 and puppetagent to 6.16.0.
- Puppet hieradata yaml files are now uploaded with Terraform file provisioner instead of being embedded in mgmt1 userdata.
This means a change to the number of users, the guest password, or the hieradata variable no longer trigger a rebuild of mgmt1
but only a reupload of YAML files (PR #89)
- [docs] Various fixes (Issues #87, #92, #93)
### Removed
- Hieradata has been removed from puppetmaster.yaml template.
## [7.3] 2020-06-04
This release introduces three main features:
- Add support for Slurm 20
- Add support for CentOS 8. Tested functional on GCP and OpenStack. AWS and Azure do not provide
an official CentOS 8 image with cloud-init support at the moment of this release.
- Add support for Compute Canada Arbutus Cloud NVIDIA VGPUs (flavor `vgpu-...`).
### Changed
- Improved main documentation.
- [AWS] Most resources if not all now have the name of the cluster as a prefix in their name
- [OpenStack] Simplified volume attachment count computation
- [puppet] Slurm plugin spank-cc-tmpfs_mounts is now installed from copr yumrepo
- [puppet] Fixed order of slurm packages install
- [puppet] Exec resource in charge of creating the slurm cluster in slurmdbd now returns 0 if the cluster already exists
- [puppet] `consul-template` class initialization is now entirely in hieradata file `common.yaml`.
- [puppet] CentOS 8 support: replaced notification of `nfs-idmap.service` by notification of `nfs-server.service`.
- [puppet] CentOS 8 support: replaced `pdsh` by `clustershell`
- [puppet] CentOS 8 support: rpc_nfs_args is now only defined if os is CentOS 7.
- [puppet] CentOS 8 support: `ipa_create_user.py` now use `/usr/libexec/platform-python` instead of `/usr/bin/env python`.
- [puppet] CentOS 8 support: Replaced Python 2 `unicode` calls in `ipa_create_user.py` by six's `text_type`
- [puppet] CentOS 8 support: Moved list of nvidia package names from class profile::gpu to hieradata. List now depends on CentOS version.
- [puppet] CentOS 8 support: Moved FreeIPA `regen_cert_cmd` value to hieradata. Command now depends on CentOS version.
- [puppet] Bumped puppet-jupyterhub version to 3.3.2
- [puppet] Update nvidia driver fact to make sure at most one version is in the output
- [puppet] Changed logic of `nvidia_grid_vgpu` fact to just check if the instance flavor includes `vgpu` in its name
- [puppet] CentOS 8 support: Moved default loaded CVMFS modules to hieradata. Module list now depends on CentOS version
- [puppet] CentOS 8 support: Fixed nfs clean rbind execstop warning
- [puppet] Replaced tcp_con_validator to check if slurmdbd is running by a wait_for resource on slurmdbd.log regex
- [puppet] CentOS 8 support: Fixed package name in nvidia-driver-version fact.
- [cloud-init] Replaced `reboot -n` in `runcmd` by `power_state` with reboot now. This makes sure final stage of cloud-init is applied before reboot.
- [gcp] CentOS 8 support: rewrote `install_cloudinit.sh` to avoid network issue at boot and install cloud-init only for the time needed. (issue #85)
### Added
- [puppet] Added support for CentOS 8 when selecting Slurm yumrepo
- [puppet] Slurm 20 support: Added `slurm_version` variable to hieradata. It can be either 19 or 20.
- [puppet] Slurm 20 support: Added PlugStackConfig parameter to slurm.conf
- [puppet] Added slurm-perlapi package to `profile::base::slurm`
- [puppet] Added exec to initialize cvmfs default.local with consul-template.
- [puppet] Added a default node1 in slurm.conf when no slurmd has been registered yet in consul
- [puppet] Added a require on Epel yumrepo for package fail2ban-server
- [puppet] Added class profile::fail2ban::install
- [puppet] CentOS 8 support: Added dependency on puppet-epel to install epel yumrepo
- [puppet] CentOS 8 support: Enabled powertools repo
- [puppet] CentOS 8 support: Enabled idm:DL1 stream
- [puppet] CentOS 8 support: Added network-scripts package when os is CentOS 8
- [puppet] CentOS 8 support: Added munge_socket selinux policy to allow confined user to submit jobs
- [puppet] Added class `profile::gpu::install`
- [puppet] Added a requirement on epel yumrepo for singularity package.
- [puppet] Added a requirement for slurm exec `create_account` on slurm exec `add_cluster`
- [puppet] CentOS 8 support: added class `profile::mail::server`
- [puppet] Added a requirement on yumrepo epel to class `jupyterhub` in `profile::jupyterhub::hub`
- [puppet] Added support for Compute Canada Arbutus Cloud VGPUs
### Removed
- [puppet] Removed notify to `slurmctld` from slurm::accounting exec `add_cluster`.
## [7.2] 2020-05-20
### Changed
- Reverted type of image variable from string to any because Azure image input is a map.
## [7.1] 2020-05-20
### Changed
- [GCP] Fixed a typo in disk paths that prevented creation of project and scratch volume
- [GCP] Increased the root disk size in the example to 20GB. This is the new minimum for centos7 image.
- Bumped minimum requirements to 0.12.21 in all versions.tf files.
- [puppet] Bumped most module versions to latest in Puppetfile
- [puppet] Bumped consul and consul-template version to latest available
### Added
- Documentation on variables specific to the commercial cloud providers
- Documentation on hieradata
- Documentation on firewall_rules
- Description and types to all terraform variables
## [7.0] 2020-05-18
### Changed
- Established a distinction in variables between puppetmaster and mgmt1 - allowing puppetmaster role to be assigned to another instance.
- Bumped minimum requirement of terraform to 0.12.21 (issue #77)
- Numerous doc fixes
- Added a section on related projects in README.md
- [Azure] Updated Azure infrastructure.tf to use Azure provider 2.0.0 (issue #62)
- [cloud-init] Set puppet-agent and puppet-server version to 6.13 and 6.9
- [cloud-init] Renamed cloud-init YAML files to `puppetagent.yaml` and `puppetmaster.yaml`
- [OpenStack] Fixed volume size computation regression introduced in commit c09ea17d
- [puppet] Defined selinux context for /scratch as home_root_dir
- [puppet] Defined selinux context for /project as home_root_dir
- [puppet] Improved cuda facts to avoid issue when html index is incomplete
- [puppet] Updated package names in gpu module and facts
- [puppet] Generalized gpu module cuda repo link composition
- [puppet] Replaced package by ensure_packages for kernel-devel in gpu
- [puppet] Updated version of puppet-jupyterhub to v3.3.0
- [puppet] Improved FreeIPA client installation waiting conditions to limit failure
- [puppet] Disabled root jobs in slurm.conf]
- [puppet] Added nosuid to client nfs mount options
- [puppet] Activated root_squash for all nfs exports
- [puppet] Changed URL for the source of `cc-tmpfs_mount.so`
- [puppet] Updated derdanne/nfs version in Puppetfile
- [puppet] Made profile::base a requirement of profile::nfs::server
- [puppet] Defined servername param for apache in reverse_proxy
### Added
- [Azure] Added variable to allow usage of an existing resource group based on its name (issue #72)
- [cloud-init] Enabled puppet agent postrun command in cloud-init
- [puppet] mgmt1 volumes formatting is now handled by `profile::nfs::server` class
- [puppet] Added logic to define, mount and format nfs shared volumes with lvm
- [puppet] Added README.md
- [puppet] Fixed regression introduced in 630a04
- [puppet] Added possibility to manage jail activation and ignore_ip with hierada
- [puppet] Added profile classes for JupyterHub: `profile::jupyterhub::node` and `profile::jupyterhub::hub`
- [puppet] Added variable to allow definition of lmod default modules with hieradata
- [puppet] Configured lmod default modules to start with gcc and openmpi
- [puppet] Added ability to receive last puppet run output by email through puppet postrun script
- [puppet] Added support for NVIDIA GRID vGPU
- [puppet] Added class `profile::base::azure` for logic specific to Azure
### Removed
- [cloud-init] Removed volumes formatting, partitioning and mounting from mgmt cloud-init
- [puppet] Removed condition on gpu count in nvidia_driver_vers
- [puppet] Removed mkhomedir from FreeIPA client installation parameters
## [6.4] - 2020-03-11
### Changed
- [cloud-init] Hardcoded the version of puppet-agent (6.13.0) and puppetserver (6.9.1).
## [6.3] - 2020-03-09
### Added
- Added random_uuid to generate a random consul token
- [travis] Added init and validation of dns/gcloud module
- [cloud-init] Added bootstrap installation of consul-server in cloud-init
- [puppet] Added slurmd restart when node is missing from sinfo
- [puppet] Introduced class `profile::workshop::mgmt`. The class allow to unzip an archive in all guest accounts
- [puppet] Added profile::workshop::mgmt to mgmt in site.pp
- [puppet] Defined consul::service for slurmd, slurmctld slurmdbd, rsyslog, cvmfs client, and squid. This in conjunction
with consul-template, allow these services to be removed from the config files when the instance that was running the
service is halted. For example, if a compute node is shutdown or remove, it will no longer appear in `sinfo` output.
### Changed
- [cloud-init] Turned off puppet agent reporting in cloud-init
- [cloud-init] [puppet] Renamed user_hieradata as user_data
- [cloud-init] Volume formatting and mounting is now conditional on the hostname being `mgmt1`
- [OpenStack] Fixed port_node resource name template
- [puppet] Updated puppet-jupyterhub version to v1.8.1
- [puppet] Consul and consul-template version are now defined in hieradata
- [puppet] Changed node_exported consul service name to node-exported to remove warning
### Removed
- [puppet] Removed unused key from terraform_data
- [puppet] Removed stage in mgmt site.pp
## [6.2] - 2020-03-01
### Added
- Added an error message in cloud-init dev avail while loops
- Added gcloud dns module to AWS, Azure and OVH examples.
- [puppet] Added a slurmd restart when node hostname is missing from sinfo output.
- [puppet] Added class profile::workshop::mgmt to deploy files to guest user homes
- [puppet] Added class profile::workshop::mgmt to mgmt1 in site.pp
### Changed
- [OpenStack] All resources, including instances, have now a name that starts with the cluster name.
This does not affect the instances' hostname
- [puppet] Update puppet-jupyterhub version to v1.7
## [6.1] - 2020-02-27
Fix travis release procedure. 6.0 release bundles contained the wrong module source in main.tf.
## [6.0] - 2020-02-26
Terraform >= 0.12.21 is now required. Usage of the function `subtract` requires at least 0.12.21.
### Added
- Added the optional key `prefix` to the `instance["node"]` map (issue #29)
- [cloud-init] Added removal of ifcfg file with no corresponding nic (issue #61)
- [puppet] Added optional prefix to node regex in site.pp
### Changed
- `instance["node"]` is now a list. This allows the spawning of compute nodes with various instance types (issue #29)
- release.sh is now the only script for creating a release on any platform.
- [Azure] Renamed azurerm_virtual_machine nodevm to node (issue #55)
- [AWS] Replaced aws volume device name by volume id (issue #60)
- [gcp] Renamed gcp var.project_name to var.project (issue #53)
- [puppet] Upgraded puppet-prometheus to 8.2.1
- [puppet] Remove Name=gpu from gres.conf template ([puppet-magic_castle issue #27](https://github.com/ComputeCanada/puppet-magic_castle/issues/27))
## [5.8] - 2020-02-24
### Added
- [Azure] Added `root_disk_size=30` in the example (issue #43)
- [Azure] Added ssh_keys to instances as it is mandatory (issue #44)
- [cloud-init] Added volume attachment verification loops in mgmt cloud-init (issue #54)
- [GCP] Added gcloud dns module to gcp example (issue #37)
- [GCP] Added prefix to the name of volumes and ipv4 (issue #49)
- [OpenStack] Added os_int_subnet variable. The variable is used to force to use a specific subnet with Openstack.
### Changed
- Changes to image variable are now ignored after cluster is built.
- Fixed release scripts to solve bug where multiple `version` variable were present (issue #38)
- [Azure] Updated the example to use the most recent OpenLogic CentOS 7 image (issue #42)
- [Azure] Resources names are now prefixed with the cluster name
- [Azure] Azure public_ip now outputs a list of all login ip addresses
- [cloud-init] Replaced timezone in cloud-init to UTC (issue #51)
- [GCP] Zone variable is now optional. The zone is randomly selected in the zones available for the region if left blank.
- [GCP] Instances internal DNS are now configured to use zonal DNS. The internal DNS hostname is not used, but the change reduced
the DHCP time lease from 24 to 1 hour. This helps when debugging DHCP issue
- [puppet] CC CVMFS repo is now configured from latest RPM repo (issue [puppet-magic_castle issue #19](https://github.com/ComputeCanada/puppet-magic_castle/pull/19))
- [puppet] Increased the squid maximum_object_size ([puppet-magic_castle issue #20](https://github.com/ComputeCanada/puppet-magic_castle/issues/20))
- [puppet] Updated globus rpm repo name
- [puppet] Updated fail2ban config to make it work with 0.10.x ([puppet-magic_castle issue #25](https://github.com/ComputeCanada/puppet-magic_castle/issues/25))
- [puppet] Replaced file by exec to create singularity symlink ([puppet-magic_castle issue #24](https://github.com/ComputeCanada/puppet-magic_castle/issues/24))
- [puppet] Replaced timezone in cloud-init to UTC ([puppet-magic_castle issue #26](https://github.com/ComputeCanada/puppet-magic_castle/issues/26))
- [puppet] Added service network and notify on NetworkManager purge (issue #50)
### Removed
- [GCP] Zone variable is no longer in the example as it is now optional.
## [5.7] - 2020-01-16
### Added
- [AWS] Added `skip_destroy = True` to EBS attachment resources to avoid stalling destroy command.
- [DNS] Added a `dtn` entry for the Globus endpoint.
- [DNS] Added an `ipa` entry that provides access to FreeIPA webpage.
- [puppet] Added a `profile::reverse_proxy` class that configure Apache vhost for JupyterHub, FreeIPA, Globus, etc.
- [puppet] Added service nvidia-persistenced to module gpu.pp.
- [puppet] Added `drain` to states that spawns an scontrol in slurm module.
- [main.tf] Added `hieradata` variable that allow the injection of custom values in puppet hieradata from Terraform.
### Changed
- [AWS] Changed mgmt and login instances from using `associate_public_ip_address` to using an AWS Elastic IP.
- [AWS] Updated example AMI and minimum instance type for mgmt.
- [AWS] Fixed module's syntax for Terraform 0.12.
- [AWS] Made `availability_zone` optional. If zone is not provided, it will be randomly selected amongst the zones available for the selected region.
- [AWS] Changed root disk type from `standard` to `gp2`.
- [AWS] Enabled ebs_optimized for all instances.
- [AWS] Changed SSH keyname from `slurm-cloud-key` to `${cluster_name}-key`
- [cloud-init] Made puppet yumrepo install function of the CentOS major version.
- [cloud-init] Added blacklisting of nouveau driver in kernel cmdline option.
- [DNS] DNS records are now produced by the `record_generator` module instead of listing records in each DNS provider module.
- [puppet] Changed Globus authentication method from MyProxy to OAuth.
- [puppet] Updated puppet-jupyterhub version from v1.1 to v1.6.
- [puppet] Replaced deprecated package name dkms-nvidia by kmod-nvidia-latest-dkms.
- [puppet] Replaced every reference to facts of `eth0` by facts of interface index 0.
- [puppet] Disabled dkms autoinstall timeout in gpu.
### Removed
- [puppet] Removed include of jupyterhub::reverse_proxy in site.pp for login.
## [5.6] - 2019-11-27
### Added
- [DNS] Added support for Google Cloud DNS (PR #24)
- Added a release script compatible with BSD tools - `release.bsd.sh`
### Changed
- [DNS] Changed the login record A pattern from `clustername#.domain` to `login#.clustername.domain` where `#` is the login node index.
- [DNS] Moved wildcard certificate creation from cloudflare to an acme module shared by all dns modules.
- [DNS] Replaced usage of 0-index on array by call to `distinct` (issue #26).
- [DNS] A `jupyter.${cluster_name}.${domain_name}` record is now added for each login instead of just login1.
- Changed the management and login node naming scheme to match node naming. `mgmt01` is now `mgmt1` and `login01` is now `login1`.
- [puppet] Fix puppet-jupyterhub version to v1.1 instead of master branch.
### Removed
- [DNS] Removed creation of SSHFP SHA1 records (issue #22)
## [5.5] - 2019-11-15
### Added
- [docs] Added details on how to use CloudFlare API Token in README.md
- [docs] Added details on which Open RC File to download when using OpenStack
- [DNS] Added an email variable to the dns module.
- [puppet] Added logic to set RSNT_ARCH variable based on the common CPU
instruction extensions available amongst all CVMFS clients using consul.
## [5.4] - 2019-10-23
### Fixed
- [OpenStack] Fix image_id condition for login and node
- [CloudFlare] Update to CloudFlare provider 2.0 and pinned the version in dns.tf.
## [5.3] - 2019-09-30
### Added
- [puppet] Added hbac rule to limit access to mgmt instances from IPA users (issue #13)
- [puppet] Added token based ACL to consul to limit access to the key-value store (issue #11)
### Changed
- [puppet] Activate selinuxuser_tcp_server boolean to allow confined users to run MPI jobs (issue #12)
- [puppet] FreeIPA OTP activation command is now subscribed to server install exec
### Fixed
- [OpenStack] Set the image_id to null when the root disk is a volume to avoid false detection of change by Terraform
## [5.2] - 2019-09-19
### Added
- Travis CI automated testing for Terraform and Puppet files
- Travis CI build status is now at the beginning of README.md
- [docs] Added contribution section to README.md
- [OVH] Added a Terraform file for network related resources
- [OpenStack] Added a Terraform file for network related resources
- [cloud-init] Added removal of firewalld package in mgmt.yaml
- [docs] Added docs on building SELinux enabled image for OVH cloud
- [OpenStack] Added the definition of a root disk block device with condition on flavor choice and root disk size
- [GCP] Added the ability to set the boot disk size with `var.root_disk_size`
- [Azure] Added the ability to set the os storage volume size with `var.root_disk_size`
- [AWS] Added the ability to set the root disk size with `var.root_disk_size`
- [docs] Added documentation on variable `root_disk_size`
- [puppet] Added myproxy and gridftp as service in globus module
### Changed
- Release script now takes only one version number
- Release script now fixes the Terraform providers' version
- [docs] Updated globus documentation
- [docs] Renamed developers docs
- [OVH] OVH infrastructure file is now a symlink to the OpenStack infrastructure file
- [puppet] Globus is now installed only if globus_user/password is defined in the hieradata
- [puppet] Globus endpoint name is now set based on the domain name instead of the instance hostname
- [puppet] FreeIPA ip address parameter on mgmt is now fixed to eth0 interface
- [puppet] Firewall chains and rules that were not set by puppet are now purged
- [puppet] Configure network, netmask and ip to always map to eth0 interface
- [puppet] Configure slurmctld ip address to eth0 to fix issue when there more than one NIC
### Removed
- Removed version number from documentation
### Fixed
- [AWS] Fixed dependency cycle with volumes
- [AWS] Fixed firewall definition
- [AWS] Fixed dependency cycle with mgmt ip addresses
- [OVH] Fixed dependency cycle
- [docs] Fixed typos and rfc links