Commit 1b55f2d

Merge pull request #2700 from nick-stroud/ofe_new_features_alt_history
OFE: new features - edited history
2 parents 4a709fa + 9862c32 commit 1b55f2d

13 files changed, +198 -103 lines changed

community/front-end/ofe/deploy.sh

Lines changed: 2 additions & 0 deletions
@@ -57,6 +57,8 @@ PRJ_API['bigqueryconnection.googleapis.com']='BigQuery Connection API'
 PRJ_API['sqladmin.googleapis.com']='Cloud SQL Admin API'
 PRJ_API['servicenetworking.googleapis.com']='Service Networking API'
 PRJ_API['secretmanager.googleapis.com']='Secret Manager API'
+PRJ_API['serviceusage.googleapis.com']='Service Usage API'
+PRJ_API['storage.googleapis.com']='Cloud Storage API'
 
 # Location for output credential file = pwd/credential.json
 #

community/front-end/ofe/infrastructure_files/gcs_bucket/webserver/startup.sh

Lines changed: 1 addition & 0 deletions
@@ -249,6 +249,7 @@ autostart=true
 autorestart=true
 user=gcluster
 redirect_stderr=true
+environment=HOME=/opt/gcluster
 stdout_logfile=/opt/gcluster/run/supvisor.log" >/etc/supervisord.d/gcluster.ini
 
 printf "Creating systemd service..."

community/front-end/ofe/infrastructure_files/vpc_tf/GCP/README.md

Lines changed: 5 additions & 0 deletions
@@ -19,13 +19,15 @@ limitations under the License.
 |------|---------|
 | <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 0.12.31 |
 | <a name="requirement_google"></a> [google](#requirement\_google) | >= 3.54 |
+| <a name="requirement_google-beta"></a> [google-beta](#requirement\_google-beta) | >= 3.83 |
 | <a name="requirement_random"></a> [random](#requirement\_random) | >= 3.0 |
 
 ## Providers
 
 | Name | Version |
 |------|---------|
 | <a name="provider_google"></a> [google](#provider\_google) | >= 3.54 |
+| <a name="provider_google-beta"></a> [google-beta](#provider\_google-beta) | >= 3.83 |
 | <a name="provider_random"></a> [random](#provider\_random) | >= 3.0 |
 
 ## Modules
@@ -36,11 +38,14 @@ No modules.
 
 | Name | Type |
 |------|------|
+| [google-beta_google_compute_global_address.private_ip_alloc](https://registry.terraform.io/providers/hashicorp/google-beta/latest/docs/resources/google_compute_global_address) | resource |
 | [google_compute_firewall.firewall_allow_ssh](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_firewall) | resource |
 | [google_compute_firewall.firewall_internal](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_firewall) | resource |
 | [google_compute_network.network](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_network) | resource |
 | [google_compute_router.network_router](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_router) | resource |
 | [google_compute_router_nat.network_nat](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_router_nat) | resource |
+| [google_service_networking_connection.private_vpc_connection](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/service_networking_connection) | resource |
+| [random_id.resource_name_suffix](https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/id) | resource |
 | [random_pet.vpc_name](https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/pet) | resource |
 
 ## Inputs

community/front-end/ofe/infrastructure_files/vpc_tf/GCP/main.tf

Lines changed: 28 additions & 0 deletions
@@ -85,6 +85,34 @@ resource "google_compute_firewall" "firewall_internal" {
   allow { protocol = "icmp" }
 }
 
+locals {
+  # This label allows for billing report tracking based on module.
+  labels = {
+    created_by = "ofe"
+  }
+}
+
+resource "random_id" "resource_name_suffix" {
+  byte_length = 4
+}
+
+resource "google_compute_global_address" "private_ip_alloc" {
+  provider = google-beta
+  project = var.project
+  name = "global-psconnect-ip-${random_id.resource_name_suffix.hex}"
+  purpose = "VPC_PEERING"
+  address_type = "INTERNAL"
+  network = google_compute_network.network.self_link
+  prefix_length = 16
+  labels = local.labels
+}
+
+resource "google_service_networking_connection" "private_vpc_connection" {
+  network = google_compute_network.network.self_link
+  service = "servicenetworking.googleapis.com"
+  reserved_peering_ranges = [google_compute_global_address.private_ip_alloc.name]
+}
+
 output "vpc_id" {
   value = google_compute_network.network.name
   description = "Name of the created VPC"
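
Reviewer note: the reserved range in main.tf is named by appending `random_id.resource_name_suffix.hex` to a fixed prefix. With `byte_length = 4`, that suffix should be 8 hexadecimal characters. The Python sketch below only illustrates the resulting name shape; `psconnect_address_name` is a hypothetical helper and `secrets.token_hex` is a stand-in for Terraform's `random_id`, not something this commit uses.

```python
import secrets


def psconnect_address_name(suffix_bytes: int = 4) -> str:
    """Build a name shaped like the one main.tf gives the reserved range.

    Hypothetical helper for illustration only: in the real module the
    suffix comes from Terraform's random_id resource.
    """
    # token_hex(n) returns 2*n hex characters, mirroring the length of
    # random_id's .hex attribute for byte_length = n.
    return f"global-psconnect-ip-{secrets.token_hex(suffix_bytes)}"


name = psconnect_address_name()
print(name)
```

The suffix presumably lets several VPCs created in one project carry distinct address names.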

community/front-end/ofe/infrastructure_files/vpc_tf/GCP/versions.tf

Lines changed: 4 additions & 0 deletions
@@ -20,6 +20,10 @@ terraform {
       source = "hashicorp/google"
       version = ">= 3.54"
     }
+    google-beta = {
+      source = "hashicorp/google-beta"
+      version = ">= 3.83"
+    }
     random = {
       source = "hashicorp/random"
       version = ">= 3.0"

community/front-end/ofe/website/ghpcfe/cluster_manager/cloud_info.py

Lines changed: 3 additions & 0 deletions
@@ -39,6 +39,7 @@
     "n1": defaultdict(lambda: "x86_64"),
     "c3": defaultdict(lambda: "sapphirerapids"),
     "c3d": defaultdict(lambda: "zen2"),
+    "c4": defaultdict(lambda: "emeraldrapids"),
     # Compute Optimized
     "c2": defaultdict(lambda: "cascadelake"),
     "c2d": defaultdict(
@@ -359,6 +360,7 @@ def get_cpu_price(num_cores, instance_type, skus):
         "n2d": "N2D AMD Instance Core",
         "h3": "Compute optimized Core",
         "c3": "Compute optimized Core",
+        "c4": "Compute optimized Core",
         "c2": "Compute optimized Core",
         "c2d": "C2D AMD Instance Core",
         "c3d": "C3D AMD Instance Core",
@@ -411,6 +413,7 @@ def get_mem_price(num_gb, instance_type, skus):
         "h3": "Compute optimized Ram",
         "c2d": "C2D AMD Instance Ram",
         "c3d": "C3D AMD Instance Ram",
+        "c4": "C4 Instance RAM",
         "t2d": "T2D AMD Instance Ram",
         "a2": "A2 Instance Ram",
         "m1": "Memory-optimized Instance Ram",
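
Reviewer note: the new `"c4"` entry follows the existing pattern of mapping a machine family to a `defaultdict` whose default factory returns one architecture string. The sketch below shows how such a table behaves on lookup. The names `cpu_arch_by_family` and `lookup_arch` are hypothetical, and the two-level lookup is an assumption about how the table is consumed, not code from `cloud_info.py`.

```python
from collections import defaultdict

# Trimmed-down table; only the "c3" and "c4" entries are taken from the diff.
cpu_arch_by_family = {
    "c3": defaultdict(lambda: "sapphirerapids"),
    "c4": defaultdict(lambda: "emeraldrapids"),
}


def lookup_arch(family: str, machine_type: str) -> str:
    """Return the architecture string for a machine type (hypothetical helper).

    The inner defaultdict returns its default for any key it has not seen,
    so every machine type in the family resolves to the same architecture
    unless an explicit override was stored.
    """
    return cpu_arch_by_family[family][machine_type]


print(lookup_arch("c4", "c4-standard-8"))
```

An unknown family still raises `KeyError` from the outer plain dict; only the inner lookup is defaulted.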

community/front-end/ofe/website/ghpcfe/forms.py

Lines changed: 5 additions & 0 deletions
@@ -247,8 +247,10 @@ class Meta:
             "image",
             "dynamic_node_count",
             "static_node_count",
+            "reservation_name",
             "enable_placement",
             "enable_hyperthreads",
+            "enable_tier1_networking",
             "enable_node_reuse",
             "GPU_type",
             "GPU_per_node",
@@ -303,6 +305,9 @@ def prep_dynamic_select(field, value):
                 self.instance.GPU_type
             )
 
+        # Mark 'reservation_name' as optional
+        self.fields["reservation_name"].widget.attrs.update({"placeholder": "Optional"})
+
     def clean(self):
         cleaned_data = super().clean()
         if cleaned_data["enable_placement"] and cleaned_data[

community/front-end/ofe/website/ghpcfe/models.py

Lines changed: 19 additions & 8 deletions
@@ -739,9 +739,9 @@ class Cluster(CloudResource):
         default="pd-standard",
     )
     controller_disk_size = models.PositiveIntegerField(
-        validators=[MinValueValidator(10)],
+        validators=[MinValueValidator(120)],
         help_text="Boot disk size (in GB)",
-        default=50,
+        default=120,
         blank=True,
     )
     num_login_nodes = models.PositiveIntegerField(
@@ -762,9 +762,9 @@ class Cluster(CloudResource):
     login_node_disk_size = models.PositiveIntegerField(
         # login node disk must be large enough to hold the SlurmGCP
         # image: >=50GB
-        validators=[MinValueValidator(50)],
+        validators=[MinValueValidator(120)],
         help_text="Boot disk size (in GB)",
-        default=50,
+        default=120,
         blank=True,
     )
     grafana_dashboard_url = models.CharField(
@@ -919,6 +919,13 @@ class ClusterPartition(models.Model):
     enable_hyperthreads = models.BooleanField(
         default=False, help_text="Enable Hyperthreads (SMT)"
     )
+    enable_tier1_networking = models.BooleanField(
+        default=False,
+        help_text=(
+            "Select Tier 1 networking (currently only valid for N2, N2D, C2, "
+            "C2D, C3, C3d, M3 and Z3 VMs that have at least 30 vCPUs.)"
+        ),
+    )
     enable_node_reuse = models.BooleanField(
         default=True,
         help_text=(
@@ -937,7 +944,7 @@ class ClusterPartition(models.Model):
         default="pd-standard",
     )
     boot_disk_size = models.PositiveIntegerField(
-        validators=[MinValueValidator(49)],
+        validators=[MinValueValidator(50)],
         help_text="Boot disk size (in GB)",
         default=50,
         blank=True,
@@ -972,7 +979,11 @@ class ClusterPartition(models.Model):
             "Automatically delete additional disk when node is deleted?"
         ),
     )
-
+    reservation_name = models.CharField(
+        blank=True,
+        max_length=30,
+        help_text="Name of the reservation to use for VM resources"
+    )
     def __str__(self):
         return self.name
 
@@ -1572,9 +1583,9 @@ class Workbench(CloudResource):
         help_text="Type of storage to be required for notebook boot disk",
     )
     boot_disk_capacity = models.PositiveIntegerField(
-        validators=[MinValueValidator(100)],
+        validators=[MinValueValidator(120)],
         help_text="Capacity (in GB) of the filesystem (min of 1024)",
-        default=100,
+        default=120,
     )
     proxy_uri = models.CharField(max_length=150, blank=True, null=True)
     trusted_user = models.ForeignKey(
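
Reviewer note: the `enable_tier1_networking` help text in models.py says Tier 1 networking is only valid for certain machine families with at least 30 vCPUs, but nothing in this diff enforces that. Below is a minimal sketch of a check that follows the help text. `tier1_eligible` and both constants are hypothetical, and the sketch assumes machine type names shaped like `family-class-vcpus` (for example `c2-standard-60`); it is not part of the commit.

```python
# Families and threshold copied from the help text in models.py.
TIER1_FAMILIES = {"n2", "n2d", "c2", "c2d", "c3", "c3d", "m3", "z3"}
MIN_TIER1_VCPUS = 30


def tier1_eligible(machine_type: str) -> bool:
    """Return True if the machine type matches the help-text constraints.

    Assumes the third dash-separated token is the vCPU count; names that
    do not follow that shape are treated as not eligible.
    """
    parts = machine_type.lower().split("-")
    if len(parts) < 3 or not parts[2].isdigit():
        return False
    family, vcpus = parts[0], int(parts[2])
    return family in TIER1_FAMILIES and vcpus >= MIN_TIER1_VCPUS


for mt in ("c2-standard-60", "n2-standard-8", "n1-standard-64"):
    print(mt, tier1_eligible(mt))
```

A check like this could sit in the form's `clean()` method next to the existing placement validation, if enforcement is wanted.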
Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 - source: community/modules/database/slurm-cloudsql-federation
   kind: terraform
   id: slurm-sql
-  use: [hpc_network, ps-connect]
+  use: [hpc_network]
   settings:
     sql_instance_name: sql-{{ cluster_id }}
     tier: "db-g1-small"

community/front-end/ofe/website/ghpcfe/templates/blueprint/cluster_config.yaml.j2

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -23,10 +23,6 @@ deployment_groups:
2323
subnetwork_name: {{ cluster.subnet.cloud_id }}
2424
id: hpc_network
2525

26-
- source: community/modules/network/private-service-access
27-
id: ps-connect
28-
use: [ hpc_network ]
29-
3026
{{ filesystems_yaml | safe }}
3127

3228
- source: community/modules/project/service-account

community/front-end/ofe/website/ghpcfe/templates/blueprint/partition_config.yaml.j2

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,12 @@
1414
id: {{ part_id }}-group
1515
use:
1616
settings:
17+
bandwidth_tier: {% if part.enable_tier1_networking %}tier_1_enabled{% else %}platform_default{% endif %}
1718
enable_smt: {{ part.enable_hyperthreads }}
1819
machine_type: {{ part.machine_type }}
20+
{% if part.reservation_name %}
21+
reservation_name: {{ part.reservation_name }}
22+
{% endif %}
1923
node_count_dynamic_max: {{ part.dynamic_node_count }}
2024
node_count_static: {{ part.static_node_count }}
2125
disk_size_gb: {{ part.boot_disk_size }}
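
Reviewer note: the template change above always emits `bandwidth_tier` and emits `reservation_name` only when the field is non-empty. The plain-Python sketch below mirrors those two conditionals so the behaviour can be checked without rendering the template. `Partition` and `partition_settings` are hypothetical stand-ins, not names from the repository.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical stand-in for the two ClusterPartition fields used here."""
    enable_tier1_networking: bool = False
    reservation_name: str = ""


def partition_settings(part: Partition) -> dict:
    """Mirror the two conditionals added to partition_config.yaml.j2."""
    settings = {
        # bandwidth_tier is always present; only its value changes.
        "bandwidth_tier": (
            "tier_1_enabled" if part.enable_tier1_networking else "platform_default"
        ),
    }
    # reservation_name is skipped entirely when the field is blank.
    if part.reservation_name:
        settings["reservation_name"] = part.reservation_name
    return settings


print(partition_settings(Partition()))
print(partition_settings(Partition(True, "my-reservation")))
```

Omitting the key when blank, rather than emitting an empty string, leaves the downstream module free to apply its own default.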
