From 441ff095cfb8a518a37498bdb6cc60f255a3e638 Mon Sep 17 00:00:00 2001 From: Maria Yurlava Date: Wed, 30 Oct 2024 17:30:38 +0100 Subject: [PATCH 1/2] WEB-7257 Change doc name --- .../about-our-ai-infrastructure.md | 133 ++++++++++++++++++ .../ai-infrastructure/about-virtual-vpod.md | 43 ++++++ .../ai-infrastructure/create-an-ai-cluster.md | 56 ++++++++ .../edge-ai/ai-infrastructure/metadata.md | 6 + 4 files changed, 238 insertions(+) create mode 100644 documentation/edge-ai/ai-infrastructure/about-our-ai-infrastructure.md create mode 100644 documentation/edge-ai/ai-infrastructure/about-virtual-vpod.md create mode 100644 documentation/edge-ai/ai-infrastructure/create-an-ai-cluster.md create mode 100644 documentation/edge-ai/ai-infrastructure/metadata.md diff --git a/documentation/edge-ai/ai-infrastructure/about-our-ai-infrastructure.md b/documentation/edge-ai/ai-infrastructure/about-our-ai-infrastructure.md new file mode 100644 index 00000000..84167866 --- /dev/null +++ b/documentation/edge-ai/ai-infrastructure/about-our-ai-infrastructure.md @@ -0,0 +1,133 @@ +--- +title: about-our-ai-infrastructure +displayName: About GPU Cloud +order: 10 +published: true +toc: + --1--AI GPU infrastructure: "ai-gpu-infrastructure" + --1--Tools our AI Infrastructure supports: "tools-supported-by-gcore-gpu-cloud" +pageTitle: About Gcore GPU Cloud | Gcore +pageDescription: Explore Gcore GPU Cloud for AI. NVIDIA servers, top performance, diverse tool support. Easy deployment, per-minute billing. +--- +# GPU Cloud infrastructure + +Gcore GPU Cloud provides high-performance compute clusters designed for machine learning tasks. + +## AI GPU infrastructure + +Train your ML models with the latest NVIDIA GPUs. We offer a wide range of Bare Metal servers and Virtual Machines powered by NVIDIA A100, H100, and L40S GPUs. + +Pick the configuration and reservation plan that best fits your computing requirements. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Specification | Characteristics | Use case | Performance |
+|---|---|---|---|
+| H100 with Infiniband | 8x Nvidia H100 80GB<br>2 Intel Xeon 8480+<br>2TB RAM<br>2x 960GB<br>8x3.84 TB NVMe<br>3.2 Tbit/s Infiniband<br>2x100Gbit/s Ethernet | Optimized for distributed training of Large Language Models. | Ultimate performance for compute-intensive tasks that require a significant exchange of data over the network. |
+| A100 with Infiniband | 8x Nvidia A100 80GB<br>2 Intel Xeon 8468<br>2 TB RAM<br>2x 960GB SSD<br>8x3.84 TB NVMe<br>800Gbit/s Infiniband | Distributed training for ML models and a broad range of HPC workloads. | Well balanced in performance and price. |
+| A100 without Infiniband | 8x Nvidia A100 80GB<br>2 Intel Xeon 8468<br>2 TB RAM<br>2x 960GB SSD<br>8x3.84 TB NVMe<br>2x100Gbit/s Ethernet | Training and fine-tuning of models on single nodes.<br>Inference for large models.<br>Multi-user HPC clusters. | The best solution for inference of models that require more than 48GB of vRAM. |
+| L40 | 8x Nvidia L40S<br>2x Intel Xeon 8468<br>2TB RAM<br>4x7.68TB NVMe SSD<br>2x25Gbit/s Ethernet | Model inference.<br>Fine-tuning for small and medium-size models. | The best solution for inference of models that require less than 48GB of vRAM. |
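The per-node totals implied by the configurations above are easy to work out. A quick sketch, using the H100 node as an example (the variable names are ours; only the counts and sizes come from the table):

```python
# Illustrative capacity arithmetic for one H100 node from the table above.
# Only the counts and sizes are taken from the table; variable names are ours.
gpus_per_node = 8
vram_per_gpu_gb = 80          # Nvidia H100 80GB
nvme_drives = 8
nvme_drive_tb = 3.84

total_vram_gb = gpus_per_node * vram_per_gpu_gb         # aggregate GPU memory per node
total_nvme_tb = round(nvme_drives * nvme_drive_tb, 2)   # local NVMe capacity per node

print(f"{total_vram_gb} GB vRAM, {total_nvme_tb} TB NVMe per node")
```

The 640GB of aggregate vRAM per node is what makes these configurations suitable for models whose weights don't fit on a single 80GB GPU.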
+ +Explore our competitive pricing on the AI GPU Cloud infrastructure pricing page. + +## Tools supported by Gcore GPU Cloud + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Tool class | List of tools | Explanation |
+|---|---|---|
+| Framework | TensorFlow, Keras, PyTorch, PaddlePaddle, ONNX, Hugging Face | Your model must use one of these frameworks to work correctly. |
+| Data platforms | PostgreSQL, Hadoop, Spark, Vertica | You can set up a connection between our cluster and your data platforms of these types to make them work together. |
+| Programming languages | JavaScript, R, Swift, Python | Your model must be written in one of these languages to work correctly. |
+| Resources for receiving and processing data | Storm, Spark, Kafka, PySpark, MS SQL, Oracle, MongoDB | You can set up a connection between our cluster and your resources of these types to make them work together. |
+| Exploration and visualization tools | Seaborn, Matplotlib, TensorBoard | You can connect our cluster to these tools to visualize your model. |
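When preparing an environment for the cluster, it can be useful to check which of the supported frameworks are actually importable. A minimal sketch using only the standard library (the mapping from framework names to import names, e.g. PyTorch to `torch`, is our assumption):

```python
from importlib.util import find_spec

# Import names for the frameworks listed above (assumed mapping, e.g. PyTorch -> "torch").
SUPPORTED = ["tensorflow", "keras", "torch", "paddle", "onnx", "transformers"]

def available_frameworks(candidates):
    """Return the subset of candidate packages importable in this environment."""
    return [name for name in candidates if find_spec(name) is not None]

print(available_frameworks(SUPPORTED))
```

Running this on a freshly provisioned node shows at a glance which frameworks the chosen OS image ships with and which still need to be installed.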
diff --git a/documentation/edge-ai/ai-infrastructure/about-virtual-vpod.md b/documentation/edge-ai/ai-infrastructure/about-virtual-vpod.md
new file mode 100644
index 00000000..9b0e26b5
--- /dev/null
+++ b/documentation/edge-ai/ai-infrastructure/about-virtual-vpod.md
@@ -0,0 +1,43 @@
+---
+title: about-virtual-vpod
+displayName: Virtual vPOD
+order: 30
+published: true
+toc:
+  --1--What is a virtual vPOD?: "what-is-a-virtual-vpod"
+  --1--Features: "features-of-virtual-vpods"
+  --1--Benefits: "benefits-of-virtual-vpods"
+  --1--Configurations: "configurations"
+pageTitle: Virtual vPOD | Gcore
+pageDescription: Discover virtual vPODs. AI clusters with Poplar servers on virtual machines, offering cost savings, faster deployment, and flexible configurations.
+---
+# About Virtual vPOD
+
+## What is a virtual vPOD?
+
+A virtual vPOD is a flavor of an AI cluster in which a Poplar server is deployed on a Virtual Machine, as opposed to a dedicated vPOD, which deploys a Poplar server on a dedicated Bare Metal server.
+
+With virtual vPODs, you get direct access to the host machines. You can set up your own development environment on each IPU instance, install and run any code over an ultrafast connection to the IPU accelerators, deploy and develop frameworks such as TensorFlow and PyTorch on Cloud IPUs, use ephemeral storage, execute custom code in input pipelines, and integrate Cloud IPUs into research and production workflows.
+
+## Features of virtual vPODs
+
+Virtual vPODs offer two main features:
+
+1. **External volumes**. With virtual vPODs, you can connect external block storage for system and data volumes and easily attach new data volumes.
+2. **Suspension mode**. Virtual vPODs support suspension mode, which allows you to avoid any charges while your cluster is stopped. This feature is particularly useful when you have temporary or unpredictable workloads or when you want to make changes to your cluster.
+When a cluster is suspended, its state is saved to external storage. You can resume the cluster within a few minutes, and it’ll be restored to its previous state. This gives you better control over costs and resource usage.
+
+## Benefits of virtual vPODs
+
+1. **Cost savings**. With suspension mode, you can save money by temporarily pausing resources when they are not in use.
+2. **Faster deployment time**. Virtual vPODs are deployed in just 5 minutes, compared to the 15 minutes required for physical vPODs.
+3. **Greater storage options**. With virtual vPODs, you can easily attach external data volumes.
+4. **Flexibility**. Virtual vPODs can be easily modified or reconfigured to meet changing requirements.
+
+## Configurations
+
+Each flavor of virtual vPOD comes with:
+
+- 1 Virtual Machine. Its configuration depends on the vCPU, RAM, and ephemeral storage capacity.
+- Host server(s) with 4 IPU-processors each. The exact number of host servers depends on the flavor you choose.
+
+For up-to-date prices and availability, refer to our website or your Customer Portal.
\ No newline at end of file
diff --git a/documentation/edge-ai/ai-infrastructure/create-an-ai-cluster.md b/documentation/edge-ai/ai-infrastructure/create-an-ai-cluster.md
new file mode 100644
index 00000000..fe936132
--- /dev/null
+++ b/documentation/edge-ai/ai-infrastructure/create-an-ai-cluster.md
@@ -0,0 +1,56 @@
+---
+title: create-an-ai-cluster
+displayName: Create an AI Cluster
+order: 20
+published: true
+toc:
+pageTitle: Create an AI Cluster | Gcore
+pageDescription: Learn how to create an AI cluster using Gcore's Cloud GPU infrastructure. Follow the step-by-step guide to set up your cluster and start using it.
+---
+# Create an AI Cluster
+
+1\. In the Gcore Customer Portal, open the **GPU cloud** page. You'll be taken to the page for AI cluster creation.
+
+Create an AI Cluster
+
+2\.
+Select a region: the physical location of the data center. For example, if you choose Manassas, your cluster will be deployed on servers in Manassas.
+
+3\. Choose a flavor with the relevant cluster configuration and allocated resources. The number in the vPOD name indicates the number of IPU-processors in your cluster. For instance, one Graphcore server consists of four IPU-processors.
+
+4\. Select the OS image on which your model will be running.
+
+Create an AI Cluster image settings
+
+5\. Configure volumes and set the size of your cluster. Note that you can't change the cluster size after its creation.
+
+Create an AI Cluster volume settings
+
+6\. Set up a network interface. You can choose a public or private one:
+   * **Public**: Attach this interface if you plan to use the GPU Cloud with servers hosted outside of Gcore Cloud. Your cluster will be accessible from external networks.
+
+   * **Private**: Attach this interface if you want to use the service with Gcore servers only. Your cluster will be available only to internal networks.
+
+Select one of the existing networks or create a new one to attach it to your server.
+
+7\. (Optional) If you want to assign a reserved IP address to your server, turn on the **Use reserved IP** toggle and select one. For more details, refer to the article Create and configure a reserved IP address.
+
+8\. Turn on the **Use floating IP** toggle if you want to use a floating IP address. It’ll make your server accessible from external networks even if it has only a private interface. Create a new IP address or choose an existing one. For more details, check out the article Create and configure a floating IP address.
+
+Create an AI Cluster network settings
+
+9\. (Optional) If you need several network interfaces, click **Add Interface** and repeat the instructions from Step 6.
+
+10\. Select one of your SSH keys from the list, add a new key, or generate a key pair. You'll use this SSH key to connect to your cluster.
+ +Create an AI Cluster ssh key settings + +11\. (Optional) To add metadata to your cluster, enable the **Additional options** toggle and add tags as key-value pairs. + +12\. Name your cluster and click **Create Cluster**. + +Create an AI Cluster tag and name settings + +You’ve successfully created the cluster. Use the IP address of your AI Cluster and the SSH key from Step 10 and connect to your server. + +User login: ```ubuntu``` + +Connection port: ```22``` \ No newline at end of file diff --git a/documentation/edge-ai/ai-infrastructure/metadata.md b/documentation/edge-ai/ai-infrastructure/metadata.md new file mode 100644 index 00000000..2796c2d2 --- /dev/null +++ b/documentation/edge-ai/ai-infrastructure/metadata.md @@ -0,0 +1,6 @@ +--- +title: metadata +displayName: GPU Cloud +published: true +order: 100 +--- From e43ed546745c92a89236e294aea817beb2648b77 Mon Sep 17 00:00:00 2001 From: Maria Yurlava Date: Wed, 30 Oct 2024 17:31:08 +0100 Subject: [PATCH 2/2] WEB-7257 Change doc name --- .../about-our-ai-infrastructure.md | 133 ------------------ .../ai-Infrustructure/about-virtual-vpod.md | 43 ------ .../ai-Infrustructure/create-an-ai-cluster.md | 56 -------- .../edge-ai/ai-Infrustructure/metadata.md | 6 - 4 files changed, 238 deletions(-) delete mode 100644 documentation/edge-ai/ai-Infrustructure/about-our-ai-infrastructure.md delete mode 100644 documentation/edge-ai/ai-Infrustructure/about-virtual-vpod.md delete mode 100644 documentation/edge-ai/ai-Infrustructure/create-an-ai-cluster.md delete mode 100644 documentation/edge-ai/ai-Infrustructure/metadata.md diff --git a/documentation/edge-ai/ai-Infrustructure/about-our-ai-infrastructure.md b/documentation/edge-ai/ai-Infrustructure/about-our-ai-infrastructure.md deleted file mode 100644 index 84167866..00000000 --- a/documentation/edge-ai/ai-Infrustructure/about-our-ai-infrastructure.md +++ /dev/null @@ -1,133 +0,0 @@ ---- -title: about-our-ai-infrastructure -displayName: About GPU Cloud -order: 10 -published: 
true -toc: - --1--AI GPU infrastructure: "ai-gpu-infrastructure" - --1--Tools our AI Infrastructure supports: "tools-supported-by-gcore-gpu-cloud" -pageTitle: About Gcore GPU Cloud | Gcore -pageDescription: Explore Gcore GPU Cloud for AI. NVIDIA servers, top performance, diverse tool support. Easy deployment, per-minute billing. ---- -# GPU Cloud infrastructure - -Gcore GPU Cloud provides high-performance compute clusters designed for machine learning tasks. - -## AI GPU infrastructure - -Train your ML models with the latest NVIDIA GPUs. We offer a wide range of Bare Metal servers and Virtual Machines powered by NVIDIA A100, H100, and L40S GPUs. - -Pick the configuration and reservation plan that best fits your computing requirements. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
SpecificationCharacteristicsUse casePerformance
H100 with Infiniband - 8x Nvidia H100 80GB
- 2 Intel Xeon 8480+
- 2TB RAM
- 2x 960GB
- 8x3.84 TB NVMe
- 3.2 Tbit/s Infiniband
- 2x100Gbit/s Ethernet -
- Optimized for distributed training of Large Language Models. - Ultimate performance for compute-intensive tasks that require a significant exchange of data by the network.
A100 with Infiniband - 8x Nvidia A100 80GB
- 2 Intel Xeon 8468
- 2 TB RAM
- 2x 960GB SSD
- 8x3.84 TB NVMe
- 800Gbit/s Infiniband -
- Distributed training for ML models and a broad range of HPC workloads. - Well-balanced in performance and price.
A100 without Infiniband - 8x Nvidia A100 80GB
- 2 Intel Xeon 8468
- 2 TB RAM
- 2x 960GB SSD
- 8x3.84 TB NVMe
- 2x100Gbit/s Ethernet -
- Training and fine-tuning of models on single nodes.
-
Inference for large models.
- Multi-user HPC cluster. -
The best solution for inference models that require more than 48GB vRAM.
L40 - 8x Nvidia L40S
- 2x Intel Xeon 8468
- 2TB RAM
- 4x7.68TB NVMe SSD
- 2x25Gbit/s Ethernet -
- Model inference.
-
Fine-tuning for small and medium-size models. -
The best solution for inference models that require less than 48GB vRAM.
- -Explore our competitive pricing on the AI GPU Cloud infrastructure pricing page. - -## Tools supported by Gcore GPU Cloud - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Tool classList of toolsExplanation
FrameworkTensorFlow, Keras, PyTorch, Paddle Paddle, ONNX, Hugging FaceYour model is supposed to use one of these frameworks for correct work.
Data platformsPostgreSQL, Hadoop, Spark, VertikaYou can set up a connection between our cluster and your data platforms of these types to make them work together.
Programming languagesJavaScript, R, Swift, PythonYour model is supposed to be written in one of these languages for correct work.
Resources for receiving and processing dataStorm, Spark, Kafka, PySpark, MS SQL, Oracle, MongoDBYou can set up a connection between our cluster and your resources of these types to make them work together.
Exploration and visualization toolsSeaborn, Matplotlib, TensorBoardYou can connect our cluster to these tools to visualize your model.
diff --git a/documentation/edge-ai/ai-Infrustructure/about-virtual-vpod.md b/documentation/edge-ai/ai-Infrustructure/about-virtual-vpod.md deleted file mode 100644 index 9b0e26b5..00000000 --- a/documentation/edge-ai/ai-Infrustructure/about-virtual-vpod.md +++ /dev/null @@ -1,43 +0,0 @@ ---- -title: about-virtual-vpod -displayName: Virtual vPOD -order: 30 -published: true -toc: - --1--What is a virtual vPOD?: "what-is-a-virtual-vpod" - --1--Features: "features-of-virtual-vpods" - --1--Benefits: "benefits-of-virtual-vpods" - --1--Configurations: "configurations" -pageTitle: Virtual vPOD | Gcore -pageDescription: Discover virtual vPODs. AI clusters with Poplar servers on virtual machines, offering cost savings, faster deployment, and flexible configurations. ---- -# About Virtual vPOD - -## What is a virtual vPOD? - -A virtual vPOD is a flavor of an AI cluster in which a Poplar server is deployed on a Virtual Machine, as opposed to a dedicated vPOD, which deploys a Poplar server on a dedicated Bare Metal server. - -With virtual vPODs, you can directly access the host machines and can easily set up your own development environment on each IPU instance, install and run any code in an ultrafast connection with IPU accelerators, have better experience with deploying and developing such frameworks like TensorFlow and PyTorch on Cloud IPUs, use ephemeral storage, execute custom code in input pipelines, and integrate Cloud IPUs into research and production workflows. - -## Features of virtual vPODs - -Virtual vPODs offer two main features. - -1. **External volumes**. With virtual vPODs, you can connect external block storage for system and data volumes and easily attach new data volumes.  -2. **Suspension mode**. Virtual vPODs have the Suspension mode, which allows you to avoid any charges when your cluster is stopped. This feature is particularly useful when you have temporary or unpredictable workloads or when you want to make changes to your cluster. 
When a cluster is suspended, its state is saved on external storage. You can resume the cluster within a few minutes, and it’ll be restored to its previous state. This feature allows for better control over costs and resource optimization. - -## Benefits of virtual vPODs - -1. **Cost savings**. With the suspension mode, users can save money by temporary pausing their resources when they are not in use. -2. **Faster deployment time**. Virtual vPODs are deployed in just 5 minutes, compared to 15 minutes required for physical vPODs. -3. **Greater storage options**. With Virtual vPODs, users can easily attach external data volumes. -4. **Flexibility**. Virtual vPODs can be easily modified or reconfigured as needed to meet changing requirements. - -## Configurations - -Each flavor of virtual vPOD comes with: - -- 1 Virtual Machine. The configuration of a Virtual Machine depends on the capacity of vCPU, RAM and ephemeral storage. -- Host server(s) with 4 IPU-processors on each. The exact number of host servers depends on the flavor you choose. - -For up-to-date prices and availability, refer to our website or your Customer Portal. \ No newline at end of file diff --git a/documentation/edge-ai/ai-Infrustructure/create-an-ai-cluster.md b/documentation/edge-ai/ai-Infrustructure/create-an-ai-cluster.md deleted file mode 100644 index fe936132..00000000 --- a/documentation/edge-ai/ai-Infrustructure/create-an-ai-cluster.md +++ /dev/null @@ -1,56 +0,0 @@ ---- -title: create-an-ai-cluster -displayName: Create an AI Cluster -order: 20 -published: true -toc: -pageTitle: Create an AI Cluster | Gcore -pageDescription: Learn how to create an AI cluster using Gcore's Cloug GPU infrastructure. Follow the step-by-step guide to set up your cluster and start using it. ---- -# Create an AI Cluster - -1\. In the Gcore Customer Portal, open the **GPU cloud** page. You'll be taken to the page for AI cluster creation. - -Create an AI Cluster - -2\. 
Select a region, which is a physical location of the data center. For example, if you choose Manassas, your cluster will be deployed on servers in Manassas. - -3\. Choose the flavor with relevant cluster configuration and allocated resources. The number in vPOD means the number of IPU-processors in your cluster. For instance, one Graphcore server consists of four IPU-processors. - -4\. Select the OS image on which your model will be running. - -Create an AI Cluster image settings - -5\. Configure volumes and set the size of your cluster. Note that you can't change the cluster size after its creation. - -Create an AI Cluster volume settings - -6\. Set up a network interface. You can choose a public or private one: - * **Public**: Attach this interface if you are planning to use the GPU Cloud with servers hosted outside of Gcore Cloud. Your cluster will be accessible from external networks. - - * **Private**: If you want to use the service with Gcore servers only. Your cluster will be available only for internal networks. -Select one of the existing networks or create a new one to attach it to your server. - -7\. (Optional) If you want to assign a reserved IP address to your server, turn on the **Use reserved IP** toggle and select one. For more details, refer to the article Create and configure a reserved IP address.  - -8\. Turn on the **Use floating IP** toggle if you want to use a floating IP address. It’ll make your server accessible from outside networks even if they have only a private interface. Create a new IP address or choose an existing one. For more details, check out the article Create and configure a floating IP address. - -Create an AI Cluster network settings - -9\. (Optional) If you need several network interfaces, click **Add Interface** and repeat the instructions from to Step 6. - -10\. Select one of your SSH keys from the list, add a new key, or generate a key pair. You'll use this SSH key to connect to your cluster. 
- -Create an AI Cluster ssh key settings - -11\. (Optional) To add metadata to your cluster, enable the **Additional options** toggle and add tags as key-value pairs. - -12\. Name your cluster and click **Create Cluster**. - -Create an AI Cluster tag and name settings - -You’ve successfully created the cluster. Use the IP address of your AI Cluster and the SSH key from Step 10 and connect to your server. - -User login: ```ubuntu``` - -Connection port: ```22``` \ No newline at end of file diff --git a/documentation/edge-ai/ai-Infrustructure/metadata.md b/documentation/edge-ai/ai-Infrustructure/metadata.md deleted file mode 100644 index 2796c2d2..00000000 --- a/documentation/edge-ai/ai-Infrustructure/metadata.md +++ /dev/null @@ -1,6 +0,0 @@ ---- -title: metadata -displayName: GPU Cloud -published: true -order: 100 ----