Merge pull request #1396 from G-Core/WEB-7257-fix
WEB-7257 fix bug
Showing 11 changed files with 276 additions and 27 deletions.
documentation/edge-ai/ai-infrastructure/about-our-ai-infrastructure.md (133 additions, 0 deletions)
---
title: about-our-ai-infrastructure
displayName: About GPU Cloud
order: 10
published: true
toc:
   --1--AI GPU infrastructure: "ai-gpu-infrastructure"
   --1--Tools supported by Gcore GPU Cloud: "tools-supported-by-gcore-gpu-cloud"
pageTitle: About Gcore GPU Cloud | Gcore
pageDescription: Explore Gcore GPU Cloud for AI. NVIDIA servers, top performance, diverse tool support. Easy deployment, per-minute billing.
---
# GPU Cloud infrastructure

Gcore <a href="https://gcore.com/cloud/ai-gpu" target="_blank">GPU Cloud</a> provides high-performance compute clusters designed for machine learning tasks.

## AI GPU infrastructure

Train your ML models with the latest <a href="https://www.nvidia.com/en-us/data-center/data-center-gpus/" target="_blank">NVIDIA GPUs</a>. We offer a wide range of Bare Metal servers and Virtual Machines powered by NVIDIA A100, H100, and L40S GPUs.

Pick the configuration and reservation plan that best fits your computing requirements.
<table>
<tr>
<th style="width:20%">Specification</th>
<th style="width:35%">Characteristics</th>
<th style="width:23%">Use case</th>
<th style="width:22%">Performance</th>
</tr>
<tr>
<td style="text-align: left">H100 with InfiniBand</td>
<td style="text-align: left">
8x NVIDIA H100 80GB <br>
2x Intel Xeon 8480+ <br>
2TB RAM <br>
2x 960GB SSD <br>
8x 3.84TB NVMe <br>
3.2Tbit/s InfiniBand <br>
2x 100Gbit/s Ethernet
</td>
<td style="text-align: left">
Optimized for distributed training of Large Language Models.
</td>
<td style="text-align: left">Ultimate performance for compute-intensive tasks that require heavy data exchange over the network.</td>
</tr>
<tr>
<td style="text-align: left">A100 with InfiniBand</td>
<td style="text-align: left">
8x NVIDIA A100 80GB <br>
2x Intel Xeon 8468 <br>
2TB RAM <br>
2x 960GB SSD <br>
8x 3.84TB NVMe <br>
800Gbit/s InfiniBand
</td>
<td style="text-align: left">
Distributed training for ML models and a broad range of HPC workloads.
</td>
<td style="text-align: left">Well balanced in performance and price.</td>
</tr>
<tr>
<td style="text-align: left">A100 without InfiniBand</td>
<td style="text-align: left">
8x NVIDIA A100 80GB <br>
2x Intel Xeon 8468 <br>
2TB RAM <br>
2x 960GB SSD <br>
8x 3.84TB NVMe <br>
2x 100Gbit/s Ethernet
</td>
<td style="text-align: left">
Training and fine-tuning of models on single nodes.<br>
Inference for large models.<br>
Multi-user HPC clusters.
</td>
<td style="text-align: left">The best solution for inference with models that require more than 48GB of vRAM.</td>
</tr>
<tr>
<td style="text-align: left">L40S</td>
<td style="text-align: left">
8x NVIDIA L40S <br>
2x Intel Xeon 8468 <br>
2TB RAM <br>
4x 7.68TB NVMe SSD <br>
2x 25Gbit/s Ethernet
</td>
<td style="text-align: left">
Model inference.<br>
Fine-tuning for small and medium-size models.
</td>
<td style="text-align: left">The best solution for inference with models that require less than 48GB of vRAM.</td>
</tr>
</table>
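To make the table's headline numbers concrete, the shell arithmetic below totals the vRAM of one 8x H100 80GB node. This is a sketch for illustration only; the `nvidia-smi` query in the comment is a standard way to cross-check from a provisioned node itself.

```shell
# Aggregate vRAM of one H100 node from the table above
GPUS_PER_NODE=8
VRAM_PER_GPU_GB=80

TOTAL_VRAM_GB=$((GPUS_PER_NODE * VRAM_PER_GPU_GB))
echo "Total vRAM per node: ${TOTAL_VRAM_GB} GB"
# prints: Total vRAM per node: 640 GB

# On a provisioned node you could cross-check the GPU inventory with:
#   nvidia-smi --query-gpu=name,memory.total --format=csv
```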

Explore our competitive pricing on the <a href="https://gcore.com/cloud/ai-gpu" target="_blank">AI GPU Cloud infrastructure pricing page</a>.

## Tools supported by Gcore GPU Cloud

<table>
<thead>
<tr>
<th style="text-align: left"><strong>Tool class</strong></th>
<th style="text-align: left"><strong>List of tools</strong></th>
<th style="text-align: left"><strong>Explanation</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">Frameworks</td>
<td style="text-align: left">TensorFlow, Keras, PyTorch, PaddlePaddle, ONNX, Hugging Face</td>
<td style="text-align: left">Your model must be built with one of these frameworks to run correctly.</td>
</tr>
<tr>
<td style="text-align: left">Data platforms</td>
<td style="text-align: left">PostgreSQL, Hadoop, Spark, Vertica</td>
<td style="text-align: left">You can set up a connection between our cluster and your data platforms of these types to make them work together.</td>
</tr>
<tr>
<td style="text-align: left">Programming languages</td>
<td style="text-align: left">JavaScript, R, Swift, Python</td>
<td style="text-align: left">Your model must be written in one of these languages to run correctly.</td>
</tr>
<tr>
<td style="text-align: left">Resources for receiving and processing data</td>
<td style="text-align: left">Storm, Spark, Kafka, PySpark, MS SQL, Oracle, MongoDB</td>
<td style="text-align: left">You can set up a connection between our cluster and your resources of these types to make them work together.</td>
</tr>
<tr>
<td style="text-align: left">Exploration and visualization tools</td>
<td style="text-align: left">Seaborn, Matplotlib, TensorBoard</td>
<td style="text-align: left">You can connect our cluster to these tools to visualize your model.</td>
</tr>
</tbody>
</table>
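Before wiring one of these data platforms to the cluster, it helps to confirm that the endpoint is reachable at all. The sketch below probes a TCP port using only bash built-ins; the host and port are hypothetical placeholders (5432 is simply PostgreSQL's default), not values tied to any real deployment.

```shell
# Hypothetical data platform endpoint -- replace with your own
DB_HOST="10.0.0.5"
DB_PORT=5432   # PostgreSQL default port, used here as an example

# Probe TCP reachability with bash's /dev/tcp (no extra tools required)
if timeout 3 bash -c "exec 3<>/dev/tcp/${DB_HOST}/${DB_PORT}" 2>/dev/null; then
  RESULT="reachable"
else
  RESULT="unreachable"
fi
echo "${DB_HOST}:${DB_PORT} is ${RESULT}"
```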
documentation/edge-ai/ai-infrastructure/about-virtual-vpod.md (43 additions, 0 deletions)
---
title: about-virtual-vpod
displayName: Virtual vPOD
order: 30
published: true
toc:
   --1--What is a virtual vPOD?: "what-is-a-virtual-vpod"
   --1--Features: "features-of-virtual-vpods"
   --1--Benefits: "benefits-of-virtual-vpods"
   --1--Configurations: "configurations"
pageTitle: Virtual vPOD | Gcore
pageDescription: Discover virtual vPODs. AI clusters with Poplar servers on virtual machines, offering cost savings, faster deployment, and flexible configurations.
---
# About Virtual vPOD

## What is a virtual vPOD?

A virtual vPOD is a flavor of an AI cluster in which a Poplar server is deployed on a Virtual Machine, as opposed to a dedicated vPOD, which deploys a Poplar server on a dedicated Bare Metal server.

With virtual vPODs, you get direct access to the host machines. You can set up your own development environment on each IPU instance, install and run any code with an ultrafast connection to the IPU accelerators, deploy and develop frameworks such as TensorFlow and PyTorch on Cloud IPUs, use ephemeral storage, execute custom code in input pipelines, and integrate Cloud IPUs into research and production workflows.

## Features of virtual vPODs

Virtual vPODs offer two main features:

1. **External volumes**. With virtual vPODs, you can connect external block storage for system and data volumes and easily attach new data volumes.
2. **Suspension mode**. Virtual vPODs support suspension, which lets you avoid any charges while your cluster is stopped. This is particularly useful for temporary or unpredictable workloads, or when you want to make changes to your cluster. When a cluster is suspended, its state is saved to external storage. You can resume the cluster within a few minutes, and it will be restored to its previous state. This gives you better control over costs and resource usage.

## Benefits of virtual vPODs

1. **Cost savings**. With suspension mode, you can save money by temporarily pausing resources when they are not in use.
2. **Faster deployment**. Virtual vPODs are deployed in just 5 minutes, compared to the 15 minutes required for physical vPODs.
3. **Greater storage options**. With virtual vPODs, you can easily attach external data volumes.
4. **Flexibility**. Virtual vPODs can be easily modified or reconfigured to meet changing requirements.
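To make the cost-savings point concrete, the sketch below estimates the monthly saving from suspending a cluster outside working hours. The hourly rate here is a made-up placeholder, not a real Gcore price; check the Customer Portal for actual rates.

```shell
# Hypothetical hourly rate -- see the Customer Portal for real prices
RATE_CENTS_PER_HOUR=500          # $5.00/hour, placeholder value
HOURS_SUSPENDED_PER_DAY=16       # e.g. nights and off-hours
DAYS=30

# While suspended, the cluster accrues no compute charges
SAVED_CENTS=$((RATE_CENTS_PER_HOUR * HOURS_SUSPENDED_PER_DAY * DAYS))
echo "Estimated monthly saving: \$$((SAVED_CENTS / 100))"
# prints: Estimated monthly saving: $2400
```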

## Configurations

Each flavor of virtual vPOD comes with:

- 1 Virtual Machine. The configuration of the Virtual Machine depends on its vCPU, RAM, and ephemeral storage capacity.
- Host server(s) with 4 IPU processors each. The exact number of host servers depends on the flavor you choose.

For up-to-date prices and availability, refer to <a href="https://gcore.com/cloud/ai-platform" target="_blank">our website</a> or your <a href="https://cloud.gcore.com/cloud/projects/list" target="_blank">Customer Portal</a>.
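Because each host server carries exactly 4 IPU processors, the total IPU count of a flavor is simply the number of hosts multiplied by 4. A minimal sketch (the host count below is illustrative, not an official flavor list):

```shell
# Illustrative host count -- the real flavor list is in the Customer Portal
HOSTS=4                # number of host servers in the chosen flavor (example)
IPUS_PER_HOST=4        # fixed: 4 IPU processors per host server

TOTAL_IPUS=$((HOSTS * IPUS_PER_HOST))
echo "${TOTAL_IPUS} IPU processors across ${HOSTS} host servers"
# prints: 16 IPU processors across 4 host servers
```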
documentation/edge-ai/ai-infrastructure/create-an-ai-cluster.md (55 additions, 0 deletions)
---
title: create-an-ai-cluster
displayName: Create an AI Cluster
order: 20
published: true
pageTitle: Create an AI Cluster | Gcore
pageDescription: Learn how to create an AI cluster using Gcore's GPU Cloud infrastructure. Follow the step-by-step guide to set up your cluster and start using it.
---
# Create an AI Cluster

1\. In the <a href="https://accounts.gcore.com/reports/dashboard" target="_blank">Gcore Customer Portal</a>, open the **GPU cloud** page. You'll be taken to the page for AI cluster creation.

<img src="https://assets.gcore.pro/docs/gpu-cloud/gpu-cloud-page.png" alt="Create an AI Cluster" width="80%">

2\. Select a region, which is the physical location of the data center. For example, if you choose Manassas, your cluster will be deployed on servers in Manassas.

3\. Choose the flavor with the relevant cluster configuration and allocated resources. The number in vPOD indicates the number of IPU processors in your cluster. For instance, one Graphcore server consists of four IPU processors.

4\. Select the OS <a href="https://gcore.com/docs/cloud/images/about-images" target="_blank">image</a> on which your model will be running.

<img src="https://assets.gcore.pro/docs/gpu-cloud/create-ai-cluster-image.png" alt="Create an AI Cluster image settings" width="80%">

5\. Configure volumes and set the size of your cluster. Note that you can't change the cluster size after its creation.

<img src="https://assets.gcore.pro/docs/gpu-cloud/create-ai-cluster-volumes.png" alt="Create an AI Cluster volume settings" width="80%">
6\. Set up a <a href="https://gcore.com/docs/cloud/networking/create-and-manage-a-network" target="_blank">network interface</a>. You can choose a public or private one:

* **Public**: Attach this interface if you plan to use the GPU Cloud with servers hosted outside of Gcore Cloud. Your cluster will be accessible from external networks.

* **Private**: Attach this interface if you want to use the service with Gcore servers only. Your cluster will be available only to internal networks.

Select one of the existing networks or create a new one to attach it to your server.

7\. (Optional) If you want to assign a reserved IP address to your server, turn on the **Use reserved IP** toggle and select one. For more details, refer to the article <a href="https://gcore.com/docs/cloud/networking/ip-address/create-and-configure-a-reserved-ip-address" target="_blank">Create and configure a reserved IP address</a>.

8\. Turn on the **Use floating IP** toggle if you want to use a floating IP address. It'll make your server accessible from outside networks even if it has only a private interface. Create a new IP address or choose an existing one. For more details, check out the article <a href="https://gcore.com/docs/cloud/networking/ip-address/create-and-configure-a-floating-ip-address" target="_blank">Create and configure a floating IP address</a>.

<img src="https://assets.gcore.pro/docs/gpu-cloud/create-ai-cluster-network.png" alt="Create an AI Cluster network settings" width="80%">

9\. (Optional) If you need several network interfaces, click **Add Interface** and repeat the instructions from Step 6.

10\. Select one of your SSH keys from the list, add a new key, or generate a key pair. You'll use this SSH key to connect to your cluster.

<img src="https://assets.gcore.pro/docs/gpu-cloud/create-ai-cluster-ssh-key.png" alt="Create an AI Cluster ssh key settings" width="80%">

11\. (Optional) To add metadata to your cluster, enable the **Additional options** toggle and add tags as key-value pairs.

12\. Name your cluster and click **Create Cluster**.

<img src="https://assets.gcore.pro/docs/gpu-cloud/create-ai-cluster-tags-name.png" alt="Create an AI Cluster tag and name settings" width="80%">

You've successfully created the cluster. Use the IP address of your AI Cluster and the SSH key from Step 10 to connect to your server.

User login: `ubuntu`

Connection port: `22`
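The connection itself is a standard SSH login with the key from Step 10. A minimal sketch, where the cluster IP and key path are placeholder examples, not values the portal will show you:

```shell
# Placeholder values -- substitute your cluster's IP and your own key path
CLUSTER_IP="203.0.113.10"
SSH_KEY="$HOME/.ssh/gcore_ai_cluster"

# Build the login command: default user "ubuntu", default port 22
SSH_CMD="ssh -i ${SSH_KEY} -p 22 ubuntu@${CLUSTER_IP}"
echo "${SSH_CMD}"
# Run it with: eval "${SSH_CMD}"
```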
---
title: metadata
displayName: GPU Cloud
published: true
order: 100
---