Merge pull request #1399 from G-Core/WEB-7257-fix
WEB-7257 fix bug
Showing 14 changed files with 866 additions and 8 deletions.
@@ -0,0 +1,148 @@
---
title: inference-at-the-edge
displayName: Inference at the Edge
order: 10
published: true
toc:
--1--What is Gcore Inference at the Edge?: "what-is-gcore-inference-at-the-edge"
--1--Getting started: "getting-started"
--1--How Inference at the Edge works: "how-inference-at-the-edge-works"
--1--Use cases: "use-cases"
--1--Key benefits: "key-benefits"
--1--Supported features: "supported-features"
pageTitle: About Inference at the Edge | Gcore
pageDescription: Explore Gcore Inference at the Edge infrastructure. Deploy custom AI models or select from our model catalog.
---

# About Inference at the Edge

The development of machine learning involves two main stages: training and inference.

In the first stage, an AI model is trained on a large dataset, such as an array of labeled images, to recognize and label objects. The result is a trained model.

The second stage is model inference, where the model makes predictions from real user requests. For this stage, it’s crucial that the AI model responds promptly to users regardless of network delays, latency, and distance from data centers.

<a href="https://gcore.com/docs/cloud/ai-Infrustructure/about-our-ai-infrastructure" target="_blank">Gcore GPU Cloud</a> is designed for creating and training models. For inference, we offer Gcore Inference at the Edge.

## What is Gcore Inference at the Edge?

Gcore Inference at the Edge allows customers to deploy trained AI models on edge inference nodes. By bringing AI models closer to end users, the technology ensures ultra-fast response times and optimized performance.

Using Anycast endpoints, end users’ queries are directed to the nearest running model, resulting in low latency and an enhanced user experience. This setup is automated through a single endpoint, relieving you of the need to manage, scale, and monitor the underlying infrastructure.

## Getting started

Deploy AI models with our global intelligence pipeline—a comprehensive ecosystem that supports the full AI lifecycle, from training to inference. It ensures seamless development, deployment, and operation of AI models at various scales across multiple regions.

To get started, check out our guide on <a href="https://gcore.com/docs/cloud/inference-at-the-edge/deploy-ai-model" target="_blank">deploying a model</a>.

## How Inference at the Edge works

Inference at the Edge combines two technologies:

1\. **Edge Network**: Provides low latency via Anycast balancing and smart routing.

2\. **Serverless flexible GPU infrastructure**: Enables quick initiation, integration, and deployment.

We provide you with an endpoint that can be integrated into your applications. When your users access this endpoint, their requests are delivered to the nearest Edge nodes. This is achieved through Smart Routing technology, which redirects requests to the closest inference region where the trained model is deployed.

<img src="https://assets.gcore.pro/docs/cloud/inference-at-the-edge/about-inference-at-the-edge/smart-routing-diagram.png" alt="Diagram depicting Smart Routing technology" width="60%">
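
Once deployed, the model is consumed like any HTTP service. Below is a minimal sketch, assuming a hypothetical endpoint URL and JSON payload schema; your deployment’s real endpoint and request format are shown in the Gcore Customer Portal.

```python
import requests

# Hypothetical values for illustration only: the real endpoint URL comes from
# your deployment in the Gcore Customer Portal, and the payload schema depends
# on the model you deployed (a text model might accept a "prompt" field).
ENDPOINT = "https://example.gcore-inference.example.com/v1/predict"
payload = {"prompt": "Summarize the benefits of edge inference in one sentence."}

# The same URL is served everywhere: Anycast and Smart Routing deliver the
# request to the nearest healthy inference region, so the client needs no
# region-specific logic.
response = requests.post(ENDPOINT, json=payload, timeout=30)
response.raise_for_status()
print(response.json())
```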

We also use <a href="https://gcore.com/docs/dns/dns-failover/about-dns-failover" target="_blank">Healthchecks</a> to monitor the availability of pods. If the Amsterdam-1 pod is experiencing downtime, requests will be automatically sent to the geographically closest inference region, such as Amsterdam-2.

<img src="https://assets.gcore.pro/docs/cloud/inference-at-the-edge/about-inference-at-the-edge/smart-routing-map.png" alt="Map depicting Smart Routing across locations" width="60%">

## Use cases

Inference at the Edge is a versatile solution for businesses that require low-latency or real-time model responses. It caters to various industries, including:

* **Fintech and banking**: Enables prompt anti-fraud detection and real-time credit scoring.

* **Healthcare**: Facilitates medical prescriptions based on data from wearable sensors and the analysis of medical records.

* **Gaming**: Supports automatic opponent selection in competitive games, map generation, and maintaining open worlds.

* **Media**: Provides content analysis, automated transcription, and translation of interviews.

* **ISP and internet services**: Offers AI-based traffic analysis and DDoS protection.

* **Industrial and manufacturing**: Ensures real-time defect detection and fast feedback.

## Key benefits

Inference at the Edge offers several key benefits:

* **Low latency**: With over 180 points of presence worldwide, requests are quickly routed to the nearest Inference at the Edge pod, ensuring low latency for users.

* **Flexibility in model selection**: Run leading open-source models from our <a href="https://gcore.com/docs/cloud/inference-at-the-edge#ai-models" target="_blank">model catalog</a> or deploy your own custom models.

* **High performance**: Utilizing the latest purpose-built NVIDIA GPU hardware, Inference at the Edge delivers fast model inference capable of handling the most demanding workloads.

* **Cost efficiency**: Payments are based solely on the runtime of the containers, which automatically scale in and out with the number of user requests to keep your operations cost-effective.

* **Easy control**: Global AI infrastructure can be configured in just a few clicks in the Gcore Customer Portal or via API requests, simplifying management and control.

## Supported features

* Model catalog

* Custom model deployment

* Various flavors (vGPU/vCPU/RAM) and storage

* DDoS and bot protection

* API keys

* REST API & Terraform (coming soon)

* RAG support (coming soon)

## AI models

The following foundational open-source models are available in our AI model catalog.

<table>
  <thead>
    <tr>
      <th style="text-align: left"><strong>Model</strong></th>
      <th style="text-align: left"><strong>Description</strong></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">DistilBERT</td>
      <td style="text-align: left">A light version of the BERT language model for generating short text extracts.</td>
    </tr>
    <tr>
      <td style="text-align: left">LLaMA-Pro</td>
      <td style="text-align: left">A large language model (LLM) for understanding general language and domain-specific areas, particularly programming and mathematics.</td>
    </tr>
    <tr>
      <td style="text-align: left">Mistral-7B</td>
      <td style="text-align: left">An LLM that can generate human-quality text, write code, summarize text, and answer questions.</td>
    </tr>
    <tr>
      <td style="text-align: left">ResNet-50</td>
      <td style="text-align: left">A deep residual neural network used in computer vision tasks, known for enabling very deep networks to be trained effectively.</td>
    </tr>
    <tr>
      <td style="text-align: left">Stable Diffusion XL</td>
      <td style="text-align: left">A model for generating images from text descriptions.</td>
    </tr>
    <tr>
      <td style="text-align: left">Whisper</td>
      <td style="text-align: left">An automatic speech recognition model for converting spoken language into written text.</td>
    </tr>
  </tbody>
</table>
38 changes: 38 additions & 0 deletions
38
documentation/edge-ai/inference-at-the-edge/create-and-configure-a-registry.md
@@ -0,0 +1,38 @@
---
title: create-and-configure-a-registry
displayName: Create and configure a registry
published: true
order: 40
pageTitle: Add and configure a registry | Gcore
pageDescription: Learn how to set up a registry with your AI model in the Gcore Customer Portal for Gcore Inference at the Edge.
---

# Add and configure a registry

If you want to deploy a custom AI model with Gcore Inference at the Edge, you need to provide information about the registry where your model is stored. This is necessary to ensure that we can access and retrieve your model during the deployment process.

You can set up a registry either <a href="https://gcore.com/docs/cloud/inference-at-the-edge/deploy-ai-model" target="_blank">during AI model deployment</a> or on the **Registries** page. The latter approach is described in this guide.

## Add a registry

1\. In the Gcore Customer Portal, navigate to **Cloud** > **Inference at the Edge**.

2\. Click **Registries**.

3\. Click **Add registry**.

<img src="https://assets.gcore.pro/docs/cloud/inference-at-the-edge/add-a-registry/registry-list.png" alt="Registries page with highlighted Add registry button" width="80%">

4\. Give your registry a name consisting of lowercase Latin letters, optionally separated by dashes.

5\. Provide the link to the location where your AI model is stored. We’ll use this URL to retrieve the model during deployment.

6\. Specify the username you use to access the storage location of your AI model.

7\. Enter the password required to access the model.

8\. Click **Add**.

<img src="https://assets.gcore.pro/docs/cloud/inference-at-the-edge/add-a-registry/configure-registry.png" alt="Add registry dialog with registry configuration options" width="80%">

You’ve successfully configured a registry.
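
If you manage infrastructure as code, the same setup can in principle be scripted. The sketch below is illustrative only: the API path, field names, and auth scheme are assumptions, not the documented Gcore API (REST API support is listed as coming soon), so consult the API reference for the real schema.

```python
import requests

# All names below are assumptions for illustration, not the documented API.
API_URL = "https://api.gcore.com/inference/registries"  # hypothetical path
API_TOKEN = "<your-gcore-api-token>"

registry = {
    "name": "my-model-registry",  # lowercase Latin letters, dash-separated
    "url": "https://registry.example.com/my-team/my-model",  # model location
    "username": "registry-user",
    "password": "<registry-password>",
}

resp = requests.post(
    API_URL,
    json=registry,
    headers={"Authorization": f"APIKey {API_TOKEN}"},  # hypothetical scheme
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```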
117 changes: 117 additions & 0 deletions
117
documentation/edge-ai/inference-at-the-edge/create-and-manage-api-keys.md
@@ -0,0 +1,117 @@
---
title: create-and-manage-api-keys
displayName: Create and manage API keys
published: true
order: 30
toc:
--1--Create an API key: "create-an-api-key"
--1--Manage API keys: "manage-api-keys"
--2--Edit API key: "edit-api-key"
--2--Delete API key: "delete-api-key"
pageTitle: Create and Manage API Keys | Gcore
pageDescription: Learn how to create API keys and attach them to Gcore Inference at the Edge deployments.
---

# Create and manage API keys

Setting up API keys protects deployed AI models from unauthorized access.

You can add multiple API keys to a single deployment, and the same API key can be attached to multiple deployments.

## Create an API key

You can create an API key in different ways: <a href="https://gcore.com/docs/cloud/inference-at-the-edge/deploy-ai-model" target="_blank">during AI model deployment</a>, <a href="https://gcore.com/docs/cloud/inference-at-the-edge/manage-deployments" target="_blank">via a deployed AI model's settings</a>, or on the **API keys** page. Here, we explain the latter approach.

To create an API key and add it to a deployment:

1\. In the Gcore Customer Portal, navigate to **Cloud** > **Inference at the Edge**.

2\. Click **API keys**.

3\. Click **Create API key**.

<img src="https://assets.gcore.pro/docs/cloud/inference-at-the-edge/create-and-manage-api-keys/create-api-key.png" alt="API keys page with highlighted Create API key button" width="80%">

4\. In the **General** section, specify the API key name. Optionally, add a description.

<img src="https://assets.gcore.pro/docs/cloud/inference-at-the-edge/create-and-manage-api-keys/general-tab-keys.png" alt="General section with key name and description" width="80%">

5\. In the **Inference instances** dropdown, select one or more deployments for which this key will be required for authentication.

<img src="https://assets.gcore.pro/docs/cloud/inference-at-the-edge/create-and-manage-api-keys/instances-tab.png" alt="Inference instances section with instance dropdown" width="80%">

6\. In the **Expiration** section, select how long the key will remain valid:

* **Never expire**: The key will remain valid indefinitely.

* **Set an expiration date**: After the specified date, the key will no longer grant access to the attached deployments. By default, the key expires at 00:00 UTC on the specified date.

<img src="https://assets.gcore.pro/docs/cloud/inference-at-the-edge/create-and-manage-api-keys/expiration-tab.png" alt="Expiration section date options" width="80%">

7\. Click **Create**.

8\. Copy the key and save it locally.

9\. Click **OK, I’ve copied API Key**.

The key has been successfully created.

<alert-element type="warning" title="Warning">

Never share your API key with third parties. This might result in unauthorized access to your deployments.

</alert-element>
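
Clients then present the key with every request to a protected deployment. Here is a minimal sketch, assuming a hypothetical endpoint URL and header name; check your deployment’s endpoint details for the exact header expected.

```python
import requests

ENDPOINT = "https://example.gcore-inference.example.com/v1/predict"  # placeholder
API_KEY = "<the-key-you-copied-when-creating-it>"

# The header name is an assumption for illustration; use whichever header
# your deployment's endpoint details specify.
resp = requests.post(
    ENDPOINT,
    json={"prompt": "Hello"},
    headers={"X-Api-Key": API_KEY},
    timeout=30,
)

# A 401/403 response typically means the key is missing, expired, or not
# attached to this deployment.
resp.raise_for_status()
print(resp.json())
```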

## Manage API keys

You can view detailed information about an API key, change the deployments where it's used for authentication, modify the expiration date, or delete the key from the Gcore Customer Portal.

### Edit API key

1\. In the Gcore Customer Portal, navigate to **Cloud** > **Inference at the Edge**.

2\. Click **API keys**.

3\. Find the key you want to edit and click the three-dot icon to open the settings menu.

4\. Click **Edit**.

<img src="https://assets.gcore.pro/docs/cloud/inference-at-the-edge/create-and-manage-api-keys/edit-api-key.png" alt="API key settings with highlighted Edit button" width="80%">

A new page with the key overview will open. To review or change a particular setting, navigate to the relevant tab.

#### General

In this tab, you can update the key name and description.

<img src="https://assets.gcore.pro/docs/cloud/inference-at-the-edge/create-and-manage-api-keys/general-tab-keys.png" alt="General tab with options to edit key name and description" width="80%">

#### Inference instances

In this tab, you can add or remove the deployments for which this API key is required to authenticate.

<img src="https://assets.gcore.pro/docs/cloud/inference-at-the-edge/create-and-manage-api-keys/instances-tab-keys.png" alt="Instances tab with dropdown to add instances" width="80%">

#### Expiration

If your key is close to expiring, you can modify the expiration date on this tab to ensure that the key remains a valid authentication method. Alternatively, choose **Never expire** to keep the key valid indefinitely.

<img src="https://assets.gcore.pro/docs/cloud/inference-at-the-edge/create-and-manage-api-keys/expiration-tab-keys.png" alt="Expiration tab with options to change expiration date" width="80%">
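
If you rotate keys with your own tooling, it can help to mirror the cutoff semantics described above. The following is a minimal sketch of the default behavior (expiry at 00:00 UTC on the chosen date), written for illustration rather than taken from Gcore code.

```python
from datetime import datetime, timezone
from typing import Optional

def key_is_valid(expiration_date: Optional[str], now: Optional[datetime] = None) -> bool:
    """Return True while a key is still usable.

    A key with no expiration date ("Never expire") stays valid; otherwise it
    stops authenticating at 00:00 UTC on the chosen date (the default cutoff).
    """
    if expiration_date is None:
        return True
    cutoff = datetime.strptime(expiration_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return (now or datetime.now(timezone.utc)) < cutoff

# A key set to expire on 2025-06-01 still works late on 31 May...
print(key_is_valid("2025-06-01", datetime(2025, 5, 31, 23, 59, tzinfo=timezone.utc)))  # True
# ...but is rejected from midnight UTC on 1 June onward.
print(key_is_valid("2025-06-01", datetime(2025, 6, 1, 0, 0, tzinfo=timezone.utc)))   # False
```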

### Delete API key

1\. In the Gcore Customer Portal, navigate to **Cloud** > **AI infrastructure**.

2\. Open the **Inference at the Edge** page and click **API keys**.

3\. Find the key you want to remove and click the three-dot icon to open the settings menu.

4\. Click **Delete**.

<img src="https://assets.gcore.pro/docs/cloud/inference-at-the-edge/create-and-manage-api-keys/delete-api-key.png" alt="API key settings with highlighted Delete button" width="80%">

5\. Confirm your action by clicking **Delete API key**.

<img src="https://assets.gcore.pro/docs/cloud/inference-at-the-edge/create-and-manage-api-keys/verify-key-deletion.png" alt="Delete key confirmation dialog" width="80%">

Your API key has been successfully removed.