Commit fb625e0

Add private deployment section (#384)

* add private deployment section

* update vpc explainer

mrmer1 authored Feb 12, 2025
1 parent 540ebad commit fb625e0

Showing 8 changed files with 864 additions and 48 deletions.
@@ -0,0 +1,83 @@
---
title: "Private Deployment Usage"
slug: "docs/private-deployment-usage"

hidden: false

description: "This page describes how to use Cohere's SDK to access privately deployed Cohere models."
image: "../../../assets/images/f1cc130-cohere_meta_image.jpg"
keywords: "generative AI, large language models, private deployment"

createdAt: "Mon Apr 08 2024 14:53:59 GMT+0000 (Coordinated Universal Time)"
updatedAt: "Wed May 01 2024 16:11:36 GMT+0000 (Coordinated Universal Time)"
---

You can use Cohere's SDK to access privately deployed Cohere models.

## Installation

To install the Cohere SDK, choose one of the following four languages:

<Tabs>
<Tab title="Python">

```bash
pip install -U cohere
```
[Source](https://github.com/cohere-ai/cohere-python)
</Tab>

<Tab title="TypeScript">

```bash
npm i -s cohere-ai
```
[Source](https://github.com/cohere-ai/cohere-typescript)

</Tab>

<Tab title="Java">

```gradle
implementation 'com.cohere:cohere-java:1.x.x'
```
[Source](https://github.com/cohere-ai/cohere-java)

</Tab>

<Tab title="Go">

```bash
go get github.com/cohere-ai/cohere-go/v2
```

[Source](https://github.com/cohere-ai/cohere-go)

</Tab>
</Tabs>

## Getting Started

The only difference between using Cohere's models on private deployments and the Cohere platform is how you set up the client. With private deployments, you need to pass the following parameters:
- `api_key` - Pass a blank value
- `base_url` - Pass the URL of your private deployment

```python PYTHON
import cohere

co = cohere.Client(
    api_key="",  # Leave this blank
    base_url="<YOUR_DEPLOYMENT_URL>",
)
```
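
Once the client is set up, requests work the same way as they do on the Cohere platform. Below is a minimal sketch of a chat call against a private deployment; it assumes the deployment serves a chat-capable Command model at `<YOUR_DEPLOYMENT_URL>`.

```python PYTHON
import cohere

# Point the client at the private deployment; no API key is needed
co = cohere.Client(
    api_key="",  # Leave this blank
    base_url="<YOUR_DEPLOYMENT_URL>",
)

# Send a chat request to the privately deployed model
response = co.chat(
    message="Give me a one-sentence summary of private deployments."
)

print(response.text)
```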

To get started with example use cases, refer to the following quickstart examples:
- [Text Generation (Command model)](https://docs.cohere.com/docs/text-gen-quickstart)
- [RAG (Command model)](https://docs.cohere.com/docs/rag-quickstart)
- [Tool Use (Command model)](https://docs.cohere.com/docs/tool-use-quickstart)
- [Semantic Search (Embed model)](https://docs.cohere.com/docs/sem-search-quickstart)
- [Reranking (Rerank model)](https://docs.cohere.com/docs/reranking-quickstart)

## Integrations

You can use the LangChain library with privately deployed Cohere models. Refer to the [LangChain section](https://docs.cohere.com/docs/chat-on-langchain#using-langchain-on-private-deployments) for more information on setting up LangChain for private deployments.
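
As a minimal sketch, the LangChain setup mirrors the SDK setup above: pass a blank API key and your deployment URL. This assumes the `langchain-cohere` package is installed and that `ChatCohere` accepts the `base_url` parameter described in the linked section.

```python PYTHON
from langchain_cohere import ChatCohere

# Point LangChain's Cohere integration at the private deployment
llm = ChatCohere(
    cohere_api_key="",  # Leave this blank
    base_url="<YOUR_DEPLOYMENT_URL>",
)

response = llm.invoke("Give me a one-sentence summary of private deployments.")
print(response.content)
```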
68 changes: 68 additions & 0 deletions fern/pages/v2/deployment-options/deployment-options-overview.mdx
@@ -0,0 +1,68 @@
---
title: "Deployment Options - Overview"
slug: "v2/docs/deployment-options-overview"

hidden: false

description: "This page provides an overview of the available options for deploying Cohere's models."
image: "../../../assets/images/f1cc130-cohere_meta_image.jpg"
keywords: "generative AI, large language models, private deployment"

createdAt: "Mon Apr 08 2024 14:53:59 GMT+0000 (Coordinated Universal Time)"
updatedAt: "Wed May 01 2024 16:11:36 GMT+0000 (Coordinated Universal Time)"
---
The most common way to access Cohere’s large language models (LLMs) is through the Cohere platform, which is fully managed by Cohere and accessible through an API.

But that’s not the only way to access Cohere’s models. In an enterprise setting, organizations might require more control over where and how the models are hosted.

Specifically, Cohere offers four deployment options:
1. **Cohere Platform**
2. **Cloud AI Services**
3. **Private Deployments - Cloud**
4. **Private Deployments - On-Premises**

## Cohere platform

This is the fastest and easiest way to start using Cohere’s models. The models are hosted on Cohere infrastructure and made available through our fully managed public SaaS platform, which provides an API data opt-out.

## Cloud AI services

These managed services enable enterprises to access Cohere’s models through the AI platforms of the major cloud providers. In this scenario, Cohere’s models are hosted on the cloud provider’s infrastructure. Cohere is cloud-agnostic, meaning you can deploy our models through any cloud provider.

### AWS

Developers can access a range of Cohere’s language models in a private environment via Amazon’s AWS Cloud platform. Cohere’s models are supported on two Amazon services: **Amazon Bedrock** and **Amazon SageMaker**.

#### Amazon Bedrock

Amazon Bedrock is a fully managed service where foundation models from Cohere are made available through a single, serverless API. [Read about Bedrock here](http://docs.aws.amazon.com/bedrock).

[View Cohere’s models on Amazon Bedrock](https://aws.amazon.com/bedrock/cohere/).

#### Amazon SageMaker

Amazon SageMaker is a service that allows customers to prepare data and build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows. [Read about SageMaker here.](https://aws.amazon.com/pm/sagemaker/)

Cohere offers a comprehensive suite of generative and embedding models through SageMaker on a range of hardware options, many of which support fine-tuning for deeper customization and performance.

[View Cohere's model listing on the AWS Marketplace](https://aws.amazon.com/marketplace/seller-profile?id=87af0c85-6cf9-4ed8-bee0-b40ce65167e0).

### Azure AI Foundry

Azure AI Foundry is an enterprise-grade platform for developers building generative AI applications. Developers can explore a wide range of models, services, and capabilities to build AI applications that meet their specific goals.

[View Cohere’s models on Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-cohere-command).

### OCI Generative AI Service

Oracle Cloud Infrastructure Generative AI is a fully managed service that enables you to use Cohere's [generative](https://docs.oracle.com/en-us/iaas/Content/generative-ai/generate-models.htm) and [embedding models](https://docs.oracle.com/en-us/iaas/Content/generative-ai/embed-models.htm) through an API.

## Private deployments

### Cloud (VPC)

Private deployments (cloud) allow enterprises to deploy the Cohere stack privately on cloud platforms. With AWS, Cohere’s models can be deployed in an enterprise’s AWS Cloud environment via their own VPC (EC2, EKS). Compared to managed cloud services, VPC deployments provide tighter control and compliance. Avoiding data egress is another common reason for choosing a VPC. Overall, the VPC option carries a higher management burden but offers more flexibility.

### On-premises

Private deployments on-premises (on-prem) allow enterprises to deploy the Cohere stack privately on their own compute. This includes air-gapped environments where systems are physically isolated from unsecured networks, providing maximum security for sensitive workloads.
@@ -0,0 +1,48 @@
---
title: "Private Deployment Overview"
slug: "v2/docs/private-deployment-overview"

hidden: false

description: "This page provides an overview of private deployments of Cohere's models."
image: "../../../../assets/images/f1cc130-cohere_meta_image.jpg"
keywords: "generative AI, large language models, private deployment"

createdAt: "Mon Apr 08 2024 14:53:59 GMT+0000 (Coordinated Universal Time)"
updatedAt: "Wed May 01 2024 16:11:36 GMT+0000 (Coordinated Universal Time)"
---

## What is a Private Deployment?

Private deployments allow organizations to implement and run AI models within a controlled, internal environment.

In a private deployment, you manage the model deployment infrastructure (with Cohere's guidance and support). This includes ensuring hardware and driver compatibility as well as installing prerequisites to run the containers. These deployments typically run on Kubernetes, but it’s not a firm requirement.

Cohere supports two types of private deployments:

- On-premises (on-prem)
<br/>Gives you full control over both your data and the AI system on your own premises with your own hardware. You procure your own GPUs, servers, and other hardware to insulate your environment from external threats.

- On the cloud, typically a virtual private cloud (VPC)
<br/>You host the models on infrastructure from a cloud provider (such as AWS, Azure, GCP, or OCI) while retaining control of how your data is stored and processed. Cohere can support any VPC on any cloud environment, so long as the necessary hardware requirements are met.

## Why Private Deployment?

With private deployments, you maintain full control over your infrastructure while leveraging Cohere's state-of-the-art language models.

This enables you to deploy LLMs within your secure network, whether through your chosen cloud provider or your own environment. The data never leaves your environment, and the model can be fully network-isolated.

Here are some of the benefits of private deployments:

- **Data security**: On-prem deployments allow you to keep your data secure and compliant with data protection regulations. A VPC offers similar yet somewhat less rigorous protection.
- **Model customization**: Fine-tuning in a private environment allows enterprises to maintain strict control over their data, avoiding the risk of sensitive or proprietary data leaking.
- **Infrastructure needs**: The public cloud is generally fast and easily scalable, but when the necessary hardware is not available in a specific region, on-prem can be the faster option.

## Private Deployment Components

Cohere’s platform container consists of several key components:
- **Endpoints**: API endpoints for model interaction
- **Models**: AI model management and storage
- **Serving Framework**: Manages model serving and request handling
- **Fine-tuning Framework**: Handles model fine-tuning

@@ -0,0 +1,37 @@
---
title: "Private Deployment – Setting Up"
slug: "v2/docs/private-deployment-setup"

hidden: false

description: "This page describes the setup required for private deployments of Cohere's models."
image: "../../../../assets/images/f1cc130-cohere_meta_image.jpg"
keywords: "generative AI, large language models, private deployment"

createdAt: "Mon Apr 08 2024 14:53:59 GMT+0000 (Coordinated Universal Time)"
updatedAt: "Wed May 01 2024 16:11:36 GMT+0000 (Coordinated Universal Time)"
---

## Getting Access
When you [sign up for private deployment](https://cohere.com/contact-sales), you will receive two key pieces of information:
1. A license key for authenticating and pulling model containers
2. A list of artifacts (Docker containers) that you can pull using the license key

You can then use the license to pull and run the images, as described in the [provisioning guide](https://docs.cohere.com/docs/single-container-on-private-clouds).

## Infrastructure Requirements
Hardware requirements vary with the model type (for example, Command, Embed, and Rerank) and the model version.

During the engagement, you will be provided with the specific requirements, which will include:
- GPU model, count, and interconnect requirements
- System requirements
- Software and driver versions

## Available Models
Visit the [Models Reference]([[todo - link to reference page]]) page to see the models available for private deployments.

Cohere has a monthly cadence for model releases and updates.

Cohere also supports model customization through fine-tuning.

<Info>[Contact sales](https://cohere.com/contact-sales) to learn more about private deployments.</Info>
@@ -0,0 +1,83 @@
---
title: "Private Deployment Usage"
slug: "v2/docs/private-deployment-usage"

hidden: false

description: "This page describes how to use Cohere's SDK to access privately deployed Cohere models."
image: "../../../../assets/images/f1cc130-cohere_meta_image.jpg"
keywords: "generative AI, large language models, private deployment"

createdAt: "Mon Apr 08 2024 14:53:59 GMT+0000 (Coordinated Universal Time)"
updatedAt: "Wed May 01 2024 16:11:36 GMT+0000 (Coordinated Universal Time)"
---

You can use Cohere's SDK to access privately deployed Cohere models.

## Installation

To install the Cohere SDK, choose one of the following four languages:

<Tabs>
<Tab title="Python">

```bash
pip install -U cohere
```
[Source](https://github.com/cohere-ai/cohere-python)
</Tab>

<Tab title="TypeScript">

```bash
npm i -s cohere-ai
```
[Source](https://github.com/cohere-ai/cohere-typescript)

</Tab>

<Tab title="Java">

```gradle
implementation 'com.cohere:cohere-java:1.x.x'
```
[Source](https://github.com/cohere-ai/cohere-java)

</Tab>

<Tab title="Go">

```bash
go get github.com/cohere-ai/cohere-go/v2
```

[Source](https://github.com/cohere-ai/cohere-go)

</Tab>
</Tabs>

## Getting Started

The only difference between using Cohere's models on private deployments and the Cohere platform is how you set up the client. With private deployments, you need to pass the following parameters:
- `api_key` - Pass a blank value
- `base_url` - Pass the URL of your private deployment

```python PYTHON
import cohere

co = cohere.ClientV2(
    api_key="",  # Leave this blank
    base_url="<YOUR_DEPLOYMENT_URL>",
)
```
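
With the V2 client, a chat call then takes a list of messages. The sketch below is illustrative: the `model` value is a placeholder, since the model served is determined by your deployment.

```python PYTHON
import cohere

co = cohere.ClientV2(
    api_key="",  # Leave this blank
    base_url="<YOUR_DEPLOYMENT_URL>",
)

# V2 chat requests take a list of role/content messages
response = co.chat(
    model="model",  # Placeholder; your deployment determines the served model
    messages=[
        {
            "role": "user",
            "content": "Give me a one-sentence summary of private deployments.",
        }
    ],
)

print(response.message.content[0].text)
```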

To get started with example use cases, refer to the following quickstart examples:
- [Text Generation (Command model)](https://docs.cohere.com/docs/text-gen-quickstart)
- [RAG (Command model)](https://docs.cohere.com/docs/rag-quickstart)
- [Tool Use (Command model)](https://docs.cohere.com/docs/tool-use-quickstart)
- [Semantic Search (Embed model)](https://docs.cohere.com/docs/sem-search-quickstart)
- [Reranking (Rerank model)](https://docs.cohere.com/docs/reranking-quickstart)

## Integrations

You can use the LangChain library with privately deployed Cohere models. Refer to the [LangChain section](https://docs.cohere.com/docs/chat-on-langchain#using-langchain-on-private-deployments) for more information on setting up LangChain for private deployments.