2 changes: 1 addition & 1 deletion about.mdx
@@ -24,7 +24,7 @@ Do submit your feature requests at https://magemaker.featurebase.app/
- Deploying the same model twice within the same minute will break
- Hugging Face models on Azure have different IDs than their Hugging Face Hub counterparts. Follow the steps in the quick-start guide to find the relevant models
- For Azure, deploying models other than Hugging Face models is not supported yet.
- Python3.13 is not supported because of an open-issue by Azure. https://github.com/Azure/azure-sdk-for-python/issues/37600
- Python 3.12 is currently not supported due to an open issue in the Azure SDK. https://github.com/Azure/azure-sdk-for-python/issues/37600


If there is anything we missed, do point it out at https://magemaker.featurebase.app/
3 changes: 1 addition & 2 deletions getting_started.md
@@ -16,7 +16,7 @@ To get a local copy up and running follow these simple steps.

### Prerequisites

* Python 3.11 (3.13 is not supported because of azure)
* Python 3.11+ (Python 3.12 is currently not supported due to an Azure SDK issue)
* Cloud Configuration
* An account with your preferred cloud provider: AWS, GCP, or Azure.
* Each cloud requires slightly different access; Magemaker will guide you through getting the necessary credentials for the selected cloud provider
@@ -178,7 +178,6 @@ If you’re using the `ml.m5.xlarge` instance type, here are some small Hugging
<br>



## Deactivating Models

Any model endpoints you spin up will run continuously unless you deactivate them! Make sure to delete endpoints you’re no longer using so you don’t keep getting charged for your SageMaker instance.
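
As a fallback outside Magemaker, you can also delete an endpoint directly with the AWS CLI (a sketch using a hypothetical endpoint name):

```sh
# Delete a SageMaker endpoint directly; substitute your own endpoint name
aws sagemaker delete-endpoint --endpoint-name my-llama3-endpoint
```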
31 changes: 24 additions & 7 deletions mint.json
@@ -38,14 +38,19 @@
"mode": "auto"
},
"navigation": [
{
{
"group": "Getting Started",
"pages": ["about", "installation", "quick-start"]
"pages": [
"about",
"installation",
"quick-start"
]
},
{
"group": "Tutorials",
"pages": [
"tutorials/deploying-llama-3-to-aws",
"tutorials/deploying-llama-3-to-aws-using-query-flag",
"tutorials/deploying-llama-3-to-gcp",
"tutorials/deploying-llama-3-to-azure"
]
@@ -77,17 +82,29 @@
{
"title": "Documentation",
"links": [
{ "label": "Getting Started", "url": "/" },
{ "label": "Contributing", "url": "/contributing" }
{
"label": "Getting Started",
"url": "/"
},
{
"label": "Contributing",
"url": "/contributing"
}
]
},
{
"title": "Resources",
"links": [
{ "label": "GitHub", "url": "https://github.com/slashml/magemaker" },
{ "label": "Support", "url": "mailto:support@slashml.com" }
{
"label": "GitHub",
"url": "https://github.com/slashml/magemaker"
},
{
"label": "Support",
"url": "mailto:support@slashml.com"
}
]
}
]
}
}
}
160 changes: 160 additions & 0 deletions tutorials/deploying-llama-3-to-aws-using-query-flag.mdx
@@ -0,0 +1,160 @@
---
title: Deploying Llama 3 to SageMaker using the Query Flag
---

## Introduction
This tutorial walks you through deploying Llama 3 to AWS SageMaker with Magemaker and querying it using YAML-based configuration. Ensure you have followed the [installation](installation) steps before proceeding.

## Step 1: Setting Up Magemaker for AWS
Run the following command to configure Magemaker for AWS SageMaker deployment:
```sh
magemaker --cloud aws
```
This initializes Magemaker with the necessary configurations for deploying models to SageMaker.
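
Magemaker needs working AWS credentials for this step. If you have not configured any yet, one common route (assuming the default AWS credential chain) is the standard CLI flow:

```sh
# Set up the default AWS credential chain used by SageMaker deployments;
# you will be prompted for an access key ID, secret access key, and region
aws configure
```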

## Step 2: YAML-based Deployment
For reproducible deployments, use YAML configuration:
```sh
magemaker --deploy .magemaker_config/llama3-deploy.yaml
```

Example deployment YAML:
```yaml
deployment: !Deployment
  destination: aws
  endpoint_name: llama3-endpoint
  instance_count: 1
  instance_type: ml.g5.2xlarge
  num_gpus: 1
  quantization: null
models:
- !Model
  id: meta-llama/Meta-Llama-3-8B-Instruct
  location: null
  predict: null
  source: huggingface
  task: text-generation
  version: null
```
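
Once the deployment command returns, one way to sanity-check the endpoint before querying it is the standard AWS CLI (a sketch; the endpoint name comes from the YAML above):

```sh
# Verify the endpoint reached InService before sending queries
aws sagemaker describe-endpoint \
  --endpoint-name llama3-endpoint \
  --query "EndpointStatus"
```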

<Note>
For gated models like Llama from Meta, you must accept the model's terms of use on Hugging Face and add your Hugging Face token to the environment for the deployment to go through.
</Note>
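
As a sketch, assuming Magemaker picks up the standard Hugging Face token variable from your environment, exporting it before deploying would look like this:

```sh
# Assumption: Magemaker reads the standard Hugging Face token variable.
# Create a token at https://huggingface.co/settings/tokens
export HUGGING_FACE_HUB_TOKEN=<your-hf-token>
```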

<Warning>
You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your AWS quotas before proceeding.
</Warning>
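
For example, one way to inspect the relevant quota from the CLI is sketched below; the exact `QuotaName` string for `ml.g5.2xlarge` endpoint usage is an assumption and may vary by account and region:

```sh
# List the SageMaker endpoint-usage quota for ml.g5.2xlarge (QuotaName string is assumed)
aws service-quotas list-service-quotas \
  --service-code sagemaker \
  --region us-east-1 \
  --query "Quotas[?contains(QuotaName, 'ml.g5.2xlarge for endpoint usage')]"
```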

## Step 3: Querying the Deployed Model
Once the deployment is complete, you can query the model using a YAML configuration file.

### Creating a Query YAML
Create a new YAML file for your query (e.g., `llama3-query.yaml`):
```yaml
deployment: !Deployment
  destination: aws
  endpoint_name: llama3-endpoint
  instance_count: 1
  instance_type: ml.g5.2xlarge
  num_gpus: 1
  quantization: null
models:
- !Model
  id: meta-llama/Meta-Llama-3-8B-Instruct
  location: null
  predict: null
  source: huggingface
  task: text-generation
  version: null
query: !Query
  input: 'What are the key differences between Llama 2 and Llama 3?'
```

### Executing Queries
Run your query using the following command:
```sh
magemaker --query .magemaker_config/llama3-query.yaml
```

Example Response:
```json
{
"generated_text": "Here are the key differences between Llama 2 and Llama 3:\n\n1. Model Architecture: Llama 3 features an enhanced architecture with improved attention mechanisms and more efficient parameter utilization\n\n2. Training Data: Trained on more recent data with broader coverage and improved data quality\n\n3. Performance: Demonstrates superior performance on complex reasoning tasks and shows better coherence in long-form responses\n\n4. Context Window: Supports longer context windows allowing for processing of more extensive input text\n\n5. Instruction Following: Enhanced ability to follow complex instructions and maintain consistency in responses",
"model": "meta-llama/Meta-Llama-3-8B-Instruct",
"total_tokens": 89,
"generation_time": 1.2
}
```

### Additional Query Examples

1. Creative Writing Query:
```yaml
query: !Query
input: 'Write a short story about a robot learning to paint'
```

Example Response:
```json
{
"generated_text": "In a sunlit studio, Unit-7 held a brush for the first time. Its servo motors whirred softly as it analyzed the canvas before it. Programmed for precision in manufacturing, the robot found itself puzzled by the concept of artistic expression. The first strokes were mechanical, perfect lines that lacked soul. But as days passed, Unit-7 began to introduce deliberate 'imperfections,' discovering that art lived in these beautiful accidents. One morning, its creator found Unit-7 surrounded by canvases splashed with vibrant abstracts - each one unique, each one telling the story of a machine learning to feel through color and form.",
"model": "meta-llama/Meta-Llama-3-8B-Instruct",
"total_tokens": 106,
"generation_time": 1.5
}
```

2. Technical Analysis Query:
```yaml
query: !Query
input: 'Explain the concept of quantum entanglement in simple terms'
```

Example Response:
```json
{
"generated_text": "Quantum entanglement is like having two magical coins that always know what the other is doing. When two particles become entangled, they share a special connection regardless of how far apart they are. If you flip one coin and it lands on heads, its entangled partner will instantly be tails, even if it's on the other side of the universe. This connection happens faster than light can travel between them, which is why Einstein called it 'spooky action at a distance.' It's a fundamental principle of quantum mechanics that we use in quantum computing and cryptography.",
"model": "meta-llama/Meta-Llama-3-8B-Instruct",
"total_tokens": 95,
"generation_time": 1.3
}
```

You can also use Python to query the model programmatically:
```python
from sagemaker.huggingface.model import HuggingFacePredictor
import sagemaker


def query_huggingface_model(endpoint_name: str, query: str):
    # Initialize a SageMaker session
    sagemaker_session = sagemaker.Session()

    # Create a Hugging Face predictor bound to the deployed endpoint
    predictor = HuggingFacePredictor(
        endpoint_name=endpoint_name,
        sagemaker_session=sagemaker_session
    )

    # Prepare the input payload
    input_data = {
        "inputs": query
    }

    try:
        # Make the prediction
        result = predictor.predict(input_data)
        print(result)
        return result
    except Exception as e:
        print(f"Error making prediction: {str(e)}")
        raise


# Example usage
if __name__ == "__main__":
    ENDPOINT_NAME = "llama3-endpoint"
    question = "What are you?"
    response = query_huggingface_model(ENDPOINT_NAME, question)
```
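
If you would rather not depend on the `sagemaker` SDK, a minimal sketch using plain `boto3` against the same (assumed) endpoint name looks like this:

```python
import json

import boto3

# Low-level SageMaker runtime client; uses the default AWS credential chain
runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="llama3-endpoint",  # assumed name from the deployment YAML above
    ContentType="application/json",
    Body=json.dumps({"inputs": "What are you?"}),
)

# Body is a streaming object; read and decode it as JSON
print(json.loads(response["Body"].read().decode("utf-8")))
```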

## Conclusion
You have successfully deployed and queried Llama 3 on AWS SageMaker using Magemaker's YAML-based configuration system. This approach provides reproducible deployments and queries that can be version controlled and shared across teams. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com).