diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index a70cce7..f1006ba 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -9,11 +9,15 @@ description: How to contribute to Magemaker We love your input! We want to make contributing to Magemaker as easy and transparent as possible, whether it's: - Reporting a bug -- Discussing the current state of the code +- Discussing the current state of the code or docs - Submitting a fix - Proposing new features - Becoming a maintainer + + If you add a user-facing feature (e.g. a new CLI flag, REST route, or YAML option) **you must also update the documentation**. See “Documentation Changes” below. + + ## Ways to Contribute ### 1. Report Issues @@ -21,8 +25,8 @@ We love your input! We want to make contributing to Magemaker as easy and transp If you encounter any bugs or have feature requests: 1. Go to our GitHub repository -2. Click on "Issues" -3. Click "New Issue" +2. Click on **Issues** +3. Click **New Issue** 4. Choose the appropriate template (Bug Report or Feature Request) 5. Fill out the template with as much detail as possible @@ -33,10 +37,10 @@ If you encounter any bugs or have feature requests: ### 2. Submit Pull Requests 1. Fork the repo and create your branch from `main` -2. If you've added code that should be tested, add tests -3. If you've changed APIs, update the documentation -4. Ensure the test suite passes -5. Make sure your code lints +2. If you've added code that should be tested, add tests (`pytest`) +3. If you've changed APIs, update or create a docs page (see **Documentation Changes**) +4. Ensure the test suite passes: `pytest -q` +5. Run linters/formatters (`black . && isort . && flake8`) 6. Issue that pull request! @@ -45,24 +49,84 @@ If you encounter any bugs or have feature requests: ## Development Process -1. Fork the repo -2. Create a new branch: `git checkout -b my-feature-branch` -3. Make your changes -4. Push to your fork and submit a pull request -5. Wait for a review and address any comments - -## Pull Request Guidelines - -- Update documentation as needed -- Add tests if applicable -- Follow the existing code style -- Keep PRs small and focused -- Write clear commit messages + + + ```bash + git clone https://github.com/YOUR_USERNAME/magemaker.git + cd magemaker + ``` + + + ```bash + # main requirements + dev extras + pip install -e ".[dev]" + ``` + + + ```bash + pytest -q # run unit + integration tests + black --check . # format check + flake8 # style check + ``` + + + ```bash + npm i -g mintlify # once + mintlify dev # hot-reload docs server + ``` + + + +## Documentation Changes + +Magemaker’s docs are written in **MDX** and served by [Mintlify](https://mintlify.com/). + +1. New pages should live under an existing logical folder (e.g. `concepts/`, `tutorials/`) and be linked in `mint.json`. +2. Use the existing front-matter style: + ```mdx + --- + title: Awesome Feature + description: Short description here + --- + ``` +3. Keep tone & components consistent (Cards, Steps, Notes, etc.). +4. Run `mintlify dev` locally to preview. + + + A PR that adds a new feature without updating docs **will not be merged**. + + +### Updating the API Reference +The FastAPI server (see `server.py`) exposes REST routes such as `/chat/completions` and `/endpoint/{endpoint_name}`. If you modify or add routes, update `concepts/api.mdx` accordingly and include example requests. 
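+
+For example, a request sample for the chat route could look like this (the URL and model ID below are placeholders; use whatever your change actually exposes):
+
+```bash
+curl -X POST http://localhost:8000/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
+        "messages": [{"role": "user", "content": "Hello!"}]
+      }'
+```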
+ +## Pull Request Checklist + +- Code builds & tests pass +- New/updated tests added when appropriate +- Docs updated (or not needed) +- Follows [Conventional Commits](https://www.conventionalcommits.org/) + +## Commit Message Convention + +We follow the Conventional Commits spec: + +- `feat:` New feature +- `fix:` Bug fix +- `docs:` Documentation only changes +- `style:` Code style updates (formatting, missing semicolons, etc.) +- `refactor:` Code change that neither fixes a bug nor adds a feature +- `test:` Adding or correcting tests +- `chore:` Other changes that don’t modify src or test files + +Example: +```bash +git commit -m "feat(api): add /v1/embeddings route" +``` ## License -By contributing, you agree that your contributions will be licensed under the Apache 2.0 License. +By contributing, you agree that your contributions will be licensed under the **Apache 2.0 License**. ## Questions? -Feel free to contact us at [support@slashml.com](mailto:support@slashml.com) if you have any questions about contributing! \ No newline at end of file +Feel free to contact us at [support@slashml.com](mailto:support@slashml.com) if you have any questions about contributing! diff --git a/README.md b/README.md index 862eb3c..c7334b6 100644 --- a/README.md +++ b/README.md @@ -1,24 +1,82 @@ -### These are docs for the [Magemaker-Docs](https://magemaker.slashml.com) documentation site. +### Magemaker Documentation Repository -The source code of magemaker is located at [Magemaker](https://github.com/slashml/magemaker) +This repository contains the source files that power the public Magemaker docs site – [https://magemaker.slashml.com](https://magemaker.slashml.com). -### Development +If you are **updating or adding documentation** please work inside this repo, **not** the main `slashml/magemaker` code repository. -Install the [Mintlify CLI](https://www.npmjs.com/package/mintlify) to preview the documentation changes locally. To install, use the following command +--- -``` -npm i -g mintlify -``` +## Local Development + +Magemaker docs are built with **[Mintlify](https://docs.mintlify.com/)**. Use the Mintlify CLI to preview changes locally before you open a pull-request. + +1. Install the CLI (requires Node.js ≥ 16): + + ```bash + npm install -g mintlify + ``` + +2. From the root of this repository start the local dev server: + + ```bash + mintlify dev + ``` + +3. Open `http://localhost:3000` in your browser – changes hot-reload automatically. + +### Common Issues -you need to have Node.js installed to use npm. +| Symptom | Fix | +| ---------------------------------------- | ------------------------------------------------- | +| `mintlify` command not found | Run `npm i -g mintlify` | +| 404 when dev server starts | Ensure you are in the folder that contains `mint.json` | +| Build fails after adding new dependencies | Run `mintlify install` to regenerate the lock-file | -Run the following command at the root of your documentation (where mint.json is) +--- + +## Folder Structure ``` -mintlify dev +. +├── about.mdx +├── concepts/ +│ ├── api.mdx # ← NEW – API reference for FastAPI proxy +│ ├── deployment.mdx +│ └── … +├── configuration/ +│ └── … +├── tutorials/ +│ └── … +└── mint.json ``` -#### Troubleshooting +*Pages inside `concepts`, `tutorials`, and `configuration` are automatically +collected by Mintlify. When you add a new page remember to update the +`navigation` section of **mint.json** so the page appears in the sidebar.* + +--- + +## Adding / Updating Docs + +1. 
Create or edit an `.mdx` or `.md` file. +2. Add front-matter: + + ```yaml + --- + title: My New Page + description: Short one-sentence description + --- + ``` + +3. Preview locally (`mintlify dev`) and confirm links / images work. +4. If you introduced new functionality in `slashml/magemaker` **also add docs** for it here (see the newly-added `concepts/api.mdx` for an example). +5. Commit using [Conventional Commits](https://www.conventionalcommits.org/) conventions, open a PR, and request review. + +--- + +## Reference Links -- Mintlify dev isn't running - Run `mintlify install` it'll re-install dependencies. -- Page loads as a 404 - Make sure you are running in a folder with `mint.json` \ No newline at end of file +* Main codebase – +* Live documentation – +* Contribution guidelines – [`CONTRIBUTING.md`](./CONTRIBUTING.md) +* **NEW** – FastAPI / OpenAI-style proxy reference – [`concepts/api.mdx`](./concepts/api.mdx) diff --git a/about.mdx b/about.mdx index d9c04a4..4d79022 100644 --- a/about.mdx +++ b/about.mdx @@ -1,43 +1,64 @@ --- title: About -description: Deploy open source AI models to AWS, GCP, and Azure in minutes +description: Deploy open-source AI models to AWS, GCP, and Azure in minutes "og:title": "Magemaker" --- ## About Magemaker -Magemaker is a Python tool that simplifies the process of deploying open source AI models to your preferred cloud provider. Instead of spending hours digging through documentation, Magemaker lets you deploy Hugging Face models directly to AWS SageMaker, Google Cloud Vertex AI, or Azure Machine Learning. +Magemaker is a Python tool that makes it **push-button simple** to deploy open-source AI models to your preferred cloud provider. Instead of spending hours digging through cloud docs, you can spin up Hugging Face models directly on: -## What we're working on next +- **AWS SageMaker** +- **Google Cloud Vertex AI** +- **Azure Machine Learning** -- More robust error handling for various edge cases -- Verbose logging -- Enabling / disabling autoscaling -- Enhanced multi-cloud support features +### Key Features -Do submit your feature requests at https://magemaker.featurebase.app/ + + + Guided, menu-driven deployments – perfect for first-time users. + + + Declarative configs for repeatable, CI-friendly deployments. + + + Train and tune models with a single YAML file. + + + One tool, three clouds. Mix-and-match providers as needed. + + + Serve any deployed model behind an `/chat/completions` endpoint that works with the official *openai* Python SDK. + + -## Known issues +### What We're Working on Next -- Querying within Magemaker currently only works with text-based models -- Deleting a model is not instant, it may show up briefly after deletion -- Deploying the same model within the same minute will break -- Hugging-face models on Azure have different Ids than their Hugging-face counterparts. Follow the steps specified in the quick-start guide to find the relevant models -- For Azure deploying models other than Hugging-face is not supported yet. -- Python3.13 is not supported because of an open-issue by Azure. https://github.com/Azure/azure-sdk-for-python/issues/37600 +- More robust error handling for edge cases +- Verbose / structured logging +- On-demand autoscaling controls +- Additional multi-cloud utilities +- Expanded model-type support (vision, audio, multimodal) +Have a feature request? Let us know at [magemaker.featurebase.app](https://magemaker.featurebase.app/). 
-If there is anything we missed, do point them out at https://magemaker.featurebase.app/ +### Known Issues +1. Querying currently supports **text-based** models only. +2. Endpoint deletion is asynchronous – an endpoint may appear for several minutes after deletion is requested. +3. Deploying the **same endpoint name** within the same minute can fail (name collision). +4. Hugging Face model **IDs differ on Azure** – follow the [Quick Start](/quick-start) guide to obtain the correct ID. +5. Azure supports **Hugging Face models only** at the moment. +6. Python 3.13 is not yet supported due to an [open Azure SDK issue](https://github.com/Azure/azure-sdk-for-python/issues/37600). -## License +If we missed something, please report it on [Featurebase](https://magemaker.featurebase.app/). -Distributed under the Apache 2.0 License. See `LICENSE` for more information. +### License -## Contact +Distributed under the Apache 2.0 License. See `LICENSE` for more information. -You can reach us, faizan & jneid, at [faizan|jneid@slashml.com](mailto:support@slashml.com). +### Contact -You can give feedback at https://magemaker.featurebase.app/ +Questions or feedback? Reach out to **Faizan & Jneid** at [support@slashml.com](mailto:support@slashml.com). -We'd love to hear from you! We're excited to learn how we can make this more valuable for the community and welcome any and all feedback and suggestions. +We’d love to hear from you – community feedback drives our roadmap! diff --git a/concepts/api-reference.mdx b/concepts/api-reference.mdx new file mode 100644 index 0000000..5c88a40 --- /dev/null +++ b/concepts/api-reference.mdx @@ -0,0 +1,148 @@ +--- +title: Inference API Reference +description: REST & OpenAI-compatible endpoints for deployed Magemaker models +--- + +## Overview + +Magemaker ships with a lightweight **FastAPI** server (`server.py`) that exposes every deployed model through a simple REST interface. It also implements the `/chat/completions` route so you can talk to your private model **exactly** like you would with OpenAI. + +Run locally: + +```bash +python server.py # default: http://localhost:8000 +``` + +The server automatically picks up your AWS region via the Magemaker session and requires no extra configuration files. + +--- + +## Endpoints + +### `GET /endpoint/{endpoint_name}` – Inspect Endpoint + +Returns metadata for a SageMaker endpoint (status, instance type, creation time, etc.). + +```bash +curl http://localhost:8000/endpoint/my-bert-endpoint +``` + +Response (trimmed): +```json +{ + "EndpointName": "my-bert-endpoint", + "EndpointStatus": "InService", + "ProductionVariants": [ + { "VariantName": "AllTraffic", "InstanceType": "ml.m5.xlarge" } + ] +} +``` + +--- + +### `POST /endpoint/{endpoint_name}/query` – Single-Shot Inference + +Send an arbitrary query to a specific endpoint. + +Request body (`application/json`): +```json +{ + "inputs": "What is Magemaker?", + "parameters": { + "max_new_tokens": 100, + "temperature": 0.7 + } +} +``` + +Python example: +```python +import requests, json + +payload = { + "inputs": "What is Magemaker?", + "parameters": {"max_new_tokens": 100} +} +resp = requests.post( + "http://localhost:8000/endpoint/my-bert-endpoint/query", + json=payload, +) +print(resp.json()) +``` + +--- + +### `POST /chat/completions` – OpenAI-Compatible Chat + +Proxy that lets you use the **openai** SDK with your own model. 
+ +Request body (`application/json`): +```json +{ + "model": "meta-llama/Meta-Llama-3-8B-Instruct", + "messages": [ + { "role": "user", "content": "Tell me a dad joke" } + ] +} +``` + +Example with the official SDK: +```python +import openai +openai.api_base = "http://localhost:8000" # Point to Magemaker +openai.api_key = "sk-ignored" # Not used, but required by the SDK + +response = openai.ChatCompletion.create( + model="meta-llama/Meta-Llama-3-8B-Instruct", + messages=[{"role": "user", "content": "Tell me a dad joke"}], +) +print(response.choices[0].message.content) +``` + +Behind the scenes the server: +1. Locates an endpoint that hosts the requested model (`get_endpoints_for_model`). +2. Translates the chat payload into a standard Magemaker `Query` object. +3. Performs the prediction via `sagemaker-runtime`. +4. Streams back an OpenAI-compatible JSON response. + +If the model is **not deployed**, the server raises `NotDeployedException` → HTTP `404`. + +--- + +## Environment Variables + +The server relies on the same `.env` file generated by `magemaker --cloud ...`. No additional variables are required, but you can override AWS region for debugging: + +```bash +export AWS_REGION_NAME=us-west-2 +python server.py +``` + +--- + +## Multi-Model Endpoints + +When an endpoint hosts multiple models, the query route automatically uses the first model in the deployment config. Custom routing logic can be added by editing `server.py`. + +--- + +## Error Handling + +- `404 Not Found` – Endpoint or model not deployed. +- `500 Internal Server Error` – Downstream SageMaker error (check CloudWatch logs). + +--- + +## Production Tips + +1. Deploy the FastAPI server behind an internal load-balancer or an API Gateway. +2. Add a reverse proxy (e.g., Nginx) for TLS termination. +3. Use AWS STS or IAM roles for fine-grained access control. +4. Enable logging & monitoring with Prometheus / Grafana. + +--- + +### Next Steps + +- Try the [Quick Start](/quick-start) guide to deploy your first model. +- Read the [Deployment concept](/concepts/deployment) for YAML examples. diff --git a/concepts/api.mdx b/concepts/api.mdx new file mode 100644 index 0000000..527b5ba --- /dev/null +++ b/concepts/api.mdx @@ -0,0 +1,132 @@ +--- +title: REST & OpenAI-Compatible API +description: Programmatically query your deployed models via REST or the OpenAI Chat Completion API +--- + +Magemaker ships with a lightweight FastAPI server (`server.py`) that lets you: + +1. Inspect deployed SageMaker endpoints +2. Send structured `Query` objects to an endpoint +3. Use an **OpenAI-compatible `/chat/completions`** route powered by [LiteLLM](https://github.com/BerriAI/litellm) + +This page documents the available routes, expected request/response formats, and sample usage. + +## Running the Server + +```bash +python server.py # defaults to 0.0.0.0:8000 +``` + +The server automatically reads AWS region information from the active Magemaker session and expects a populated `.env` with your cloud credentials (see [Environment Variables](/configuration/Environment)). + + + The server currently supports **AWS SageMaker** back-ends. GCP & Azure support are on the roadmap. + + +## Routes + +### 1. `GET /endpoint/{endpoint_name}` +Returns the raw SageMaker endpoint description. 
+ +```bash +curl http://localhost:8000/endpoint/my-bert-endpoint | jq +``` + +Response (truncated): +```json +{ + "EndpointName": "my-bert-endpoint", + "EndpointConfigName": "my-bert-endpoint-config", + "ProductionVariants": [...], + "CreationTime": "2024-05-30T16:22:11Z", + ... +} +``` + +### 2. `POST /endpoint/{endpoint_name}/query` +Send a structured `Query` (same schema used by the CLI) to the endpoint. + +Request body example: +```json +{ + "inputs": "What are LLMs?", + "parameters": { + "max_new_tokens": 100, + "temperature": 0.7 + } +} +``` + +Successful response mirrors the Hugging Face inference API result: +```json +{ + "generated_text": "Large Language Models (LLMs) are …" +} +``` + +### 3. `POST /chat/completions` +OpenAI-style chat completions powered by LiteLLM. This allows you to drop-in replace the OpenAI SDK in existing apps. + +Request body example: +```json +{ + "model": "meta-llama/Meta-Llama-3-8B-Instruct", + "messages": [ + {"role": "user", "content": "Explain RLHF in one paragraph."} + ] +} +``` + +Under the hood the server: +1. Looks up SageMaker endpoints that serve `model`. +2. Proxies the chat request through LiteLLM using the first matching endpoint (`sagemaker/{endpoint_name}`). + +Response matches the OpenAI schema: +```json +{ + "id": "chatcmpl-abc123", + "object": "chat.completion", + "created": 1717086490, + "model": "sagemaker/llama3-endpoint", + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": "Reinforcement Learning from Human Feedback (RLHF) …"}, + "finish_reason": "stop" + } + ], + "usage": {"prompt_tokens": 15, "completion_tokens": 85, "total_tokens": 100} +} +``` + + + If no SageMaker endpoint is found for the requested `model`, the server raises `NotDeployedException` (HTTP 500). + + +## Authentication & Environment Variables + +No additional auth headers are required; the server uses the credentials already configured via `magemaker --cloud aws` and stored in `.env`. + +```bash +AWS_ACCESS_KEY_ID=... +AWS_SECRET_ACCESS_KEY=... +SAGEMAKER_ROLE=... +``` + +## Example Python Client + +```python +import requests + +url = "http://localhost:8000/chat/completions" +req = { + "model": "meta-llama/Meta-Llama-3-8B-Instruct", + "messages": [{"role": "user", "content": "Tell me a dad joke."}] +} +print(requests.post(url, json=req).json()) +``` + +## Roadmap +- [ ] Support GCP & Azure endpoints +- [ ] Token-based authentication / rate limiting +- [ ] Streaming responses (`stream=true`) diff --git a/concepts/contributing.mdx b/concepts/contributing.mdx index 8c61908..da87d28 100644 --- a/concepts/contributing.mdx +++ b/concepts/contributing.mdx @@ -3,166 +3,160 @@ title: Contributing description: Guide to contributing to Magemaker --- -## Welcome to Magemaker Contributing Guide +## Welcome to the Magemaker Contributing Guide -We're excited that you're interested in contributing to Magemaker! This document will guide you through the process of contributing to the project. +We're thrilled that you're interested in helping make Magemaker better! This guide explains our preferred workflow, quality standards, and the resources available to contributors. ## Ways to Contribute - Create issues for bugs you encounter while using Magemaker + Found a bug? Open an issue with reproduction steps and expected vs. actual behaviour. - Suggest new features or improvements + Have an idea? Create a feature-request issue so we can discuss it. - Help improve our documentation + Improve or add docs—for example, the new OpenAI-compatible API reference. 
- Submit pull requests with bug fixes or new features + Fix bugs, add tests, or build new functionality via pull requests. ## Development Setup - - ```bash - git clone https://github.com/YOUR_USERNAME/magemaker.git - cd magemaker - ``` + + ```bash + git clone https://github.com/YOUR_USERNAME/magemaker.git + cd magemaker + ``` - - ```bash - pip install -e ".[dev]" - ``` + + + ```bash + pip install -e ".[dev]" + ``` + + + + ```bash + npm i -g mintlify # one-time install + mintlify dev # from repo root + ``` - - ```bash - git checkout -b feature/your-feature-name - ``` + + + ```bash + git checkout -b feat/your-feature-name + ``` ## Development Guidelines -### Code Style +### Code Style & Linting + +We enforce: -We use the following tools to maintain code quality: -- Black for Python code formatting -- isort for import sorting -- flake8 for style guide enforcement +* **Black** – formatting +* **isort** – import order +* **flake8** – linting +* **pytest** – testing -Run the following before committing: ```bash black . isort . flake8 +pytest -q # run tests before every commit ``` -### Testing +### Tests -All new features should include tests. We use pytest for our test suite. +All new functionality **must** include unit and/or integration tests. Use `pytest` and place tests under the closest `test_*.py` file in the appropriate module. -Run tests locally: -```bash -pytest tests/ -``` - ### Documentation -When adding new features, please update the relevant documentation: - -1. Update the README.md if needed -2. Add/update docstrings for new functions/classes -3. Create/update relevant .mdx files in the docs directory +1. Update existing markdown / MDX docs to reflect code changes. +2. If you add a new surface area (e.g., a new CLI flag or FastAPI route), create a **new page** under the appropriate section. See the new `/concepts/api-reference.mdx` page for an example. +3. Verify navigation order in `mint.json` if you add a top-level page. ## Pull Request Process - - Create a new branch for your changes: - ```bash - git checkout -b feature/your-feature - ``` + + Use [Conventional Commits](https://www.conventionalcommits.org/) for commit messages. + ```bash + git add . + git commit -m "feat(api): add /v1/embeddings endpoint" + ``` - - Make your changes and commit them with clear commit messages: - ```bash - git add . - git commit -m "feat: add new deployment option" - ``` + + ```bash + git push origin feat/api-embeddings + ``` + Then open a PR against `slashml/magemaker:main`. - - Push your changes to your fork: - ```bash - git push origin feature/your-feature - ``` - - - Open a Pull Request against the main repository + + Our GitHub Actions run linting, tests, and build the docs. Fix any failures. -### Pull Request Guidelines +### Pull Request Checklist - Provide a clear description of your changes + Explain *what* and *why*. - Include relevant tests for new features + Cover new code paths. - Update documentation as needed + Update or add docs (including API reference when relevant). - - Keep commits focused and clean + + Keep commit history focused. 
## Commit Message Convention -We follow the [Conventional Commits](https://www.conventionalcommits.org/) specification: +We follow **Conventional Commits**: - `feat:` New feature - `fix:` Bug fix -- `docs:` Documentation changes -- `style:` Code style changes +- `docs:` Documentation only changes +- `style:` Non-functional style changes - `refactor:` Code refactoring -- `test:` Adding missing tests -- `chore:` Maintenance tasks +- `test:` Adding or updating tests +- `chore:` Build or tooling updates Example: ```bash -feat(deployment): add support for custom docker images +feat(deployment): add support for custom Docker images ``` ## Getting Help -If you need help with your contribution: - - - Join our Discord server for real-time discussions + + Real-time chat with maintainers and community. - - Start a discussion in our GitHub repository + Long-form Q&A threads. - - - Contact us at support@slashml.com + + support@slashml.com ## Code of Conduct -We are committed to providing a welcoming and inclusive experience for everyone. Please read our [Code of Conduct](https://github.com/slashml/magemaker/CODE_OF_CONDUCT.md) before participating. +We are committed to a welcoming community. Please read our [Code of Conduct](https://github.com/slashml/magemaker/CODE_OF_CONDUCT.md) before participating. ## License -By contributing to Magemaker, you agree that your contributions will be licensed under the Apache 2.0 License. \ No newline at end of file +By contributing, you agree your work is available under the Apache 2.0 License. diff --git a/concepts/deployment.mdx b/concepts/deployment.mdx index 66ca7a9..043477a 100644 --- a/concepts/deployment.mdx +++ b/concepts/deployment.mdx @@ -36,6 +36,30 @@ This is recommended for: - Infrastructure as Code (IaC) - Team collaborations +### Programmatic Access (OpenAI-Compatible Proxy) + +After deployment you can _optionally_ spin up the lightweight FastAPI server included in Magemaker (`server.py`). +The server exposes **OpenAI-compatible** routes so that any tool that speaks the OpenAI API can talk to your private SageMaker / Vertex / Azure endpoint. + +Start the server: + +```bash +python -m magemaker.server # or simply: python server.py +``` + +Key routes: + +| Verb | Route | Description | +|------|-------|-------------| +| GET | `/endpoint/{endpoint_name}` | Returns endpoint metadata | +| POST | `/endpoint/{endpoint_name}/query` | Query an existing endpoint with the request body defined by `Query` | +| POST | `/chat/completions` | OpenAI-style chat completion that automatically selects an endpoint for the requested model | + +Full schema details and code samples are available in the new [API Reference](/concepts/api-reference) page. + +> **Tip** +> Combine the proxy with tools like LangChain or the OpenAI Python SDK to minimise code changes when moving from OpenAI to your own infrastructure. + ## Multi-Cloud Deployment Magemaker supports deployment to AWS SageMaker, GCP Vertex AI, and Azure ML. Here's how to deploy the same model (facebook/opt-125m) to different cloud providers: @@ -202,10 +226,9 @@ Choose your instance type based on your model's requirements: 4. Set up monitoring and alerting for your endpoints -Make sure you setup budget monitory and alerts to avoid unexpected charges. +Make sure you set up budget monitoring and alerts to avoid unexpected charges. 
- ## Troubleshooting Deployments Common issues and their solutions: @@ -225,4 +248,4 @@ Common issues and their solutions: - Verify model ID and version - Check instance memory requirements - Validate Hugging Face token if required - - Endpoing deployed but deployment failed. Check the logs, and do report this to us if you see this issue. + - Endpoint deployed but deployment failed. Check the logs, and do report this to us if you see this issue. diff --git a/concepts/models.mdx b/concepts/models.mdx index 0161380..e31b3db 100644 --- a/concepts/models.mdx +++ b/concepts/models.mdx @@ -6,7 +6,7 @@ description: Guide to supported models and their requirements ## Supported Models -Currently, Magemaker supports deployment of Hugging Face models only. Support for cloud provider marketplace models is coming soon! +Magemaker currently supports deploying **Hugging Face models** _and_ **AWS SageMaker JumpStart models** out-of-the-box. Support for additional cloud-specific marketplaces (Vertex AI Model Garden, Azure Model Catalog, etc.) is on the roadmap. ### Hugging Face Models @@ -26,15 +26,24 @@ Currently, Magemaker supports deployment of Hugging Face models only. Support fo +### AWS SageMaker JumpStart Models + + + + - Text generation & chat (e.g. Falcon, Mistral) + - Embeddings & feature extraction + - Vision and audio models + - Built-in algorithms (XGBoost, Image Classification, etc.) + + + +You can search and select JumpStart models from the interactive CLI (`Search SageMaker JumpStart Models` option) **or** provide the model ID in a YAML file with `source: sagemaker`. + ### Future Support We plan to add support for the following model sources: - - Models from AWS Marketplace and SageMaker built-in algorithms - - Models from Vertex AI Model Garden and Foundation Models @@ -43,51 +52,52 @@ We plan to add support for the following model sources: Models from Azure ML Model Catalog and Azure OpenAI + ## Model Requirements ### Instance Type Recommendations by Cloud Provider #### AWS SageMaker -1. **Small Models** (ml.m5.xlarge) +1. **Small Models** (`ml.m5.xlarge`) ```yaml instance_type: ml.m5.xlarge ``` -2. **Medium Models** (ml.g4dn.xlarge) +2. **Medium Models** (`ml.g4dn.xlarge`) ```yaml instance_type: ml.g4dn.xlarge ``` -3. **Large Models** (ml.g5.12xlarge) +3. **Large Models / GPT-class** (`ml.g5.12xlarge`) ```yaml instance_type: ml.g5.12xlarge num_gpus: 4 ``` #### GCP Vertex AI -1. **Small Models** (n1-standard-4) +1. **Small Models** (`n1-standard-4`) ```yaml machine_type: n1-standard-4 ``` -2. **Medium Models** (n1-standard-8 + GPU) +2. **Medium Models** (`n1-standard-8` + T4 GPU) ```yaml machine_type: n1-standard-8 accelerator_type: NVIDIA_TESLA_T4 accelerator_count: 1 ``` -3. **Large Models** (a2-highgpu-1g) +3. **Large Models** (`a2-highgpu-1g`) ```yaml machine_type: a2-highgpu-1g ``` #### Azure ML -1. **Small Models** (Standard_DS3_v2) +1. **Small Models** (`Standard_DS3_v2`) ```yaml instance_type: Standard_DS3_v2 ``` -2. **Medium Models** (Standard_NC6s_v3) +2. **Medium Models** (`Standard_NC6s_v3`) ```yaml instance_type: Standard_NC6s_v3 ``` -3. **Large Models** (Standard_ND40rs_v2) +3. **Large Models** (`Standard_ND40rs_v2`) ```yaml instance_type: Standard_ND40rs_v2 ``` @@ -129,23 +139,22 @@ deployment: !Deployment ``` - The model ids for Azure are different from AWS and GCP. Make sure to use the one provided by Azure in the Azure Model Catalog. 
- - To find the relevnt model id, follow the following steps - - - Find the workpsace in the Azure portal and click on the studio url provided. Click on the `Model Catalog` on the left side bar - ![Azure ML Creation](../Images/workspace-studio.png) - + The model IDs for Azure differ from AWS and GCP. Use the ID shown in the Azure ML Model Catalog. See [Quick Start → Azure](/quick-start#azure-ml) for detailed steps. + - - Select Hugging-Face from the collections list. The id of the model card is the id you need to use in the yaml file - ![Azure ML Creation](../Images/hugging-face.png) - +### Example SageMaker JumpStart Deployment - - +```yaml +models: +- !Model + id: huggingface-textgeneration-gpt2-xl + source: sagemaker # note the source is now sagemaker +deployment: !Deployment + destination: aws + instance_type: ml.g5.12xlarge + endpoint_name: gpt2-xl-jumpstart +``` ## Model Configuration @@ -155,8 +164,8 @@ deployment: !Deployment models: - !Model id: your-model-id - source: huggingface|sagemaker # we don't support vertex and azure specific models yet - revision: latest # Optional: specify model version + source: huggingface | sagemaker + revision: latest # Optional: specify model version or commit hash ``` ### Advanced Parameters @@ -180,10 +189,9 @@ models: - Compare pricing across cloud providers - Consider data residency requirements - Test latency from different regions - -3. **Cost Management** +2. **Cost Management** - Compare instance pricing - - Make sure you set up the relevant alerting + - Set up budget alerts and auto-shutdown policies ## Troubleshooting @@ -202,4 +210,4 @@ Common model-related issues: 3. **Authentication Issues** - Verify cloud credentials - Check model access permissions - - Validate API keys \ No newline at end of file + - Validate API keys diff --git a/concepts/openai-proxy.mdx b/concepts/openai-proxy.mdx new file mode 100644 index 0000000..4d23e9b --- /dev/null +++ b/concepts/openai-proxy.mdx @@ -0,0 +1,117 @@ +--- +title: OpenAI-Compatible Proxy +description: Use Magemaker endpoints with the OpenAI SDK via the built-in FastAPI server. +--- + +## Introduction +Magemaker ships with a lightweight FastAPI server (`server.py`) that allows you to interact with your **SageMaker** endpoints through an **OpenAI-compatible** REST interface. This means you can replace: + +```python +import openai +openai.api_key = "sk-…" +openai.ChatCompletion.create( + model="gpt-4o", + messages=[{"role": "user", "content": "Hello!"}] +) +``` + +with **zero code changes** other than pointing the SDK to your local Magemaker proxy. + +--- +## Starting the Server + +```bash +# Activate your virtual-env first +uvicorn server:app --host 0.0.0.0 --port 8000 +``` + +Alternatively, run the file directly: + +```bash +python server.py # binds to 0.0.0.0:8000 by default +``` + +> The server automatically infers `AWS_REGION_NAME` from your `.env` or the active boto3 session. 
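+
+If you need to handle concurrent requests during testing, the same app can be run with multiple workers. This is a sketch of the Gunicorn/Uvicorn setup mentioned in the production tips below and assumes `gunicorn` is installed in the same environment:
+
+```bash
+pip install gunicorn                       # if not already present
+gunicorn server:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000
+```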
+ +--- +## Available Routes + +| Method | Path | Description | +| ------ | ---- | ----------- | +| GET | `/endpoint/{name}` | Retrieve metadata for a SageMaker endpoint | +| POST | `/endpoint/{name}/query` | Invoke the endpoint with a Magemaker `Query` payload | +| POST | `/chat/completions` | OpenAI-style chat completion using the first matching endpoint | + +### `/chat/completions` Schema +Request: +```jsonc +{ + "model": "meta-llama/Meta-Llama-3-8B-Instruct", // HF model id + "messages": [ + {"role": "user", "content": "Tell me a joke."} + ] +} +``` + +Response (truncated): +```jsonc +{ + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": "Why did the…" + }, + "finish_reason": "stop" + } + ], + "created": 1717690370, + "model": "sagemaker/llama3-endpoint", + "object": "chat.completion" +} +``` + +--- +## SDK Configuration + +```python +import openai + +openai.api_key = "dummy" # Any non-empty string works +openai.base_url = "http://localhost:8000" # Point to the proxy + +response = openai.ChatCompletion.create( + model="meta-llama/Meta-Llama-3-8B-Instruct", + messages=[{"role": "user", "content": "Hello!"}], +) +print(response.choices[0].message.content) +``` + + +The proxy dynamically selects the **first** SageMaker endpoint that hosts the requested `model`. Deploying multiple versions of the same model will always use the earliest endpoint in the list. + + +--- +## Environment Variables +See the [Environment Variables](/configuration/Environment) reference for a complete list. The proxy specifically relies on: + +- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION_NAME` +- **Optional** `OPENAI_API_KEY` (dummy value) and `OPENAI_BASE_URL` if you want to avoid passing them programmatically. + +--- +## Production Tips + +1. **Reverse Proxy** – Place Nginx or an ALB in front of the FastAPI server for TLS termination. +2. **Auth** – The reference implementation has **no authentication**. Add JWT or an API-key middleware before running in production. +3. **Scaling** – Run multiple Uvicorn workers behind Gunicorn to handle concurrent requests. +4. **Timeouts** – Large language models can take >30s to respond; adjust client/server timeouts accordingly. + +--- +## Troubleshooting + +| Symptom | Possible Cause | +| ------- | -------------- | +| `NotDeployedException` | No SageMaker endpoint was found for the requested model id. Deploy the model first. | +| 404 on `/v1/chat/completions` | You are pointing the OpenAI SDK to `/v1/...`. Use `/chat/completions` or configure `api_path=""`. | +| CORS errors in browser | Add `CORSMiddleware` to `server.py` or terminate through a proxy that injects CORS headers. | diff --git a/configuration/AWS-IAM-Role.mdx b/configuration/AWS-IAM-Role.mdx new file mode 100644 index 0000000..967ca00 --- /dev/null +++ b/configuration/AWS-IAM-Role.mdx @@ -0,0 +1,49 @@ +--- +title: AWS IAM Role Details +--- + +# SageMaker Execution Role Explained + +This page dives deeper into the **SageMakerRole** that Magemaker automatically creates via `scripts/setup_role.sh`. + +## Why do we need a dedicated role? +SageMaker endpoints assume an IAM role at **runtime** to: +1. Pull model artifacts from S3. +2. Create/attach network interfaces. +3. Write logs to CloudWatch. 
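+
+The trust relationship that makes this possible is a standard SageMaker assume-role policy. The `assume-role.json` file referenced by the creation script below contains a document along these lines:
+
+```json
+{
+  "Version": "2012-10-17",
+  "Statement": [{
+    "Effect": "Allow",
+    "Principal": { "Service": "sagemaker.amazonaws.com" },
+    "Action": "sts:AssumeRole"
+  }]
+}
+```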
+ +## Role Creation Logic +```bash +# executed by magemaker --cloud aws +env ROLE_NAME=SageMakerRole +aws iam create-role \ + --role-name "$ROLE_NAME" \ + --assume-role-policy-document file://assume-role.json +aws iam attach-role-policy \ + --role-name "$ROLE_NAME" \ + --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess +``` + +The ARN is then exported to `.env` under `SAGEMAKER_ROLE` so that all subsequent CLI calls (deploy, query, delete) know which role to pass to the SageMaker SDK. + +## Customising the Role +If you already have a locked-down organisation you can: +1. **Rename** the role—set `ROLE_NAME=` before executing the script. +2. **Restrict** permissions—start from `AmazonSageMakerFullAccess`, then remove services you don’t use (e.g. Training, Pipelines). + +Update your `.env` accordingly. + +```bash +SAGEMAKER_ROLE="arn:aws:iam:::role/YourCustomRole" +``` + +## Verification +Run the quick check below after creating or modifying the role: + +```bash +aws iam simulate-principal-policy \ + --policy-source-arn "$SAGEMAKER_ROLE" \ + --action-names sagemaker:CreateEndpoint +``` + +`EvaluationDecision": "allowed"` means you’re good to go. diff --git a/configuration/AWS.mdx b/configuration/AWS.mdx index cdc4b9f..33e596d 100644 --- a/configuration/AWS.mdx +++ b/configuration/AWS.mdx @@ -2,81 +2,134 @@ title: AWS --- -### AWS CLI +## Overview +This guide walks you through the exact AWS prerequisites Magemaker needs **before** you can deploy a model to SageMaker. -To install Azure SDK on MacOS, you need to have the latest OS and you need to use Rosetta terminal. Also, make sure you have the latest version of Xcode tools installed. + +Running `magemaker --cloud aws` will set **everything** up for you automatically (create an IAM role, write a `.env` file, verify the AWS CLI, etc.). +You only need to perform the manual steps below if you want to understand or replicate what the CLI does under the hood. + -Follow this guide to install the latest AWS CLI +--- +### 1 – Install / Verify AWS CLI -https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html +• macOS & Apple Silicon: install via Homebrew and make sure you are **using a Rosetta terminal** (`arch` should print `i386`). +• Windows / Linux: follow the official docs. +```bash +# macOS +brew update && brew install awscli +# verify +aws --version +``` -Once you have the CLI installed and working, follow these steps +Full install instructions: +https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html +--- +### 2 – Create an AWS Account (if you don’t have one) -### AWS Account +1. Go to [aws.amazon.com](https://aws.amazon.com/) and sign up. +2. Log in to the [AWS Console](https://console.aws.amazon.com/). - - -Register for an [AWS account](https://aws.amazon.com/) and sign-in to the [console](https://console.aws.amazon.com/). - +--- +### 3 – Generate an IAM User & Access Keys + -From the console, use the Search bar to find and select IAM (***do not use IAM Identity Center***, which is confusingly similar but a totally different system). - -![Enter image alt description](../Images/muJ_Image_1.png) - -You should see the following screen after clicking IAM. - -![Enter image alt description](../Images/ldC_Image_2.png) +Search for **IAM** in the console (⚠️ *do **NOT** select “IAM Identity Center”*). -1. Select `Users` in the side panel - -![Enter image alt description](../Images/QX4_Image_3.png) +1. Click **Users** in the sidebar. +2. Create a new user (or pick an existing one). + -2. 
Create a user if you don't already have one + +Attach **all three** policies: +- `AmazonSageMakerFullAccess` +- `IAMFullAccess` +- `ServiceQuotasFullAccess` + -![Enter image alt description](../Images/ly3_Image_4.png) + +1. In the user details, open **Security credentials**. +2. Under **Access keys**, click **Create access key** → **Command Line Interface**. + - -1. Click on "Add permissions" - -![Enter image alt description](../Images/E7x_Image_5.png) +Store the *Access key ID* **and** *Secret access key* in a safe place—you will paste them into the Magemaker prompt shortly. -2. Select "Attach policies directly". Under permission policies, search for and tick the boxes for: - - `AmazonSagemakerFullAccess` - - `IAMFullAccess` - - `ServiceQuotasFullAccess` +--- +### 4 – (Optional) Understand the Automatic Role Setup + +When you run `magemaker --cloud aws` the CLI executes +`magemaker/scripts/setup_role.sh` which: + +1. Creates an IAM role called **SageMakerRole** with the following trust policy: + ```json + { + "Version": "2012-10-17", + "Statement": [{ + "Effect": "Allow", + "Principal": { + "Service": "sagemaker.amazonaws.com" + }, + "Action": "sts:AssumeRole" + }] + } + ``` +2. Attaches the managed policy `AmazonSageMakerFullAccess`. +3. Writes the resulting role ARN to `.env` as `SAGEMAKER_ROLE`. + +If you prefer to run the script yourself: + +```bash +bash magemaker/scripts/setup_role.sh +``` + + +The script requires a configured AWS CLI profile with **admin-level** permissions, otherwise role creation will fail. + -Then click Next. +--- +### 5 – Configure Magemaker -![Enter image alt description](../Images/01X_Image_6.png) +Run the interactive setup: -The final list should look like the following: +```bash +magemaker --cloud aws +``` -![Enter image alt description](../Images/Dfp_Image_7.png) +You will be prompted for: +1. **AWS Access key ID** +2. **AWS Secret access key** +3. **AWS Region** (default `us-east-1`) +4. (Optional) **Hugging Face token** for gated models (e.g. Llama 3) -Click "Create user" on the following screen. - +Magemaker will create/append to a `.env` file with the following variables: - -1. Click the name of the user you've just created (or one that already exists) -2. Go to "Security Credentials" tab -3. Scroll down to "Access Keys" section -4. Click "Create access key" -5. Select Command Line Interface then click next - -![Enter image alt description](../Images/BPP_Image_8.png) +```bash +AWS_ACCESS_KEY_ID="..." +AWS_SECRET_ACCESS_KEY="..." +SAGEMAKER_ROLE="arn:aws:iam:::role/SageMakerRole" +AWS_DEFAULT_REGION="us-east-1" +HUGGING_FACE_HUB_KEY="..." # optional +``` -Enter a description (this is optional, can leave blank). Then click next. +Never commit your `.env` file to version control! -![Enter image alt description](../Images/gMD_Image_9.png) +--- +### 6 – Verify Quotas +Even with the correct IAM role you might hit **instance-quota** limits when deploying large models. -**Store BOTH the Access Key and the Secret access key for the next step. Once you've saved both keys, click Done.** +1. Open the AWS Console → **Service Quotas**. +2. Filter for **Amazon SageMaker**. +3. Check the *Instance* rows (e.g. `ml.g5.12xlarge`). +4. Request an increase if necessary. -![Enter image alt description](../Images/Gjw_Image_10.png) - - \ No newline at end of file +--- +### 7 – Next Steps +• Proceed to the [Quick Start](/quick-start) guide to deploy your first model. +• Need fine-tuning? Read the [Fine-tuning](/concepts/fine-tuning) docs. +• Prefer IaC? 
Use YAML deployments as shown in the [Deployment](/concepts/deployment) section. diff --git a/configuration/Azure.mdx b/configuration/Azure.mdx index 3c1104a..19ed91e 100644 --- a/configuration/Azure.mdx +++ b/configuration/Azure.mdx @@ -5,80 +5,134 @@ description: Configure Magemaker for your cloud providers ### Azure CLI -To install Azure SDK on MacOS, you need to have the latest OS and you need to use Rosetta terminal. Also, make sure you have the latest version of Xcode tools installed. +To install Azure SDK on macOS you need the latest OS **and** you must use a Rosetta terminal on Apple-silicon devices. Verify you are in a Rosetta shell by running `arch` – it should output `i386`. - -To install the latest Azure CLI, run: +Install the Azure CLI with Homebrew: ```bash brew update && brew install azure-cli ``` -Alternatively, follow this official guide from Azure -- [https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-macos](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-macos) - -Once you have installed azure CLI, follow these steps - +Alternatively follow the official guide: +- https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-macos -### Azure Account -Step 1: Create azure cloud account +Once the CLI is installed, continue with the steps below. -- [https://azure.microsoft.com/en-ca](null) +--- +## Azure Account & Workspace Setup + Authenticate with your Azure account: ```bash az login ``` + + If you have multiple subscriptions, choose the one you want Magemaker to use: ```bash az account set --subscription ``` - - From the terminal + + From the terminal: ```bash az group create --name --location ``` - From the Azure Portal - ![Enter image alt description](../Images/XzN_Image_12.png) - + Or through the Azure Portal: + ![Create Resource Group](../Images/XzN_Image_12.png) - - From the terminal + + + From the terminal: ```bash az ml workspace create -n -g ``` -From the Azure portal -1. Search for `Azure Machine Learning` in the search bar. - ![Azure ML Creation](../Images/AzureML.png) - -2. Inside the `Azure Machine Learning` portal. Click on Create, and select `New Workspce` from the drop down - ![workspace creation](../Images/workspace_creation.png) - + From the Azure Portal: + 1. Search for **Azure Machine Learning** in the search bar. + ![Azure ML Creation](../Images/AzureML.png) + 2. Click **Create ➜ New Workspace**. + ![Workspace creation](../Images/workspace_creation.png) - - ```bash - # Register all required providers: THIS STEP IS IMPORTANT - az provider register --namespace Microsoft.MachineLearningServices - az provider register --namespace Microsoft.ContainerRegistry - az provider register --namespace Microsoft.KeyVault - az provider register --namespace Microsoft.Storage - az provider register --namespace Microsoft.Insights - az provider register --namespace Microsoft.ContainerService - az provider register --namespace Microsoft.PolicyInsights - az provider register --namespace Microsoft.Cdn - ``` - - - Registration can take up to 10 minutes. 
Check status with: ```bash az - provider show -n Microsoft.MachineLearningServices ``` - + + **This step is mandatory – deployments will fail if providers are not registered.** + ```bash + az provider register --namespace Microsoft.MachineLearningServices + az provider register --namespace Microsoft.ContainerRegistry + az provider register --namespace Microsoft.KeyVault + az provider register --namespace Microsoft.Storage + az provider register --namespace Microsoft.Insights + az provider register --namespace Microsoft.ContainerService + az provider register --namespace Microsoft.PolicyInsights + az provider register --namespace Microsoft.Cdn + ``` + + Registration may take up to 10 minutes. Check status with: + ```bash + az provider show -n Microsoft.MachineLearningServices + ``` + +--- + +## Configure Magemaker + +Run the following command once your workspace is ready: + +```bash +magemaker --cloud azure +``` + +Magemaker will: +1. Validate your Azure CLI authentication. +2. Prompt for – or auto-detect – Subscription ID, Resource Group, Workspace Name, and Region. +3. Create a `.env` file in your project root with the required variables. + +### Required Environment Variables + +Magemaker (and the underlying Azure SDK) reads the following from `.env`: + +```bash +AZURE_SUBSCRIPTION_ID="" +AZURE_RESOURCE_GROUP="" +AZURE_WORKSPACE_NAME="" +AZURE_REGION="" # e.g. eastus + +# Optional – needed for gated Hugging Face models like Llama-3 +HUGGING_FACE_HUB_KEY="" +``` + +Never commit your `.env` file to version control! + +--- + +## Quota & Instance Selection + +Before deploying large models (e.g. Llama 3) ensure you have quota for the desired GPU SKU in the chosen region. Request increases via **Subscriptions ➜ Quotas ➜ Provider: Machine Learning** in the Azure Portal. + +Typical instance types: +- **Standard_DS3_v2** – CPU-only, suitable for small text models. +- **Standard_NC6s_v3** – 1× V100 GPU. +- **Standard_NC24ads_A100_v4** – 4× A100 GPUs, recommended for >7 B parameter models. + +--- + +## Next Steps + +With configuration complete you can: + +- Deploy a model: + ```bash + magemaker --deploy .magemaker_config/your-model.yaml + ``` +- List or delete endpoints, or query them, via the interactive menu (`magemaker --cloud azure`). +- Review the [Quick Start](/quick-start) for YAML examples. + +The model IDs for Azure differ from Hugging Face. Use the ID shown in the Azure **Model Catalog** card. diff --git a/configuration/Environment.mdx b/configuration/Environment.mdx index 0781ec3..0e7f302 100644 --- a/configuration/Environment.mdx +++ b/configuration/Environment.mdx @@ -2,29 +2,91 @@ title: Environment Variables --- -### Required Config File -A `.env` file is automatically created when you run `magemaker --cloud `. This file contains the necessary environment variables for your cloud provider(s). +### Overview +Magemaker relies on a small set of *well-known* environment variables that are read at runtime by the CLI **and** by the optional FastAPI proxy (`server.py`). +These variables are usually written automatically to a `.env` file the first time you run -By default, Magemaker will look for a `.env` file in your project root with the following variables based on which cloud provider(s) you plan to use: +```bash +magemaker --cloud +``` + +You may edit the file manually if you need to rotate keys or change regions. +Make sure the file lives in the same working directory where you run Magemaker, or export the absolute path through the `ENV_PATH` variable (advanced use-case). 
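+
+For example, assuming the `ENV_PATH` override listed under *Advanced / Optional Variables* below, you could keep the file outside the working directory (the path shown is hypothetical):
+
+```bash
+export ENV_PATH=/srv/secrets/magemaker.env   # hypothetical location of your .env
+magemaker --cloud aws
+```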
+ +Never commit your `.env` file to version control! + +--- +## Required Variables by Cloud Provider ```bash -# AWS Configuration -AWS_ACCESS_KEY_ID="your-access-key" # Required for AWS -AWS_SECRET_ACCESS_KEY="your-secret-key" # Required for AWS -SAGEMAKER_ROLE="arn:aws:iam::..." # Required for AWS - -# GCP Configuration -PROJECT_ID="your-project-id" # Required for GCP -GCLOUD_REGION="us-central1" # Required for GCP - -# Azure Configuration -AZURE_SUBSCRIPTION_ID="your-sub-id" # Required for Azure -AZURE_RESOURCE_GROUP="ml-resources" # Required for Azure -AZURE_WORKSPACE_NAME="ml-workspace" # Required for Azure -AZURE_REGION="eastus" # Required for Azure - -# Optional configurations -HUGGING_FACE_HUB_KEY="your-hf-token" # Required for gated HF models like llama +# ────────────────── AWS ────────────────── +AWS_ACCESS_KEY_ID="your-access-key" # Required for AWS operations +AWS_SECRET_ACCESS_KEY="your-secret-key" # Required for AWS operations +SAGEMAKER_ROLE="arn:aws:iam::123456789012:role/MagemakerSageMakerRole" # IAM role ARN +AWS_REGION_NAME="us-east-1" # *Auto-populated* but you can override + +# ────────────────── GCP ────────────────── +PROJECT_ID="your-project-id" # Required for GCP Vertex AI +GCLOUD_REGION="us-central1" # GCP region hosting Vertex AI endpoints + +# ────────────────── Azure ────────────────── +AZURE_SUBSCRIPTION_ID="your-subscription" # Azure Subscription GUID +AZURE_RESOURCE_GROUP="ml-resources" # Resource group that hosts ML workspace +AZURE_WORKSPACE_NAME="ml-workspace" # Name of the Azure ML workspace +AZURE_REGION="eastus" # Region where resources are created + +# ────────────────── Hugging Face ────────────────── +HUGGING_FACE_HUB_KEY="your-hf-token" # Needed for gated models (e.g. Llama 3) ``` -Never commit your .env file to version control! +--- +## Variables Used by the FastAPI Proxy (`server.py`) +The proxy exposes OpenAI-compatible routes (`/chat/completions`, etc.) so you can drop-in replace the OpenAI SDK. + +```bash +# ────────────────── Proxy Settings ────────────────── +API_HOST="0.0.0.0" # Bind address for FastAPI (default: 0.0.0.0) +API_PORT="8000" # Port the server listens on (default: 8000) + +# ────────────────── OpenAI SDK Compatibility ────────────────── +OPENAI_API_KEY="dummy" # The OpenAI client mandates a key value; any string works. +OPENAI_BASE_URL="http://localhost:8000" # Point your OpenAI SDK to the proxy +``` + + +Only `OPENAI_API_KEY` and `OPENAI_BASE_URL` are needed if you interact with Magemaker programmatically through the OpenAI SDK. The key itself is **not** checked by Magemaker—it just satisfies the client-side validation performed by the SDK. + + +--- +## Advanced / Optional Variables + +```bash +# Directory that contains YAML configs; defaults to .magemaker_config +CONFIG_DIR="/absolute/path/to/configs" + +# Explicit path to the .env file if it is **not** in the working directory +ENV_PATH="/absolute/path/to/.env" +``` + +--- +## Loading Order +Magemaker uses the following precedence when resolving variables: + +1. Explicitly exported shell variables (`export AWS_ACCESS_KEY_ID=…`) +2. Values in `.env` (loaded via `python-dotenv`) +3. Hard-coded fallbacks inside the codebase + +--- +## Quick Checklist + + + + Verify new keys have been written to `.env`. + + + Ensure `.env` is listed in `.gitignore`. + + + Delete or rename the existing `.env` and rerun the configuration wizard. 
+ + diff --git a/configuration/GCP.mdx b/configuration/GCP.mdx index c9cd369..c64830b 100644 --- a/configuration/GCP.mdx +++ b/configuration/GCP.mdx @@ -1,38 +1,97 @@ --- title: GCP +description: Configure Magemaker for Google Cloud Platform --- + + + Magemaker supports deploying Hugging Face models to Google Cloud Vertex AI. Before proceeding make sure you are on macOS >= 13.6.6 (Rosetta terminal for Apple Silicon) or an equivalent Linux/Windows environment and have Python 3.11 installed. + + +## Overview +These steps will get your Google Cloud account ready, install the Cloud SDK, enable Vertex AI and configure Magemaker so it can deploy and manage endpoints on your behalf. + - -Visit [Google Cloud Console](https://cloud.google.com/?hl=en) to create your account. - - - Once you have created your account, create a new project. If this is your first time the default project is "My First Project". You can create a new project by clicking this button and then selecting "New Project". + +Visit the [Google Cloud Console](https://cloud.google.com/?hl=en) to create an account. + - ![Enter image alt description](../Images/google_new_project.png) + +Once signed-in, create a new project (or select an existing one). If this is your first time, the default project is **“My First Project”**. Click the project drop-down in the top-left corner then select **“New Project”**. +![Create Project](../Images/google_new_project.png) -1. Follow the installation guide at [Google Cloud SDK Installation Documentation](https://cloud.google.com/sdk/docs/install-sdk) -2. Initialize the SDK by running: - ```bash - gcloud init - ``` - +1. Follow the official installation guide: [Google Cloud SDK Installation Documentation](https://cloud.google.com/sdk/docs/install-sdk) +2. Initialise the SDK: + +```bash +gcloud init +``` -3. During initialization: - - Create login credentials when prompted - - Create a new project or select an existing one - To make sure the initialization worked, run: - ```bash - gcloud auth application-default login - ``` +During initialisation you will: +- Authenticate in a browser window +- Select the project you just created +- Choose a default region/zone (you can change this later) + +Finally, verify application-default credentials: + +```bash +gcloud auth application-default login +``` + -Navigate to the APIs & Services on the dashboard and enable the Vertex AI API for your project. +In the Google Cloud Console navigate to **APIs & Services → Library**, search for **Vertex AI API** and click **Enable**. + +![Enable Vertex AI](../Images/QrB_Image_11.png) + + + +Magemaker uses your Application-Default Credentials. If you prefer using a Service Account make sure it has (at minimum): +- `roles/aiplatform.admin` +- `roles/storage.admin` (for model artefacts) +- `roles/iam.serviceAccountUser` -![Enter image alt description](../Images/QrB_Image_11.png) +Attach the Service Account to your local ADC file or set `GOOGLE_APPLICATION_CREDENTIALS` to the JSON key path. - \ No newline at end of file + +Now that the SDK is configured, let Magemaker create the necessary `.env` file: + +```bash +magemaker --cloud gcp +``` + +You will be prompted for: +- **Project ID** (e.g. `my-vertex-project`) +- **Default Region** (e.g. `us-central1`) + +Magemaker writes these values to `.env` so the library modules (e.g. `magemaker/gcp/create_model.py`) can access them at runtime. 
+ +```bash +# .env (created/updated by Magemaker) +PROJECT_ID="my-vertex-project" +GCLOUD_REGION="us-central1" +``` + + + +Deploying larger models (e.g. Llama 3) requires GPUs. If you see quota errors when deploying, open **IAM & Admin → Quotas**, filter by **Vertex AI / Accelerators** and request an increase for the GPU type (e.g. NVIDIA _T4_, _L4_ or _A100_) in your chosen region. + + + + +## Verification +After completing the steps above you can validate the configuration by listing existing (or empty) Vertex AI endpoints: + +```bash +python -c "from magemaker.gcp.resources import list_vertex_ai_endpoints; print(list_vertex_ai_endpoints())" +``` + +If no error is raised, you are ready to deploy! + + +Remember to delete endpoints you’re no longer using to avoid unnecessary charges. You can do this from the Magemaker dropdown or via `magemaker.gcp.delete_model`. + diff --git a/getting_started.md b/getting_started.md index 0bc86fa..b7ab1d5 100644 --- a/getting_started.md +++ b/getting_started.md @@ -1,37 +1,29 @@ # Getting Started with Magemaker -Magemaker is a Python tool that simplifies the process of deploying an open source AI model to your own cloud. +Magemaker is a Python tool that simplifies the process of deploying an open-source AI model to your own cloud. Deploy from an interactive menu in the terminal or from a simple YAML file. -Instead of spending hours digging through documentation to figure out how to get AWS working, Magemaker lets you deploy Hugging Face models directly to AWS SageMaker, Google Cloud Vertex AI, or Azure Machine Learning, from the command line or a simple YAML file. +Instead of spending hours digging through documentation, Magemaker lets you deploy models directly to AWS SageMaker, Google Cloud Vertex AI, or Azure ML— all from the command line or a YAML file. -Choose a model from Hugging Face, and Magemaker will spin up an instance with a ready-to-query endpoint of the model in minutes. +Choose a model from Hugging Face, SageMaker JumpStart, or even an S3 path, and Magemaker will spin up an instance with a ready-to-query endpoint in minutes. ## Getting Started -Magemaker works with the three major cloud providers AWS, Azure and GCP! +Magemaker works with the three major cloud providers—AWS, Azure and GCP! To get a local copy up and running follow these simple steps. ### Prerequisites -* Python 3.11 (3.13 is not supported because of azure) +* Python 3.11+ (Python 3.12 is not supported; Python 3.13 is not supported for Azure) * Cloud Configuration - * An account to your preferred cloud provider, AWS, GCP and Azure. 
- * Each cloud requires slightly different accesses, Magemaker will guide you through getting the necessary credentials to the selected cloud provider - * Here's a guide on how to configure AWS and get the credentials [Google Doc](https://docs.google.com/document/d/1NvA6uZmppsYzaOdkcgNTRl7Nb4LbpP9Koc4H_t5xNSg/edit?tab=t.0#heading=h.farbxuv3zrzm) - * Quota approval for instances you require for the AI model - * By default, you get some free instances, example with AWS you are pre-approved for 2 ml.m5.xlarge instances with 16gb of RAM each + * An account on your preferred cloud provider (AWS, GCP, Azure) + * Appropriate instance quotas for the models you plan to run (e.g., AWS gives 2 × `ml.m5.xlarge` for free by default) + * Installation of the relevant cloud CLI tool(s) — Magemaker will prompt you to install or configure them if missing +* Certain Hugging Face models (e.g., Llama 2/3) require an access token ([HF docs](https://huggingface.co/docs/hub/en/models-gated#access-gated-models-as-a-user)) - * An installation and configuration of your selected cloud CLI tool(s) - * Magemaker will prompt you to install the CLI of the selected cloud provider, if not installed already. - * Magemaker will prompt you to add the necesssary credentials. - -* Certain Hugging Face models (e.g. Llama2) require an access token ([hf docs](https://huggingface.co/docs/hub/en/models-gated#access-gated-models-as-a-user)) - - -## Installation +### Installation 1. Install Magemaker using pip: @@ -39,49 +31,50 @@ To get a local copy up and running follow these simple steps. pip install magemaker ``` -2. Run Magemaker: +2. Run Magemaker for initial configuration: ```sh magemaker --cloud [aws|gcp|azure|all] ``` - If this is your first time running this command, It will configure the selected cloud so you’re ready to start deploying models. + If this is your first time running the command, Magemaker will configure the selected cloud(s) so you’re ready to start deploying models. - In the case of AWS, it’ll prompt you to enter your Access Key and Secret. You can also specify your AWS region. The default is us-east-1. You only need to change this if your SageMaker instance quota is in a different region. + *AWS example*: you’ll be prompted for your Access Key and Secret Key and, optionally, your region (default `us-east-1`). + Once complete, Magemaker creates a `.env` file with your credentials. You can also add your Hugging Face Hub Token here: - Once configured, it will create a `.env` file and save the credentials there. You can also add your Hugging Face Hub Token to this file if you have one. - - ```sh - HUGGING_FACE_HUB_KEY="KeyValueHere" + ```bash + HUGGING_FACE_HUB_KEY="your-hf-token" ``` -
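
For reference, the generated `.env` typically ends up with entries like the following (illustrative placeholder values; the exact keys depend on which cloud provider(s) you configured):

```bash
# Illustrative .env after running `magemaker --cloud aws`
AWS_ACCESS_KEY_ID="your-access-key"
AWS_SECRET_ACCESS_KEY="your-secret-key"
SAGEMAKER_ROLE="arn:aws:iam::123456789012:role/MagemakerSageMakerRole"
AWS_REGION_NAME="us-east-1"
HUGGING_FACE_HUB_KEY="your-hf-token"   # optional, needed for gated models
```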
## Using Magemaker -### Interactive deployment +### Interactive Deployment Run `magemaker --cloud [gcp|azure|aws|all]` to access an interactive menu where you can: * Choose your cloud provider * Select from available models -* Configure deployment settings -* Monitor deployment progress + * **Hugging Face** — paste the full model ID (e.g., `google-bert/bert-base-uncased`) + * **SageMaker JumpStart** — search and pick a JumpStart model + * **Custom** — provide a local path or S3 URI to a model +* Configure deployment settings (instance type, GPU count, etc.) +* Monitor deployment progress in real time -#### YAML-based Deployment -For reproducible deployments, use YAML configuration: +### YAML-based Deployment (recommended for CI/CD) -``` +Deploy reproducibly by passing a YAML file: + +```sh magemaker --deploy .magemaker_config/bert-base-uncased.yaml ``` -Following is a sample yaml file for deploying a model the same google bert model mentioned above to AWS: +Sample YAML for deploying a Hugging Face model to AWS: ```yaml deployment: !Deployment destination: aws - # Endpoint name matches model_id for querying atm. endpoint_name: test-bert-uncased instance_count: 1 instance_type: ml.m5.xlarge @@ -92,7 +85,23 @@ models: source: huggingface ``` -Following is a yaml file for deploying a facebook model to GCP Vertex AI: +Sample YAML for deploying a SageMaker JumpStart model: + +```yaml +deployment: !Deployment + destination: aws + endpoint_name: jumpstart-llm + instance_count: 1 + instance_type: ml.g5.2xlarge + +models: +- !Model + id: huggingface-llm/amazon-llama2-7b + source: sagemaker # JumpStart models use the "sagemaker" source +``` + +Sample YAML for GCP Vertex AI: + ```yaml deployment: !Deployment destination: gcp @@ -100,20 +109,15 @@ deployment: !Deployment accelerator_count: 1 instance_type: g2-standard-12 accelerator_type: NVIDIA_L4 - num_gpus: null - quantization: null models: - !Model id: facebook/opt-125m - location: null - predict: null source: huggingface - task: null - version: null - ``` -For Azure ML: + +Sample YAML for Azure ML: + ```yaml deployment: !Deployment destination: azure @@ -123,27 +127,22 @@ deployment: !Deployment models: - !Model id: facebook-opt-125m - location: null - predict: null source: huggingface task: text-generation - version: null ``` -#### Fine-tuning a model using a yaml file +### Fine-tuning a Model -You can also fine-tune a model using a yaml file, by using the `train` option in the command and passing path to the yaml file +Fine-tune via YAML with the `--train` flag: -` +```sh magemaker --train .magemaker_config/train-bert.yaml -` - -Here is an example yaml file for fine-tuning a hugging-face model: +``` ```yaml training: !Training destination: aws # or gcp, azure - instance_type: ml.p3.2xlarge # varies by cloud provider + instance_type: ml.p3.2xlarge instance_count: 1 training_input_path: s3://your-bucket/data.csv hyperparameters: !Hyperparameters @@ -151,34 +150,28 @@ training: !Training per_device_train_batch_size: 32 learning_rate: 2e-5 +models: +- !Model + id: google-bert/bert-base-uncased + source: huggingface ``` - -

If you’re using the `ml.m5.xlarge` instance type, here are some small Hugging Face models that work great: -
-
-**Model: [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)** +**Model:** [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) -- **Type:** Fill Mask: tries to complete your sentence like Madlibs -- **Query format:** text string with `[MASK]` somewhere in it that you wish for the transformer to fill -- -
-
+- **Type:** Fill Mask +- **Query format:** sentence containing `[MASK]` -**Model: [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)** +**Model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) -- **Type:** Feature extraction: turns text into a 384d vector embedding for semantic search / clustering -- **Query format:** "*type out a sentence like this one.*" +- **Type:** Feature extraction +- **Query format:** plain sentence
-
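
To see what the query formats above look like in practice, here is a hedged sketch of querying the BERT fill-mask model once it is deployed to SageMaker. It reuses the `HuggingFacePredictor` pattern from the tutorials; the endpoint name is the hypothetical `test-bert-uncased` from the sample YAML above.

```python
from sagemaker.huggingface.model import HuggingFacePredictor
import sagemaker

# Attach to an existing endpoint (name must match your deployment YAML).
predictor = HuggingFacePredictor(
    endpoint_name="test-bert-uncased",
    sagemaker_session=sagemaker.Session(),
)

# Fill-mask models complete the [MASK] token in the input sentence.
result = predictor.predict({"inputs": "The capital of France is [MASK]."})
print(result)
```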
- - ## Deactivating Models -Any model endpoints you spin up will run continuously unless you deactivate them! Make sure to delete endpoints you’re no longer using so you don’t keep getting charged for your SageMaker instance. +Any endpoints you spin up will continue running until deleted. Be sure to remove unused endpoints to avoid unexpected charges. diff --git a/installation.mdx b/installation.mdx index 1d843eb..3481bb8 100644 --- a/installation.mdx +++ b/installation.mdx @@ -3,19 +3,16 @@ title: Installation description: Configure Magemaker for your cloud provider --- - - For Macs, maxOS >= 13.6.6 is required. Apply Silicon devices (M1) must use Rosetta terminal. You can verify, your terminals architecture by running `arch`. It should print `i386` for Rosetta terminal. + For Macs, maxOS ≥ 13.6.6 is required. Apple Silicon devices (M-series) must use a Rosetta terminal. You can verify your terminal architecture by running arch; it should print i386 for Rosetta. - Install via pip: ```sh pip install magemaker ``` - ## Cloud Account Setup ### AWS Configuration @@ -23,22 +20,20 @@ pip install magemaker - Follow this detailed guide for setting up AWS credentials: [AWS Setup Guide](/configuration/AWS) -Once you have your AWS credentials, you can configure Magemaker by running: +Once you have your AWS credentials, configure Magemaker by running: ```bash magemaker --cloud aws ``` -It will prompt you for aws credentials and set up the necessary configurations. - +This generates a `.env` file (or updates an existing one) with the required keys and verifies your SageMaker-execution role. ### GCP (Vertex AI) Configuration - Follow this detailed guide for setting up GCP credentials: [GCP Setup Guide](/configuration/GCP) - -once you have your GCP credentials, you can configure Magemaker by running: +After obtaining your GCP credentials, run: ```bash magemaker --cloud gcp @@ -47,112 +42,99 @@ magemaker --cloud gcp ### Azure Configuration - Follow this detailed guide for setting up Azure credentials: - [GCP Setup Guide](/configuration/Azure) + [Azure Setup Guide](/configuration/Azure) - -Once you have your Azure credentials, you can configure Magemaker by running: +Then run: ```bash magemaker --cloud azure ``` +### All Three Cloud Providers -### All three cloud providers - -If you have configured all three cloud providers, you can verify your configuration by running: +If you plan to use every provider, you can initialise them all in one go: ```bash magemaker --cloud all ``` +--- + +## Environment Variables -### Required Config File -By default, Magemaker will look for a `.env` file in your project root with the following variables based on which cloud provider(s) you plan to use: +Magemaker stores cloud-specific credentials, regions, and optional settings in a `.env` file at your project root. +For a complete, always-up-to-date list, see the dedicated +[Environment Variables](/configuration/Environment) page. +Below is a condensed cheat-sheet of the most common keys: ```bash -# AWS Configuration -AWS_ACCESS_KEY_ID="your-access-key" # Required for AWS -AWS_SECRET_ACCESS_KEY="your-secret-key" # Required for AWS -SAGEMAKER_ROLE="arn:aws:iam::..." 
# Required for AWS - -# GCP Configuration -PROJECT_ID="your-project-id" # Required for GCP -GCLOUD_REGION="us-central1" # Required for GCP - -# Azure Configuration -AZURE_SUBSCRIPTION_ID="your-sub-id" # Required for Azure -AZURE_RESOURCE_GROUP="ml-resources" # Required for Azure -AZURE_WORKSPACE_NAME="ml-workspace" # Required for Azure -AZURE_REGION="eastus" # Required for Azure - -# Optional configurations -HUGGING_FACE_HUB_KEY="your-hf-token" # Required for gated HF models like llama +# AWS +AWS_ACCESS_KEY_ID="..." +AWS_SECRET_ACCESS_KEY="..." +SAGEMAKER_ROLE="arn:aws:iam::..." +AWS_REGION_NAME="us-east-1" # Optional, overrides default region + +# GCP +PROJECT_ID="your-project-id" +GCLOUD_REGION="us-central1" + +# Azure +AZURE_SUBSCRIPTION_ID="..." +AZURE_RESOURCE_GROUP="ml-resources" +AZURE_WORKSPACE_NAME="ml-workspace" +AZURE_REGION="eastus" + +# Hugging Face (required for gated models like Llama-3) +HUGGING_FACE_HUB_KEY="hf_..." + +# Optional — OpenAI-compatible proxy (server.py) +API_HOST="0.0.0.0" +API_PORT="8000" +OPENAI_API_KEY="test-key" # Any non-empty string for local calls +OPENAI_BASE_URL="http://localhost:8000/v1" ``` -Never commit your .env file to version control! +Never commit your .env file to version control! - For gated models like llama-3.1 from Meta, you might have to accept terms of use for model on hugging face and adding Hugging face token to the environment are necessary for deployment to go through. + For gated models like Meta Llama-3 you must (1) accept the model’s terms on Hugging Face and (2) set HUGGING_FACE_HUB_KEY in your environment. -{/* ## Verification - -To verify your configuration: - -```bash -magemaker verify -``` */} +--- ## Best Practices 1. **Resource Management** - Monitor quota limits - - Clean up unused resources + - Clean up unused endpoints - Set up cost alerts 2. **Environment Management** - - - Use separate configurations for dev/prod - - Regularly rotate access keys - - Use environment-specific roles + - Separate dev / prod credentials + - Rotate access keys regularly + - Use cloud-specific service accounts where possible 3. **Security** + - Follow the principle of least privilege + - Restrict IAM roles to required services only + - Enable audit logging on your cloud accounts - - Follow principle of least privilege - - Use service accounts where possible - - Enable audit logging +--- + +## Troubleshooting +Common configuration issues and how to resolve them. +1. **AWS** + - Verify IAM role permissions + - Check SageMaker quota limits and region settings + - Ensure your execution role has the correct trust relationship -## Troubleshooting +2. **GCP** + - Confirm Vertex AI API is enabled + - Validate service-account permissions and default application credentials -Common configuration issues: - -1. **AWS Issues** - - - Check IAM role permissions - - Verify SageMaker quota - - Confirm region settings - -2. **GCP Issues** - - - Verify service account permissions - - Check Vertex AI API enablement - - Confirm project ID - -3. 
**Azure Issues** - - Check resource provider registration status: - ```bash - az provider show -n Microsoft.MachineLearningServices - az provider show -n Microsoft.ContainerRegistry - az provider show -n Microsoft.KeyVault - az provider show -n Microsoft.Storage - az provider show -n Microsoft.Insights - az provider show -n Microsoft.ContainerService - az provider show -n Microsoft.PolicyInsights - az provider show -n Microsoft.Cdn - ``` - - Verify workspace access - - Confirm subscription status - - Ensure all required providers are registered +3. **Azure** + - Verify required resource providers are registered (see Azure guide) + - Check workspace access and quota availability diff --git a/mint.json b/mint.json index ccb1843..6d9089e 100644 --- a/mint.json +++ b/mint.json @@ -38,9 +38,13 @@ "mode": "auto" }, "navigation": [ - { + { "group": "Getting Started", - "pages": ["about", "installation", "quick-start"] + "pages": [ + "about", + "installation", + "quick-start" + ] }, { "group": "Tutorials", @@ -54,6 +58,7 @@ "group": "Configurations", "pages": [ "configuration/AWS", + "configuration/AWS-IAM-Role", "configuration/Azure", "configuration/GCP", "configuration/Environment" @@ -66,6 +71,13 @@ "concepts/models", "concepts/contributing" ] + }, + { + "group": "API Reference", + "pages": [ + "concepts/api-reference", + "concepts/openai-proxy" + ] } ], "footerSocials": { @@ -77,17 +89,33 @@ { "title": "Documentation", "links": [ - { "label": "Getting Started", "url": "/" }, - { "label": "Contributing", "url": "/contributing" } + { + "label": "Getting Started", + "url": "/" + }, + { + "label": "Contributing", + "url": "/contributing" + }, + { + "label": "API Reference", + "url": "/concepts/api-reference" + } ] }, { "title": "Resources", "links": [ - { "label": "GitHub", "url": "https://github.com/slashml/magemaker" }, - { "label": "Support", "url": "mailto:support@slashml.com" } + { + "label": "GitHub", + "url": "https://github.com/slashml/magemaker" + }, + { + "label": "Support", + "url": "mailto:support@slashml.com" + } ] } ] } -} \ No newline at end of file +} diff --git a/tutorials/deploying-llama-3-to-aws.mdx b/tutorials/deploying-llama-3-to-aws.mdx index 46f0659..8b227e4 100644 --- a/tutorials/deploying-llama-3-to-aws.mdx +++ b/tutorials/deploying-llama-3-to-aws.mdx @@ -3,108 +3,117 @@ title: Deploying Llama 3 to SageMaker --- ## Introduction -This tutorial guides you through deploying Llama 3 to AWS SageMaker using Magemaker and querying it using the interactive dropdown menu. Ensure you have followed the [installation](installation) steps before proceeding. +This tutorial guides you through deploying **Llama 3 (8B Instruct)** to AWS SageMaker using Magemaker and querying it via the interactive dropdown menu. Make sure you have completed the [installation](installation) steps and have accepted the model’s terms of use on Hugging Face. -## Step 1: Setting Up Magemaker for AWS +## Step 1: Configure Magemaker for AWS -Run the following command to configure Magemaker for AWS SageMaker deployment: +Run the following command to initialise Magemaker for AWS SageMaker deployment: ```sh magemaker --cloud aws ``` -This initializes Magemaker with the necessary configurations for deploying models to SageMaker. +This will: +1. Prompt you for your AWS access key, secret and preferred region (defaults to `us-east-1`). +2. Generate a `.env` file with the required variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION_NAME`, `SAGEMAKER_ROLE`, etc.). +3. 
Optionally let you add your `HUGGING_FACE_HUB_KEY` which is **required** for gated models like Llama 3. -## Step 2: YAML-based Deployment + +You must ensure your AWS account has sufficient SageMaker quota for the instance type and GPU count you intend to use. See the AWS configuration guide for details on requesting quota increases. + + +## Step 2: YAML-based Deployment (Recommended) -For reproducible deployments, use YAML configuration: +For reproducible deployments and CI/CD pipelines, use a YAML configuration file: ```sh -magemaker --deploy .magemaker_config/your-model.yaml +magemaker --deploy .magemaker_config/llama3-8b.yaml ``` -Example YAML for AWS deployment: +A minimal example for Llama 3 8B on SageMaker looks like this: ```yaml +# .magemaker_config/llama3-8b.yaml + deployment: !Deployment destination: aws - endpoint_name: llama3-endpoint + endpoint_name: llama3-8b-endpoint # any unique name instance_count: 1 - instance_type: ml.g5.2xlarge - num_gpus: 1 - quantization: null + instance_type: ml.g5.12xlarge # required for 8B model + num_gpus: 4 # g5.12xlarge has 4× A10G GPUs + quantization: null # optional, e.g. "bitsandbytes" to save memory models: - !Model id: meta-llama/Meta-Llama-3-8B-Instruct - location: null - predict: null source: huggingface task: text-generation - version: null + # You can optionally add a predict block to set default generation params + # predict: + # temperature: 0.7 + # max_new_tokens: 200 ``` - For gated models like llama from Meta, you have to accept terms of use for model on hugging face and adding Hugging face token to the environment are necessary for deployment to go through. +If you supply `quantization: bitsandbytes`, Magemaker will deploy the model with 8-bit quantisation, reducing GPU memory requirements and cost at the expense of some accuracy. +## Step 3: Deploy the Model - -You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your AWS quotas before proceeding. - +Execute the deployment command: + +```sh +magemaker --deploy .magemaker_config/llama3-8b.yaml +``` -## Step 3: Querying the Deployed Model +Magemaker will: +1. Package the model and environment. +2. Create an ECR image (if necessary). +3. Provision the SageMaker endpoint. -Once the deployment is complete, note down the endpoint id. +Deployment can take **10–20 minutes** for large models—watch the terminal progress bar or check the SageMaker console. -You can use the interactive dropdown menu to quickly query the model. +## Step 4: Query the Deployed Model -### Querying Models +After the endpoint status changes to **InService**, you can query it via the interactive CLI or programmatically. -From the dropdown, select `Query a Model Endpoint` to see the list of model endpoints. Press space to select the endpoint you want to query. Enter your query in the text box and press enter to get the response. +### Interactive Query (CLI) -![Query Endpoints](../Images/query-1.png) +1. Run `magemaker --cloud aws` again. +2. Choose **“Query a Model Endpoint”**. +3. Select your `llama3-8b-endpoint`. +4. Enter your prompt and press Enter. 
-Or you can use the following code: -```python +### Programmatic Query (Python) + +```python from sagemaker.huggingface.model import HuggingFacePredictor import sagemaker -def query_huggingface_model(endpoint_name: str, query: str): - # Initialize a SageMaker session + +def query_llama3(endpoint_name: str, prompt: str): sagemaker_session = sagemaker.Session() - - # Create a HuggingFace predictor predictor = HuggingFacePredictor( endpoint_name=endpoint_name, sagemaker_session=sagemaker_session ) - - # Prepare the input - input_data = { - "inputs": query - } - - try: - # Make prediction - result = predictor.predict(input_data) - print(result) - return result - except Exception as e: - print(f"Error making prediction: {str(e)}") - raise e - -# Example usage + + response = predictor.predict({ + "inputs": prompt, + "parameters": { + "max_new_tokens": 200, + "temperature": 0.7, + "top_p": 0.9 + } + }) + print(response) + return response + + if __name__ == "__main__": - # Replace with your actual endpoint name - ENDPOINT_NAME = "your-deployed-endpoint" - - # Your test question - question = "what are you?" - - # Make prediction - response = query_huggingface_model(ENDPOINT_NAME, question) + ENDPOINT = "llama3-8b-endpoint" # change to your endpoint + query_llama3(ENDPOINT, "Explain the theory of relativity in simple terms.") ``` -## Conclusion -You have successfully deployed and queried Llama 3 on AWS SageMaker using Magemaker's interactive dropdown menu. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com). +## Conclusion +You have successfully deployed and queried Llama 3 8B on AWS SageMaker using Magemaker. For issues or feedback, reach out at [support@slashml.com](mailto:support@slashml.com). diff --git a/tutorials/deploying-llama-3-to-azure.mdx b/tutorials/deploying-llama-3-to-azure.mdx index 679ba23..5a21c45 100644 --- a/tutorials/deploying-llama-3-to-azure.mdx +++ b/tutorials/deploying-llama-3-to-azure.mdx @@ -3,141 +3,143 @@ title: Deploying Llama 3 to Azure --- ## Introduction -This tutorial guides you through deploying Llama 3 to Azure ML platform using Magemaker and querying it using the interactive dropdown menu. Ensure you have followed the [installation](installation) steps before proceeding. +This tutorial walks you through deploying **Llama 3 – 8 B Instruct** to Azure Machine Learning with Magemaker and then querying the endpoint. - -You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your Azure quotas before proceeding. - +Before you begin make sure you have: + +1. Followed the [installation](installation) guide and successfully run `magemaker --cloud azure` (this writes the required `.env` file with `AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, and `AZURE_WORKSPACE_NAME`). +2. Requested any necessary **GPU quota** in the region where you will deploy. +3. Accepted the model terms on Hugging Face *and* added your `HUGGING_FACE_HUB_KEY` to the same `.env` file if the model is gated. + + +Azure requires **globally-unique endpoint names** within a region. Magemaker automatically creates a unique name for you (for example `hf-ep-1714062272`). Any `endpoint_name` you provide in YAML is therefore ignored. + - The model ids for Azure are different from AWS and GCP. Make sure to use the one provided by Azure in the Azure Model Catalog. +Azure model IDs are **not** the same as their Hugging Face IDs. Azure flattens the path by replacing "/" with "-". 
For example: - To find the relevnt model id, follow the steps in the [quick start](For Azure ML) +``` +Hugging Face : meta-llama/Meta-Llama-3-8B-Instruct +Azure Catalog : meta-llama-meta-llama-3-8b-instruct +``` + +Open the **Model Catalog** in Azure ML Studio, switch the collection filter to *Hugging Face*, and copy the **Model ID** shown on the card. -## Step 1: Setting Up Magemaker for Azure +--- -Run the following command to configure Magemaker for Azure deployment: +## 1 – Configure Magemaker for Azure -```sh +```bash magemaker --cloud azure ``` -This initializes Magemaker with the necessary configurations for deploying models to Azure ML Studio. +The first time you run this command Magemaker will: -## Step 2: YAML-based Deployment +1. Verify the Azure CLI is installed. +2. Prompt you to sign in (`az login`) if necessary. +3. Ask for – or attempt to detect – your **subscription**, **resource group**, and **workspace**. +4. Write everything to `.env`. -For reproducible deployments, use YAML configuration: +--- -```sh -magemaker --deploy .magemaker_config/your-model.yaml -``` +## 2 – Create a YAML deployment file -Example YAML for Azure deployment: +Save the following as `.magemaker_config/llama3-8b-azure.yaml` (or any filename you like): ```yaml +# .magemaker_config/llama3-8b-azure.yaml + deployment: !Deployment - destination: azure - endpoint_name: llama3-endpoint + destination: azure # Target cloud + instance_type: Standard_NC24ads_A100_v4 # 1× A100 80 GB GPU instance_count: 1 - instance_type: Standard_NC24ads_A100_v4 + # endpoint_name is ignored for Azure – Magemaker generates a unique name models: - !Model - id: meta-llama-meta-llama-3-8b-instruct - location: null - predict: null + id: meta-llama-meta-llama-3-8b-instruct # Azure-specific model ID! source: huggingface task: text-generation - version: null ``` - - For gated models like llama from Meta, you have to accept terms of use for model on hugging face and adding Hugging face token to the environment are necessary for deployment to go through. - - -### Selecting an Appropriate Instance -For 8B parameter models, recommended instance types include: - -- Standard_NC24ads_A100_v4 (optimal performance) -- Standard_NC24s_v3 (cost-effective option with V100) - -If you encounter quota issues, submit a quota increase request in the Azure console. In the search bar search for `Quotas` and select the subscription you are using. In the `provider` select `Machine Learning` and then select the relevant region for the quota increase - -![Azure Quota](../Images/quotas.png) +If your subscription does not yet have the required quota for `Standard_NC24ads_A100_v4`, submit a **Quota Increase** request. In the Azure portal, search for **“Quotas” → Provider: Machine Learning** and increase the SKU in your deployment region. -## Step 3: Querying the Deployed Model +--- + +## 3 – Deploy the model -Once the deployment is complete, note down the endpoint id. +```bash +magemaker --deploy .magemaker_config/llama3-8b-azure.yaml +``` -You can use the interactive dropdown menu to quickly query the model. +Magemaker will stream logs as it: -### Querying Models +1. Creates a **Managed Online Endpoint** (unique name like `hf-ep-1714062272`). +2. Builds an **Environment** with the required transformers and PyTorch versions. +3. Creates a **Deployment** and routes 100 % traffic to it. +4. Prints the **scoring URI** when everything is live. -From the dropdown, select `Query a Model Endpoint` to see the list of model endpoints. 
Press space to select the endpoint you want to query. Enter your query in the text box and press enter to get the response. +Deployment time for 8 B models is typically **8‒12 minutes**. -![Query Endpoints](../Images/query-1.png) +--- -Or you can use the following code +## 4 – Query the endpoint -```python +You can use the interactive dropdown (`magemaker --cloud azure`) → **“Query a Model Endpoint”**, or call the endpoint programmatically as shown below. +```python from azure.identity import DefaultAzureCredential from azure.ai.ml import MLClient -from azure.mgmt.resource import ResourceManagementClient - from dotenv import dotenv_values -import os +import json, os, time -def query_azure_endpoint(endpoint_name, query): - # Initialize the ML client - subscription_id = dotenv_values(".env").get("AZURE_SUBSCRIPTION_ID") - resource_group = dotenv_values(".env").get("AZURE_RESOURCE_GROUP") - workspace_name = dotenv_values(".env").get("AZURE_WORKSPACE_NAME") +def query_azure_endpoint(endpoint_name: str, prompt: str): + """Send a prompt to an Azure ML managed online endpoint created by Magemaker.""" - credential = DefaultAzureCredential() + env = dotenv_values(".env") ml_client = MLClient( - credential=credential, - subscription_id=subscription_id, - resource_group_name=resource_group, - workspace_name=workspace_name + credential=DefaultAzureCredential(), + subscription_id=env["AZURE_SUBSCRIPTION_ID"], + resource_group_name=env["AZURE_RESOURCE_GROUP"], + workspace_name=env["AZURE_WORKSPACE_NAME"], ) - import json - - # Test data - test_data = { - "inputs": query - } - - # Save the test data to a temporary file - with open("test_request.json", "w") as f: - json.dump(test_data, f) + payload = {"inputs": prompt} + tmp_file = f"request-{int(time.time())}.json" + with open(tmp_file, "w") as f: + json.dump(payload, f) - # Get prediction response = ml_client.online_endpoints.invoke( endpoint_name=endpoint_name, - request_file = 'test_request.json' + request_file=tmp_file, ) - print('Raw Response Content:', response) - # delete a file - os.remove("test_request.json") + os.remove(tmp_file) return response - -endpoint_id = 'your-endpoint-id-here' - -input_text = 'What are you?' -resp = query_azure_endpoint(endpoint_id=endpoint_id, input_text=input_text) -print(resp) +if __name__ == "__main__": + endpoint_name = "hf-ep-1714062272" # Replace with the name Magemaker printed + reply = query_azure_endpoint(endpoint_name, "What are you?") + print(reply) +``` + +--- + +## 5 – Clean-up +Endpoints incur charges as long as they are running. Delete them when you are done: + +```bash +magemaker --cloud azure # choose “Delete a Model Endpoint” from the menu ``` -## Conclusion -You have successfully deployed and queried Llama 3 on Azure using Magemaker's interactive dropdown menu. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com). +--- +## Conclusion +You have successfully deployed and queried **Llama 3 – 8 B** on Azure ML with Magemaker. 🎉 +For feedback or questions reach us at [support@slashml.com](mailto:support@slashml.com). 
diff --git a/tutorials/deploying-llama-3-to-gcp.mdx b/tutorials/deploying-llama-3-to-gcp.mdx index a94d616..7248908 100644 --- a/tutorials/deploying-llama-3-to-gcp.mdx +++ b/tutorials/deploying-llama-3-to-gcp.mdx @@ -3,28 +3,36 @@ title: Deploying Llama 3 to GCP --- ## Introduction -This tutorial guides you through deploying Llama 3 to Google Cloud Platform (GCP) Vertex AI using Magemaker and querying it using the interactive dropdown menu. Ensure you have followed the [installation](installation) steps before proceeding. +This tutorial guides you through deploying Llama 3 to Google Cloud Platform (GCP) Vertex AI using Magemaker and then querying the model. Make sure you have completed the [installation](installation) steps and configured GCP credentials with: - -You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your GCP quotas before proceeding. +```bash +magemaker --cloud gcp +``` + + +You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your GCP quotas before proceeding. -## Step 1: Setting Up Magemaker for GCP +## Step 1: Setting Up Magemaker for GCP -Run the following command to configure Magemaker for GCP Vertex AI deployment: +Run the following command to configure Magemaker for Vertex AI deployment and create the required `.env` file (it will include `PROJECT_ID` and `GCLOUD_REGION`): -```sh +```bash magemaker --cloud gcp ``` -This initializes Magemaker with the necessary configurations for deploying models to Vertex AI. +If you plan to deploy gated Hugging Face models such as Llama 3, add your HF token to the same `.env` file: -## Step 2: YAML-based Deployment +```bash +HUGGING_FACE_HUB_KEY="" +``` + +## Step 2: YAML-based Deployment -For reproducible deployments, use YAML configuration: +For reproducible deployments, use a YAML configuration file: -```sh -magemaker --deploy .magemaker_config/your-model.yaml +```bash +magemaker --deploy .magemaker_config/llama3-gcp.yaml ``` Example YAML for GCP deployment: @@ -33,114 +41,102 @@ Example YAML for GCP deployment: deployment: !Deployment destination: gcp endpoint_name: llama3-endpoint - accelerator_count: 1 - instance_type: n1-standard-8 - accelerator_type: NVIDIA_T4 - num_gpus: 1 - quantization: null + instance_type: g2-standard-12 # GPU-enabled machine type + accelerator_type: NVIDIA_L4 # GPU type (L4) + accelerator_count: 1 # Number of GPUs + num_gpus: null # Optional override for multi-GPU setups + quantization: null # e.g. bitsandbytes | awq | null models: - !Model id: meta-llama/Meta-Llama-3-8B-Instruct - location: null - predict: null source: huggingface - task: text-generation - version: null + task: text-generation # Optional but recommended ``` + - For gated models like llama from Meta, you have to accept terms of use for model on hugging face and adding Hugging face token to the environment are necessary for deployment to go through. +For gated models like Llama 3, you must accept the model licence on Hugging Face and provide `HUGGING_FACE_HUB_KEY` in your `.env` file. - ### Selecting an Appropriate Instance -For Llama 3, a machine type such as `n1-standard-8` with an attached NVIDIA T4 GPU (`NVIDIA_T4`) is a suitable configuration for most use cases. Adjust the instance type and GPU based on your workload requirements. +For Llama 3 (8 B), a `g2-standard-12` machine with an L4 GPU works well for most use-cases. 
Adjust the machine type, GPU, and `accelerator_count` based on your latency and cost requirements. -If you encounter quota issues, submit a quota increase request in the GCP console under "IAM & Admin > Quotas" for the specific GPU type in your deployment region. +If you encounter quota issues, navigate to **IAM & Admin → Quotas** in the Google Cloud console and request an increase for the relevant GPU type in your chosen region. -## Step 3: Querying the Deployed Model +## Step 3: Querying the Deployed Model -Once the deployment is complete, note down the endpoint id. +After the deployment finishes, note the **endpoint ID** that Magemaker prints. You can query the model either via the interactive dropdown menu or programmatically. -You can use the interactive dropdown menu to quickly query the model. +### 3.a – Interactive Query +From the dropdown, select **Query a Model Endpoint**, choose your endpoint, type a prompt, and press **Enter** to receive the response. -### Querying Models +![Query Endpoints](../Images/query-1.png) -From the dropdown, select `Query a Model Endpoint` to see the list of model endpoints. Press space to select the endpoint you want to query. Enter your query in the text box and press enter to get the response. +### 3.b – Programmatic Query (REST) -![Query Endpoints](../Images/query-1.png) +Below is a minimal Python example that calls the Vertex AI endpoint using REST. It automatically retrieves credentials from your local gcloud setup or a service-account JSON file. -Or you can use the following code: -```python -from google.cloud import aiplatform -from google.protobuf import json_format -from google.protobuf.struct_pb2 import Value -import json +```python from dotenv import dotenv_values +import google.auth +import google.auth.transport.requests +import requests -def query_vertexai_endpoint_rest( - endpoint_id: str, - input_text: str, - token_path: str = None -): - import google.auth - import google.auth.transport.requests - import requests +def query_vertexai_endpoint_rest(endpoint_id: str, prompt: str, token_path: str | None = None): + """Query a Vertex AI endpoint created by Magemaker.""" - # TODO: this will have to come from config files - project_id = dotenv_values('.env').get('PROJECT_ID') - location = dotenv_values('.env').get('GCLOUD_REGION') + # Environment variables populated by `magemaker --cloud gcp` + env = dotenv_values(".env") + project_id = env.get("PROJECT_ID") + location = env.get("GCLOUD_REGION", "us-central1") - - # Get credentials + # Load credentials if token_path: - credentials, project = google.auth.load_credentials_from_file(token_path) + credentials, _ = google.auth.load_credentials_from_file(token_path) else: - credentials, project = google.auth.default() - - # Refresh token - auth_req = google.auth.transport.requests.Request() - credentials.refresh(auth_req) - - # Prepare headers and URL + credentials, _ = google.auth.default() + + # Refresh access token + credentials.refresh(google.auth.transport.requests.Request()) + headers = { "Authorization": f"Bearer {credentials.token}", - "Content-Type": "application/json" + "Content-Type": "application/json", } - - url = f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}:predict" - - # Prepare payload + + url = ( + f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/" + f"locations/{location}/endpoints/{endpoint_id}:predict" + ) + payload = { "instances": [ { - "inputs": input_text, - # TODO: this also needs 
to come from configs + "inputs": prompt, "parameters": { "max_new_tokens": 100, "temperature": 0.7, - "top_p": 0.95 - } + "top_p": 0.95, + }, } ] } - - # Make request - response = requests.post(url, headers=headers, json=payload) - print('Raw Response Content:', response.content.decode()) + response = requests.post(url, headers=headers, json=payload, timeout=60) + response.raise_for_status() return response.json() -endpoint_id="your-endpoint-id-here" -input_text='What are you?"' -resp = query_vertexai_endpoint_rest(endpoint_id=endpoint_id, input_text=input_text) -print(resp) +if __name__ == "__main__": + ENDPOINT_ID = "your-endpoint-id-here" # Replace with the real one + prompt = "What are you?" + + result = query_vertexai_endpoint_rest(ENDPOINT_ID, prompt) + print(result) ``` ## Conclusion -You have successfully deployed and queried Llama 3 on GCP Vertex AI using Magemaker's interactive dropdown menu. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com). - +You have successfully deployed and queried Llama 3 on GCP Vertex AI using Magemaker. If you encounter any issues—or have feature requests—reach out to us at [support@slashml.com](mailto:support@slashml.com). diff --git a/updated_readme.md b/updated_readme.md index bcfc60b..23baf50 100644 --- a/updated_readme.md +++ b/updated_readme.md @@ -1,25 +1,24 @@ -
-

Magemaker v0.1, by SlashML

+

Magemaker by SlashML

- Deploy open source AI models to AWS in minutes. + Deploy open-source AI models to AWS SageMaker, GCP Vertex AI and Azure ML in minutes.
+ 📚 Full Documentation »

-
Table of Contents
  1. - About Magemaker + About Magemaker
  2. Getting Started @@ -28,229 +27,179 @@
  3. Installation
  4. -
  5. Using Magemaker
  6. -
  7. What we're working on next
  8. -
  9. Known issues
  10. +
  11. Usage
  12. +
  13. Example YAML Files
  14. +
  15. Fine-tuning
  16. +
  17. Deactivating Models
  18. +
  19. Roadmap
  20. +
  21. Known Issues
  22. Contributing
  23. License
  24. Contact
- -## About Magemaker -Magemaker is a Python tool that simplifies the process of deploying an open source AI model to your own cloud. Instead of spending hours digging through documentation to figure out how to get AWS working, Magemaker lets you deploy open source AI models directly from the command line. - -Choose a model from Hugging Face or SageMaker, and Magemaker will spin up a SageMaker instance with a ready-to-query endpoint in minutes. - - -
- -## Getting Started - -Magemaker works with AWS. Azure and GCP support are coming soon! - -To get a local copy up and running follow these simple steps. - -### Prerequisites +## About +Magemaker is a Python CLI + library that abstracts away the boiler-plate of standing up inference endpoints for open-source models. Pick a Hugging Face model (or SageMaker JumpStart / custom S3 artefact) and Magemaker will deploy it to: -* Python -* An AWS account -* Quota for AWS SageMaker instances (by default, you get 2 instances of ml.m5.xlarge for free) -* Certain Hugging Face models (e.g. Llama2) require an access token ([hf docs](https://huggingface.co/docs/hub/en/models-gated#access-gated-models-as-a-user)) +* **AWS SageMaker** +* **Google Cloud Vertex AI** +* **Azure Machine Learning** -### Configuration +It also ships with a FastAPI server that exposes the deployed endpoints through an **OpenAI-compatible proxy API** allowing you to drop-in replace `openai` calls in existing apps. -**Step 1: Set up AWS and SageMaker** - -To get started, you’ll need an AWS account which you can create at https://aws.amazon.com/. Then you’ll need to create access keys for SageMaker. - -We wrote up the steps in [Google Doc](https://docs.google.com/document/d/1NvA6uZmppsYzaOdkcgNTRl7Nb4LbpP9Koc4H_t5xNSg/edit?tab=t.0#heading=h.farbxuv3zrzm) as well. +--- +## Getting Started +### Prerequisites {#prerequisites} -### Installing the package +* **Python 3.11+** (3.12 not yet supported; 3.13 blocked by Azure SDK) +* Account & basic quotas on at least one cloud provider + * AWS ➜ SageMaker + * GCP ➜ Vertex AI + * Azure ➜ Azure ML Studio +* Corresponding CLI installed (`aws`, `gcloud`, `az`) – Magemaker will guide you if missing +* (Optional) **Hugging Face access token** for gated models such as Llama-3 -**Step 1** +### Installation {#installation} -```sh +```bash pip install magemaker ``` -**Step 2: Running magemaker** +Initialise Magemaker for your cloud(s): -Run it by simply doing the following: +```bash +# one provider +magemaker --cloud aws -```sh -magemaker +# or configure several at once +magemaker --cloud all ``` -If this is your first time running this command. It will configure the AWS client so you’re ready to start deploying models. You’ll be prompted to enter your Access Key and Secret here. You can also specify your AWS region. The default is us-east-1. You only need to change this if your SageMaker instance quota is in a different region. +The wizard collects credentials, writes them to a local `.env`, and creates any required execution roles (see docs for details). -Once configured, it will create a `.env` file and save the credentials there. You can also add your Hugging Face Hub Token to this file if you have one. - -```sh -HUGGING_FACE_HUB_KEY="KeyValueHere" -``` - -

(back to top)

+--- +## Usage {#usage} +### Interactive mode - -
- -## Using Magemaker - -### Deploying models from dropdown +```bash +magemaker --cloud [aws|gcp|azure|all] +``` -When you run `magemaker` comamnd it will give you an interactive menu to deploy models. You can choose from a dropdown of models to deploy. +Use the dropdown UI to: +1. Deploy a model (HF, JumpStart or custom S3) +2. List active endpoints +3. Query an endpoint +4. Delete endpoints you no longer need -#### Deploying Hugging Face models -If you're deploying with Hugging Face, copy/paste the full model name from Hugging Face. For example, `google-bert/bert-base-uncased`. Note that you’ll need larger, more expensive instance types in order to run bigger models. It takes anywhere from 2 minutes (for smaller models) to 10+ minutes (for large models) to spin up the instance with your model. +### YAML-based mode (CI / reproducibility) -#### Deploying Sagemaker models -If you are deploying a Sagemaker model, select a framework and search from a model. If you a deploying a custom model, provide either a valid S3 path or a local path (and the tool will automatically upload it for you). Once deployed, we will generate a YAML file with the deployment and model in the `CONFIG_DIR=.magemaker_config` folder. You can modify the path to this folder by setting the `CONFIG_DIR` environment variable. +```bash +magemaker --deploy path/to/your-deployment.yaml +``` -#### Deploy using a yaml file -We recommend deploying through a yaml file for reproducability and IAC. From the cli, you can deploy a model without going through all the menus. You can even integrate us with your Github Actions to deploy on PR merge. Deploy via YAML files simply by passing the `--deploy` option with local path like so: +--- -``` -magemaker --deploy .magemaker_config/bert-base-uncased.yaml -``` +## Examples {#examples} -Following is a sample yaml file for deploying a model the same google bert model mentioned above: +Deploy **BERT** to each cloud: +AWS SageMaker ```yaml deployment: !Deployment destination: aws - # Endpoint name matches model_id for querying atm. 
- endpoint_name: test-bert-uncased - instance_count: 1 + endpoint_name: bert-aws instance_type: ml.m5.xlarge - + instance_count: 1 models: - !Model id: google-bert/bert-base-uncased source: huggingface ``` -Following is a yaml file for deploying a llama model from HF: +GCP Vertex AI ```yaml deployment: !Deployment - destination: aws - endpoint_name: test-llama2-7b - instance_count: 1 - instance_type: ml.g5.12xlarge - num_gpus: 4 - # quantization: bitsandbytes - + destination: gcp + endpoint_name: bert-gcp + instance_type: g2-standard-12 + accelerator_type: NVIDIA_L4 + accelerator_count: 1 models: - !Model - id: meta-llama/Meta-Llama-3-8B-Instruct + id: google-bert/bert-base-uncased source: huggingface - predict: - temperature: 0.9 - top_p: 0.9 - top_k: 20 - max_new_tokens: 250 ``` -#### Fine-tuning a model using a yaml file - -You can also fine-tune a model using a yaml file, by using the `train` option in the command and passing path to the yaml file - -` -magemaker --train .magemaker_config/train-bert.yaml -` - -Here is an example yaml file for fine-tuning a hugging-face model: - +Azure ML ```yaml -training: !Training - destination: aws - instance_type: ml.p3.2xlarge +deployment: !Deployment + destination: azure + endpoint_name: bert-azure + instance_type: Standard_DS3_v2 instance_count: 1 - training_input_path: s3://jumpstart-cache-prod-us-east-1/training-datasets/tc/data.csv - hyperparameters: !Hyperparameters - epochs: 1 - per_device_train_batch_size: 32 - learning_rate: 0.01 - models: - !Model - id: meta-textgeneration-llama-3-8b-instruct + id: google-bert-bert-base-uncased # Azure uses different IDs – see docs source: huggingface ``` +More examples (JumpStart, custom S3, quantisation, multi-model endpoints) are available in the [docs](https://magemaker.slashml.com). -
-
- -If you’re using the `ml.m5.xlarge` instance type, here are some small Hugging Face models that work great: -
-
+--- -**Model: [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)** - -- **Type:** Fill Mask: tries to complete your sentence like Madlibs -- **Query format:** text string with `[MASK]` somewhere in it that you wish for the transformer to fill -- -
-
- -**Model: [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)** - -- **Type:** Feature extraction: turns text into a 384d vector embedding for semantic search / clustering -- **Query format:** "*type out a sentence like this one.*" - -
-
- - -### Deactivating models - -Any model endpoints you spin up will run continuously unless you deactivate them! Make sure to delete endpoints you’re no longer using so you don’t keep getting charged for your SageMaker instance. - - -

(back to top)

+## Fine-tuning {#fine-tuning} +```bash +magemaker --train path/to/train-config.yaml +``` - -
+See the Fine-tuning guide for supported hyper-parameters and cloud limitations. -## What we're working on next -- [ ] More robust error handling for various edge cases -- [ ] Verbose logging -- [ ] Enabling / disabling autoscaling -- [ ] Deployment to Azure and GCP +--- -

(back to top)

+## Deactivating Models {#deactivating-models} +Endpoints keep running – and billing – until deleted. Always remove endpoints you no longer need: +```bash +magemaker --cloud aws # open the UI +# → Delete a Model Endpoint +``` - -
+Or call the deletion helpers programmatically. -## Known issues -- [ ] Querying within Magemaker currently only works with text-based model - doesn’t work with multimodal, image generation, etc. -- [ ] Deleting a model is not instant, it may show up briefly after it was queued for deletion -- [ ] Deploying the same model within the same minute will break +--- -

(back to top)

+## Roadmap {#roadmap} +- [ ] Auto-scaling controls +- [ ] Rich logging & observability +- [ ] Additional quantisation back-ends +- [ ] Automated cost optimisation recommendations +--- - -
+## Known Issues {#known-issues} +1. Query helper currently text-only (multimodal WIP) +2. Deleting endpoints can take a few minutes to propagate +3. Deploying identical endpoint names within the same minute causes conflicts -## License +--- -Distributed under the Apache 2.0 License. See `LICENSE` for more information. +## Contributing {#contributing} +Want to help? Read the [Contributing Guide](https://magemaker.slashml.com/contributing) and check open issues. - -
+--- -## Contact +## License {#license} +Apache 2.0 – see `LICENSE`. -You can reach us, faizan & jneid, at [support@slashml.com](mailto:support@slashml.com). +--- -We’d love to hear from you! We’re excited to learn how we can make this more valuable for the community and welcome any and all feedback and suggestions. +## Contact {#contact} +Questions or feedback? +📧 support@slashml.com +💬 Join us on [Discord](https://discord.gg/SBQsD63d)