# E2E - Inference Endpoints Hugging Face

In today's event, we'll create an E2E application using Hugging Face Inference Endpoints!

There are 2 main sections to this event:

## Deploy LLM and Embedding Model to SageMaker Endpoint Through Hugging Face Inference Endpoints

Select "Inference Endpoint" from the "Solutions" button in Hugging Face:


Create a "+ New Endpoint" from the Inference Endpoints dashboard.


Select the `ai-maker-space/gen-z-translate-llama-3-instruct-v1` model repository, give your endpoint an appropriate name, and select N. Virginia (us-east-1) as your region.

Adjust the Advanced Configuration settings for your endpoint as needed.


Create a Protected endpoint (calling it will require a valid Hugging Face token).


If you were successful, you should see your new endpoint listed on the Inference Endpoints dashboard.

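If you prefer to script this step instead of clicking through the UI, the `huggingface_hub` library offers a `create_inference_endpoint` helper. The sketch below is a rough equivalent of the walkthrough above, not the exact configuration from the event: the endpoint name, accelerator, and instance values are assumptions and must correspond to an instance type that is actually available to your account.

```python
from huggingface_hub import create_inference_endpoint

# Rough scripted equivalent of the UI steps above (requires `huggingface-cli login` or a token).
endpoint = create_inference_endpoint(
    "gen-z-translate-llama-3",  # endpoint name -- choose your own
    repository="ai-maker-space/gen-z-translate-llama-3-instruct-v1",
    framework="pytorch",
    task="text-generation",
    vendor="aws",
    region="us-east-1",          # N. Virginia, as selected in the walkthrough
    type="protected",            # a valid Hugging Face token is required to call it
    accelerator="gpu",
    instance_size="x1",          # assumption -- pick a size/type valid for your account
    instance_type="nvidia-a10g",
)

endpoint.wait()        # block until the endpoint reports it is running
print(endpoint.url)    # the base URL your application will call
```

The same helper can be pointed at your embedding model's repository to create the second endpoint.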

You'll repeat the same process for your embedding model!
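Once both endpoints are running, it's worth a quick sanity check from Python. Because the endpoints are protected, every request must carry a Hugging Face token; the sketch below uses `huggingface_hub.InferenceClient`, and the endpoint URLs plus the `HF_TOKEN` environment variable are placeholders for your own values.

```python
import os

from huggingface_hub import InferenceClient

hf_token = os.environ["HF_TOKEN"]  # placeholder: your Hugging Face access token

# LLM endpoint: point the client directly at the URL shown on the endpoint's page.
llm = InferenceClient(model="https://<your-llm-endpoint>.endpoints.huggingface.cloud", token=hf_token)
print(llm.text_generation("Hello there!", max_new_tokens=64))

# Embedding endpoint: same idea, but ask for an embedding instead of generated text.
emb = InferenceClient(model="https://<your-embedding-endpoint>.endpoints.huggingface.cloud", token=hf_token)
vector = emb.feature_extraction("Hello there!")
print(vector.shape)  # embedding dimensions (exact shape depends on the deployed model)
```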

NOTE: PLEASE SHUT DOWN YOUR INSTANCES WHEN YOU HAVE COMPLETED THE ASSIGNMENT TO PREVENT UNNECESSARY CHARGES.
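One convenient way to do this is to pause the endpoints from code; a paused endpoint stops billing but keeps its configuration so it can be resumed later. A minimal sketch, assuming the endpoint names used earlier:

```python
from huggingface_hub import get_inference_endpoint

# Pause both endpoints when you're done (names are placeholders for the ones you created).
for name in ["gen-z-translate-llama-3", "my-embedding-endpoint"]:
    get_inference_endpoint(name).pause()  # stops billing; bring it back later with .resume()
```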

## Create a Simple Chat Application leveraging the new endpoint!

First, we fine-tune Llama 3 8B Instruct for a specific task, in this case a translation task!

Then, we create a Docker-based Hugging Face Space powering a Chainlit UI (code available here).
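The linked Space isn't reproduced here, but a minimal Chainlit app wired to the new LLM endpoint might look roughly like the sketch below. The `HF_LLM_ENDPOINT_URL` and `HF_TOKEN` environment variables are assumptions, and a production app would typically apply the model's chat template rather than sending the raw message.

```python
# app.py -- minimal Chainlit sketch (assumed env vars: HF_LLM_ENDPOINT_URL, HF_TOKEN)
import os

import chainlit as cl
from huggingface_hub import AsyncInferenceClient

client = AsyncInferenceClient(
    model=os.environ["HF_LLM_ENDPOINT_URL"],  # URL of the deployed LLM endpoint
    token=os.environ["HF_TOKEN"],             # required because the endpoint is protected
)


@cl.on_message
async def on_message(message: cl.Message):
    # Forward the user's message to the endpoint and send the reply back to the UI.
    reply = await client.text_generation(message.content, max_new_tokens=256)
    await cl.Message(content=reply).send()
```

Run it locally with `chainlit run app.py -w`; in a Docker Space, the same command typically serves as the container's entrypoint.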

## Terminating Your Resources

Please go to each endpoint's settings and select Delete Endpoint. To confirm the deletion, you will need to type the endpoint's name.
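If you'd rather clean up from code, the same `huggingface_hub` helpers can delete the endpoints; note that deletion is permanent, and the names below are placeholders for the ones you created.

```python
from huggingface_hub import get_inference_endpoint, list_inference_endpoints

# Permanently delete both endpoints (placeholder names).
for name in ["gen-z-translate-llama-3", "my-embedding-endpoint"]:
    get_inference_endpoint(name).delete()

# Confirm nothing is left running (and billing).
print([ep.name for ep in list_inference_endpoints()])
```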