Skip to content

tsaol/llama-on-aws-sagemaker

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 

Repository files navigation

End-to-End Deployment of LLaMA-3 Model on SageMaker

This Notebook demonstrates the complete process of deploying the LLaMA-3 language model as an end-to-end inference service using SageMaker. The main steps include:

  1. Installing the necessary libraries and downloading the pre-trained LLaMA-3 model from Hugging Face to the local environment.
  2. Uploading the downloaded model to Amazon S3 for subsequent deployment.
  3. Preparing the required configurations for model deployment, including the inference entry point script, container image, and service parameters.
  4. Utilizing SageMaker's managed inference functionality to create a model and an endpoint for providing online inference services.
  5. Testing the deployed model's inference service using a text generation task as an example, showcasing how to invoke the deployed endpoint.

Environment Requirements

  • This Notebook needs to be run on a SageMaker Notebook instance.
  • The Notebook instance should have access to S3 for storing the model.
  • A Hugging Face API Token is required to download the model.

Usage Instructions

  1. Open this Notebook on a SageMaker Notebook instance.
  2. Execute the steps in the Notebook sequentially, making sure to replace the S3 bucket paths with your own.
  3. During the model deployment process, you can adjust the type and number of compute instances according to your needs.
  4. The model testing section demonstrates two invocation methods: with and without streaming. You can choose based on your requirements.
  5. The last code cell provides sample code for resource cleanup. Execute it with caution.

By following this Notebook, you will learn how to quickly deploy large language models using SageMaker's managed inference functionality and provide flexible online inference services. In real-world applications, you can further optimize the model's performance and throughput to meet business demands.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published