This solution provides ready-to-use code so you can start experimenting with a variety of Large Language Models (LLMs) and Multimodal Language Models (MLMs), settings, and prompts in your own AWS account.
Supported model providers:
- Amazon Bedrock
- Amazon SageMaker self-hosted models from Foundation models, SageMaker JumpStart, and Hugging Face.
- Third-party providers via API, such as Anthropic, Cohere, AI21 Labs, OpenAI, etc. See the available langchain integrations for a comprehensive list.
Deploy IDEFICS models on Amazon SageMaker and see how the chatbot can answer questions about images, describe visual content, and generate text grounded in multiple images.
Currently, the following multimodal models are supported:
- IDEFICS 9b Instruct — requires an `ml.g5.12xlarge` instance.
- IDEFICS 80b Instruct — requires an `ml.g5.48xlarge` instance.
For the required instance types and how to request them, read the Amazon SageMaker requirements section.
NOTE: Make sure to review the IDEFICS models' license sections.
To deploy a multimodal model, follow the deployment instructions, select one of the supported models (press Space to select/deselect) during the magic-create CLI step, and deploy as instructed in the section above.
⚠️ NOTE ⚠️ Amazon SageMaker instances are billed by the hour. Avoid leaving this model running unused to prevent unnecessary costs.
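If you want to double-check for forgotten endpoints, here is a minimal sketch (assuming boto3 and AWS credentials are configured; the endpoint name is a placeholder) that lists SageMaker endpoints and deletes one you no longer need:

```python
import boto3

sagemaker = boto3.client("sagemaker")

# List all endpoints in the current region and their status.
for endpoint in sagemaker.list_endpoints()["Endpoints"]:
    print(endpoint["EndpointName"], endpoint["EndpointStatus"])

# Delete an endpoint you no longer use to stop per-hour billing.
# "my-idefics-endpoint" is a placeholder; redeploying the stack can restore it.
sagemaker.delete_endpoint(EndpointName="my-idefics-endpoint")
```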
Send the same query to 2 to 4 separate models at once and see how each one responds based on its own training, context, and access to the same powerful document retriever, so all requests can pull from the same up-to-date knowledge.
A workspace is a logical namespace where you can upload files for indexing and storage in one of the vector databases. You can select the embeddings model and text-splitting configuration of your choice.
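For intuition only, here is a minimal sketch of the kind of text-splitting configuration a workspace captures, shown with LangChain's RecursiveCharacterTextSplitter (the actual parameters are selected in the UI; the chunk sizes below are illustrative):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Illustrative values; the solution lets you pick these per workspace.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # maximum characters per chunk
    chunk_overlap=200,  # characters shared between consecutive chunks
)

chunks = splitter.split_text("Text extracted from an uploaded document...")
print(f"{len(chunks)} chunk(s) produced")
```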
The solution comes with several debugging tools to help you debug RAG scenarios:
- Run RAG queries without chatbot and analyse results, scores, etc.
- Test different embeddings models directly in the UI
- Test cross encoders and analyse distances between sentences under different distance functions (see the sketch below).
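As a rough sketch of what the distance-debugging view computes, the snippet below scores two dummy sentence embeddings under a few common distance functions (the vectors are placeholders; the solution uses real embeddings from the model you selected):

```python
import numpy as np

# Placeholder embeddings; in the solution these come from the embeddings model.
a = np.array([0.1, 0.3, 0.5])
b = np.array([0.2, 0.1, 0.4])

cosine_distance = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
euclidean_distance = np.linalg.norm(a - b)
inner_product = np.dot(a, b)

print(f"cosine={cosine_distance:.4f} euclidean={euclidean_distance:.4f} dot={inner_product:.4f}")
```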
The repository includes a CDK construct to deploy a full-fledged UI built with React to interact with the deployed LLMs/MLMs as chatbots. Hosted on Amazon S3 and distributed with Amazon CloudFront.
Protected with Amazon Cognito authentication, it helps you interact and experiment with multiple LLMs/MLMs and multiple RAG engines, with conversational history support and document upload/progress tracking.
The interface layer between the UI and backend is built with API Gateway REST API for management requests and Amazon API Gateway WebSocket APIs for chatbot messages and responses.
Design system provided by AWS Cloudscape Design System.
Before you begin using the solution, there are certain precautions you must take into account:
- Cost management with self-hosted models on SageMaker: Be mindful of the costs associated with AWS resources, especially with SageMaker models billed by the hour. While the sample is designed to be cost-effective, leaving serverful resources running for extended periods or deploying numerous LLMs/MLMs can quickly lead to increased costs.
- Licensing obligations: If you choose to use any datasets or models alongside the provided samples, ensure you check the LLM code and comply with all licensing obligations attached to them.
- This is a sample: the code provided in this repository shouldn't be used for production workloads without further reviews and adaptation.
Instance type quota increase
If you are looking to self-host models on Amazon SageMaker, you'll likely need to request an increase in service quota for specific SageMaker instance types, such as the `ml.g5` instance types, to get access to the latest generation of GPU/Multi-GPU instances. You can do this from the AWS console.
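Before filing the request, you can inspect your current ml.g5 quotas programmatically; here is a sketch using the Service Quotas API (quota names vary by endpoint usage type, so the substring match below is a convenience, not an exact quota name):

```python
import boto3

quotas = boto3.client("service-quotas")

# Walk all SageMaker quotas and print the ones related to ml.g5 instances.
paginator = quotas.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        if "ml.g5" in quota["QuotaName"]:
            print(quota["QuotaName"], "=", quota["Value"])
```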
Base Models Access
If you are looking to interact with models from Amazon Bedrock, you need to request access to the base models in one of the regions where Amazon Bedrock is available. Make sure to read and accept the models' end-user license agreements (EULAs).
Note:
- You can deploy the solution to a different region from where you requested Base Model access.
- While Base Model access approval is instant, it might take several minutes for access to take effect and for the models to appear in the UI.
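To confirm access from your environment, here is a quick sketch that lists the Bedrock base models visible to your account (the region is an example; use the region where you requested access):

```python
import boto3

# Use the region where you requested Base Model access.
bedrock = boto3.client("bedrock", region_name="us-east-1")

for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(model["modelId"])
```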
You can also interact with external providers via their API, such as AI21 Labs, Cohere, OpenAI, etc.
The provider must be supported in the Model Interface, see available langchain integrations for a comprehensive list of providers.
Usually, an `API_KEY` is required to integrate with third-party models. To support this, the Model Interface deploys a secret in AWS Secrets Manager, initially containing an empty JSON `{}`, where you can add your API keys for one or more providers.
These keys will be injected at runtime into the Lambda function Environment Variables; they won't be visible in the AWS Lambda Console.
For example, if you wish to interact with AI21 Labs, OpenAI, and Cohere endpoints:
- Open the Model Interface Keys Secret in Secrets Manager. You can find the secret name in the stack output, too.
- Update the secret by adding keys to the JSON:
```json
{
  "AI21_API_KEY": "xxxxx",
  "OPENAI_API_KEY": "sk-xxxxxxxxxxxxxxx",
  "COHERE_API_KEY": "xxxxx"
}
```
N.B.: If no keys are needed, the secret value must be an empty JSON `{}`, NOT an empty string `''`.
Make sure the environment variable name matches what is expected by the framework in use, such as LangChain (see available langchain integrations).
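If you prefer to script this instead of editing the secret in the console, here is a minimal sketch (the secret name is a placeholder; use the ApiKeysSecretName value from your stack outputs):

```python
import json
import boto3

secrets = boto3.client("secretsmanager")

# Placeholder secret name; copy the real one from the stack outputs.
secrets.put_secret_value(
    SecretId="ApiKeysSecretName-xxxxxx",
    SecretString=json.dumps({
        "AI21_API_KEY": "xxxxx",
        "OPENAI_API_KEY": "sk-xxxxxxxxxxxxxxx",
        "COHERE_API_KEY": "xxxxx",
    }),
)
```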
We recommend deploying with AWS Cloud9. If you'd like to use Cloud9 to deploy the solution, you will need the following before proceeding:
- Select at least `m5.large` as the instance type.
- Use `Ubuntu Server 22.04 LTS` as the platform.
If you'd like to use GitHub Codespaces to deploy the solution, you will need the following before proceeding:
- An AWS account
- An IAM User with:
  - `AdministratorAccess` policy granted to your user (for production, we recommend restricting access as needed)
  - Take note of `Access key` and `Secret access key`.
To get started, click on the button below.
Once in the Codespaces terminal, set up the AWS Credentials by running
aws configure
AWS Access Key ID [None]: <the access key from the IAM user generated above>
AWS Secret Access Key [None]: <the secret access key from the IAM user generated above>
Default region name: <the region you plan to deploy the solution to>
Default output format: json
You are all set for deployment; you can now jump to step 3 of the deployment section below.
If you have decided not to use AWS Cloud9 or GitHub Codespaces, verify that your environment satisfies the following prerequisites:
You have:
- An AWS account
- `AdministratorAccess` policy granted to your AWS account (for production, we recommend restricting access as needed)
- Both console and programmatic access
- NodeJS 16 or 18 installed
  - If you are using `nvm`, you can run the following before proceeding:
    `nvm install 16 && nvm use 16` or `nvm install 18 && nvm use 18`
- AWS CLI installed and configured to use with your AWS account
- Typescript 3.8+ installed
- AWS CDK CLI installed
- Docker installed
  - N.B. `buildx` is also required. For Windows and macOS, `buildx` is included in Docker Desktop.
- Python 3+ installed
1. Clone the repository:

   git clone https://github.com/aws-samples/aws-genai-llm-chatbot

2. Move into the cloned repository:

   cd aws-genai-llm-chatbot
If you use Cloud9, increase the instance's EBS volume to at least 100GB. To do this, run the following command from the Cloud9 terminal:
./scripts/cloud9-resize.sh
See the documentation for more details on resizing an environment.
3. Install the project dependencies and build the project by running this command
npm install && npm run build
4. Once done, run the magic-create CLI to help you set up the solution with the features you care about most:
npm run create
You'll be prompted to configure the different aspects of the solution, such as:
- The LLMs or MLMs to enable (we support all models provided by Bedrock along with SageMaker hosted Idefics, FalconLite, Mistral and more to come)
- Setup of the RAG system: engine selection (e.g., Aurora with pgvector, OpenSearch, Kendra), embeddings selection, and more to come.
When done, answer `Y` to create a new configuration.
Your configuration is now stored under `bin/config.json`. You can re-run the magic-create command as needed to update your `config.json`.
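If you want to double-check the result, here is a trivial sketch that pretty-prints the generated configuration (the keys depend on the answers you gave, so none are assumed here):

```python
import json

with open("bin/config.json") as f:
    config = json.load(f)

print(json.dumps(config, indent=2))
```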
5. (Optional) Bootstrap AWS CDK on the target account and region
Note: This is required if you have never used AWS CDK on this account and region combination. (More information on CDK bootstrapping).
npx cdk bootstrap aws://{targetAccountId}/{targetRegion}
6. You can now deploy by running:
npx cdk deploy
Note: This step duration can vary greatly, depending on the Constructs you are deploying.
You can view the progress of your CDK deployment in the CloudFormation console in the selected region.
7. Once deployed, take note of the `User Interface`, the `User Pool` and, if you want to interact with third-party model providers, the `Secret` that will eventually hold the various `API_KEYS` for those providers.
...
Outputs:
GenAIChatBotStack.UserInterfaceUserInterfaceDomanNameXXXXXXXX = dxxxxxxxxxxxxx.cloudfront.net
GenAIChatBotStack.AuthenticationUserPoolLinkXXXXX = https://xxxxx.console.aws.amazon.com/cognito/v2/idp/user-pools/xxxxx_XXXXX/users?region=xxxxx
GenAIChatBotStack.ApiKeysSecretNameXXXX = ApiKeysSecretName-xxxxxx
...
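If you'd rather capture these outputs programmatically, here is a sketch using CloudFormation's DescribeStacks (the stack name matches the deployment above):

```python
import boto3

cfn = boto3.client("cloudformation")

# Fetch the deployed stack and print every output key/value pair.
stack = cfn.describe_stacks(StackName="GenAIChatBotStack")["Stacks"][0]
for output in stack.get("Outputs", []):
    print(output["OutputKey"], "=", output["OutputValue"])
```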
8. Open the generated Cognito User Pool link from the outputs above, i.e. `https://xxxxx.console.aws.amazon.com/cognito/v2/idp/user-pools/xxxxx_XXXXX/users?region=xxxxx`, and add a user that will be used to log into the web interface.

9. Open the `User Interface` URL from the outputs above, i.e. `dxxxxxxxxxxxxx.cloudfront.net`.

10. Login with the user created in step 8; you will be asked to change the password.
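As an alternative to the console in step 8, you can create the user with the Cognito API; here is a sketch (the pool id and email are placeholders; take them from your own outputs):

```python
import boto3

cognito = boto3.client("cognito-idp")

cognito.admin_create_user(
    UserPoolId="xxxxx_XXXXX",  # placeholder; from the UserPoolLink output
    Username="user@example.com",
    UserAttributes=[{"Name": "email", "Value": "user@example.com"}],
    DesiredDeliveryMediums=["EMAIL"],  # emails a temporary password to the user
)
```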
See instructions in the README file of the `lib/user-interface/react-app` folder.
If you're using Kendra with an index in a language other than English, you will need to make some code modifications.
You'll need to modify the filters in the file lib/shared/layers/python-sdk/python/genai_core/kendra/query.py.
Example for French:
```python
if kendra_index_external or kendra_use_all_data:
    result = kendra.retrieve(
        IndexId=kendra_index_id,
        QueryText=query,
        PageSize=limit,
        PageNumber=1,
        AttributeFilter={
            "AndAllFilters": [
                {"EqualsTo": {"Key": "_language_code", "Value": {"StringValue": "fr"}}}
            ]
        },
    )
else:
    result = kendra.retrieve(
        IndexId=kendra_index_id,
        QueryText=query,
        PageSize=limit,
        PageNumber=1,
        AttributeFilter={
            "AndAllFilters": [
                {"EqualsTo": {"Key": "_language_code", "Value": {"StringValue": "fr"}}},
                {"EqualsTo": {"Key": "workspace_id", "Value": {"StringValue": workspace_id}}},
            ]
        },
    )
```
Please note: if these adjustments are made post-deployment, you must rebuild and redeploy with the commands below. If done prior to deployment, you can proceed with the walkthrough as usual.
npm install && npm run build
npx cdk deploy
You can remove the stacks and all the associated resources created in your AWS account by running the following command:
npx cdk destroy
Note: Depending on which resources were deployed, destroying the stack might take a while, up to 45 minutes. If the deletion fails multiple times, manually delete the remaining stack ENIs (you can filter ENIs by VPC/subnet/etc. using the search bar in the AWS console) and re-attempt the stack deletion.
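To find those leftover ENIs without clicking through the console, here is a sketch using the EC2 API (the VPC id is a placeholder; take it from the stack that failed to delete):

```python
import boto3

ec2 = boto3.client("ec2")

# Placeholder VPC id; use the one from the failed stack's resources.
response = ec2.describe_network_interfaces(
    Filters=[{"Name": "vpc-id", "Values": ["vpc-0123456789abcdef0"]}]
)
for eni in response["NetworkInterfaces"]:
    print(eni["NetworkInterfaceId"], eni["Status"], eni.get("Description", ""))
```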
This repository comes with several reusable CDK constructs, giving you the freedom to decide what to deploy and what not.
Here's an overview:
This sample was made possible thanks to the following libraries:
- langchain from LangChain AI
- unstructured from Unstructured-IO
- pgvector from Andrew Kane
This library is licensed under the MIT-0 License. See the LICENSE file.
- Changelog of the project.
- License of the project.
- Code of Conduct of the project.
- CONTRIBUTING for more information.