Andrew Ng and Giskard team has recently released great course called "Red Teaming LLM Applications" on DeepLearning.AI platform. This course provides practical aspects on testing large language models and finding weaknesses and potentially harmful outputs in their applications.
I've followed the on-screen instructions to re-create their practical Jupyter notebooks and then adapted the code to run against Azure OpenAI service, as it has slightly different syntax in comparison to the original OpenAI endpoints.
Additionally, various references to llama-index classes were updated, to make the course's helper functions compatible with the latest llama-index v0.10.x.
- Configuring solution environment
- Lesson 1: Overview of LLM Vulnerabilities
- Lesson 2: Red Teaming LLMs
- Lesson 3: Red Teaming at Scale
- Lesson 4: Red Teaming LLMs with LLMs
- Lesson 5: A Full Red Teaming Assessment
- To use Azure OpenAI backend, assign the API endpoint name, key and version, along with the Azure OpenAI deployment names of GPT and Embedding models to AZURE_OPENAI_API_BASE, AZURE_OPENAI_API_KEY, AZURE_OPENAI_API_VERSION, AZURE_OPENAI_API_DEPLOY (for GPT) and AZURE_OPENAI_API_DEPLOY_EMBED (for Embedding) environment variables respectively.
- Install the required Python packages, by using the pip command and the provided requirements.txt file.
pip install -r requirements.txt
First lesson provides an overview of LLM vulnerabilities. It describes hypothetical scenarios, causes of observed behaviour and potential impact. Four main categories of described LLM vulnerabilities are:
- Bias and stereotypes;
- Sensitive information disclosure;
- Service disruption;
- Hallucinations.
Second lesson focuses on the aspects of LLM Red Teaming. It explores different techniques to bypass the model's safeguards:
- Exploiting text completion;
- Using biased prompts;
- Direct prompt injection;
- Grey box prompt attacks;
- Advanced technique: prompt probing.
Third lesson is about automation approaches for the Prompt Injection attacks;
- Manually defined injection techniques;
- Using library of prompts;
- Giskard's LLM scan.
Fourth lesson is about the use of LLM to automate the Red Teaming process. Here you can find how to use custom scripting to automate generation of adversarial inputs and evaluation of the app's outputs. Then it's shown how the same process can be automated by using Giskard's Python library.
Fifth lesson provides an example of a full Red Teaming assessment. It consists of 2 rounds:
- Round one is about more general probing of the company bot, to search for any signs of vulnerabilities in various categories, e.g. toxicity and offensive content, off-topic content, excessive agency, sensitive information disclosure, etc. You can use either custom prompts or automate prompts generation with Giskard's Python library;
- Round two is about exploiting specific functionality, e.g. prompt injection, to achieve a malicious goal. In this fictitious scenario, we persuade the bot to refund the order, even if it's not eligible any more.