Commit
Add instructions of modifying reranking docker image for NVGPU (#1133)
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
wangkl2 and pre-commit-ci[bot] authored Nov 18, 2024
1 parent 7e62175 commit 2587179
Showing 1 changed file with 51 additions and 4 deletions.
55 changes: 51 additions & 4 deletions ChatQnA/docker_compose/nvidia/gpu/README.md
@@ -5,8 +5,9 @@ This document outlines the deployment process for a ChatQnA application utilizing
Quick Start Deployment Steps:

1. Set up the environment variables.
2. Run Docker Compose.
3. Consume the ChatQnA Service.
2. Modify the TEI Docker Image for Reranking.
3. Run Docker Compose.
4. Consume the ChatQnA Service.

## Quick Start: 1.Set Up Environment Variables

@@ -35,7 +36,30 @@ To set up environment variables for deploying ChatQnA services, follow these steps:
```bash
source ./set_env.sh
```
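For reference, this step can be sketched as follows. The variable names below are assumptions about what `set_env.sh` exports (the authoritative list is the script itself):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the kind of variables set_env.sh is expected to
# export; host_ip is used by the curl examples later in this guide, and the
# token variable name is an assumption.
export host_ip="${host_ip:-$(hostname -I 2>/dev/null | awk '{print $1}')}"
export host_ip="${host_ip:-127.0.0.1}"  # illustrative fallback only
export HUGGINGFACEHUB_API_TOKEN="${HUGGINGFACEHUB_API_TOKEN:-your-hf-token}"
echo "host_ip=${host_ip}"
```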

## Quick Start: 2.Run Docker Compose
## Quick Start: 2.Modify the TEI Docker Image for Reranking

> **Note:**
> The default Docker image for the `tei-reranking-service` in `compose.yaml` is built for the A100 and A30 backends (CUDA compute capability 8.0). If you are using an A100 or A30, skip this step. For any other GPU architecture, change the `image` tag of the `tei-reranking-service` to the entry in the following table that matches your GPU's compute capability.

| GPU Arch     | GPU                                        | Compute Capability | Image                                                    |
| ------------ | ------------------------------------------ | ---------------- | -------------------------------------------------------- |
| Volta | V100 | 7.0 | NOT SUPPORTED |
| Turing | T4, GeForce RTX 2000 Series | 7.5 | ghcr.io/huggingface/text-embeddings-inference:turing-1.5 |
| Ampere 80 | A100, A30 | 8.0 | ghcr.io/huggingface/text-embeddings-inference:1.5 |
| Ampere 86 | A40, A10, A16, A2, GeForce RTX 3000 Series | 8.6 | ghcr.io/huggingface/text-embeddings-inference:86-1.5 |
| Ada Lovelace | L40S, L40, L4, GeForce RTX 4000 Series | 8.9 | ghcr.io/huggingface/text-embeddings-inference:89-1.5 |
| Hopper | H100 | 9.0 | ghcr.io/huggingface/text-embeddings-inference:hopper-1.5 |

For instance, if a Hopper-architecture GPU (such as the H100 or H100 NVL) is the target backend:

```yaml
# vim compose.yaml
tei-reranking-service:
#image: ghcr.io/huggingface/text-embeddings-inference:1.5
image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
```
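Picking the right tag can also be scripted. The helper below is a sketch (the function name is ours, not part of the repo); it maps a compute capability to the image tags in the table above:

```shell
#!/usr/bin/env bash
# Map a CUDA compute capability to the matching TEI reranking image from the
# table above. Unsupported capabilities (e.g. 7.0/Volta) return non-zero.
pick_tei_image() {
  case "$1" in
    7.5) echo "ghcr.io/huggingface/text-embeddings-inference:turing-1.5" ;;
    8.0) echo "ghcr.io/huggingface/text-embeddings-inference:1.5" ;;
    8.6) echo "ghcr.io/huggingface/text-embeddings-inference:86-1.5" ;;
    8.9) echo "ghcr.io/huggingface/text-embeddings-inference:89-1.5" ;;
    9.0) echo "ghcr.io/huggingface/text-embeddings-inference:hopper-1.5" ;;
    *)   echo "compute capability $1 is not supported" >&2; return 1 ;;
  esac
}

# On the target machine you could feed it the detected capability, e.g.:
#   pick_tei_image "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)"
pick_tei_image 9.0  # prints the hopper-1.5 image
```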

## Quick Start: 3.Run Docker Compose

```bash
docker compose up -d
```

@@ -56,7 +80,7 @@ In the following cases, you can build the Docker images from source yourself.

Please refer to 'Build Docker Images' below.
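The services can take a while to come up after `docker compose up -d`. Before sending requests, a small polling helper such as this sketch can be used (our own helper, not part of the repo; the URL in the comment assumes the default ChatQnA port 8888):

```shell
#!/usr/bin/env bash
# Poll an HTTP endpoint until it responds or a timeout (seconds) expires.
wait_for() {
  local url=$1 timeout=${2:-60} start=$SECONDS
  until curl -sf -o /dev/null "$url"; do
    if (( SECONDS - start >= timeout )); then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 2
  done
}

# Hypothetical usage: wait_for "http://${host_ip}:8888/v1/chatqna" 120
```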

## QuickStart: 3.Consume the ChatQnA Service
## QuickStart: 4.Consume the ChatQnA Service

```bash
curl http://${host_ip}:8888/v1/chatqna \
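  -H "Content-Type: application/json" \
  -d '{"messages": "What is the revenue of Nike in 2023?"}'
# NOTE: the two lines above are an illustrative completion of the truncated
# command; the "messages" field and the question string are assumptions, not
# taken from this diff.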
```

@@ -176,6 +200,29 @@ Change the `xxx_MODEL_ID` below for your needs.
```bash
source ./set_env.sh
```

### Modify the TEI Docker Image for Reranking

> **Note:**
> The default Docker image for the `tei-reranking-service` in `compose.yaml` is built for the A100 and A30 backends (CUDA compute capability 8.0). If you are using an A100 or A30, skip this step. For any other GPU architecture, change the `image` tag of the `tei-reranking-service` to the entry in the following table that matches your GPU's compute capability.

| GPU Arch     | GPU                                        | Compute Capability | Image                                                    |
| ------------ | ------------------------------------------ | ---------------- | -------------------------------------------------------- |
| Volta | V100 | 7.0 | NOT SUPPORTED |
| Turing | T4, GeForce RTX 2000 Series | 7.5 | ghcr.io/huggingface/text-embeddings-inference:turing-1.5 |
| Ampere 80 | A100, A30 | 8.0 | ghcr.io/huggingface/text-embeddings-inference:1.5 |
| Ampere 86 | A40, A10, A16, A2, GeForce RTX 3000 Series | 8.6 | ghcr.io/huggingface/text-embeddings-inference:86-1.5 |
| Ada Lovelace | L40S, L40, L4, GeForce RTX 4000 Series | 8.9 | ghcr.io/huggingface/text-embeddings-inference:89-1.5 |
| Hopper | H100 | 9.0 | ghcr.io/huggingface/text-embeddings-inference:hopper-1.5 |

For instance, if a Hopper-architecture GPU (such as the H100 or H100 NVL) is the target backend:

```yaml
# vim compose.yaml
tei-reranking-service:
#image: ghcr.io/huggingface/text-embeddings-inference:1.5
image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
```

### Start all the services Docker Containers

```bash
docker compose up -d
```
