245 user story add digital avatar use case (#246) #82
Merged · 1 commit · Dec 2, 2024
12 changes: 12 additions & 0 deletions usecases/ai/digital-avatar/.gitignore
@@ -0,0 +1,12 @@
__pycache__
.env

ffmpeg*/
checkpoints
cache/
backend/musetalk/models
backend/musetalk/data/avatars
backend/wav2lip/wav2lip/results
backend/wav2lip/wav2lip/temp
weights/*
backend/liveportrait/templates
118 changes: 118 additions & 0 deletions usecases/ai/digital-avatar/README.md
@@ -0,0 +1,118 @@
# Digital Avatar

A digital avatar that combines image-to-video animation, text-to-speech, speech-to-text, and an LLM to create an interactive avatar.

![Demo](./docs/demo.gif)


## Table of Contents
- [Architecture Diagram](#architecture-diagram)
- [Requirements](#requirements)
  - [Minimum](#minimum)
- [Application Ports](#application-ports)
- [Setup](#setup)
  - [Prerequisite](#prerequisite)
  - [Setup ENV](#setup-env)
  - [Build Docker Container](#build-docker-container)
  - [Start Docker Container](#start-docker-container)
  - [Access the App](#access-the-app)
- [Notes](#notes)
  - [Device Workload Configurations](#device-workload-configurations)
- [FAQ](#faq)

## Architecture Diagram
![Architecture Diagram](./docs/architecture.png)

## Requirements

### Minimum
- CPU: 13th generation Intel® Core™ i5 or above
- GPU: Intel® Arc™ A770 graphics (16GB)
- RAM: 32GB
- DISK: 128GB

## Application Ports
Please ensure these ports are available before running the application.

| Apps | Port |
|--------------|------|
| Lipsync | 8011 |
| LivePortrait | 8012 |
| TTS | 8013 |
| STT | 8014 |
| Ollama       | 8015 |
| Frontend | 80 |
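
A quick way to confirm the ports are free before starting the stack (assuming a Linux host with `ss` available):

```bash
# List any process already bound to the application ports;
# no output means all ports are free.
sudo ss -tulpn | grep -E ':(80|8011|8012|8013|8014|8015)\b'
```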

## Setup

### Prerequisite
1. **OS**: Ubuntu (Validated on 22.04)

    ***Note***: If you are using a different Ubuntu version, please [update the `RENDER_GROUP_ID`](#1-update-render-group-id).

1. **Docker and Docker Compose**: Ensure Docker and Docker Compose are installed. Refer to [Docker installation guide](https://docs.docker.com/engine/install/).
1. **Intel GPU Drivers**:
1. Refer to [here](../../../README.md#gpu) to install Intel GPU Drivers
1. **Download Wav2Lip Model**: Download the [Wav2Lip model](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/EdjI7bZlgApMqsVoEUUXpLsBxqXbn5z8VTmoxp55YNDcIA?e=n9ljGW) and place the file in the `weights` folder.
1. **Create Avatar**:
    1. Place an `image.png` file containing an image of a person (preferably showing at least the upper half of the body) in the `assets` folder.
    2. Place an `idle.mp4` file of a person with some movement, such as eye blinking (to be used as a reference), in the `assets` folder. A sketch for preparing both assets follows this list.
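
If you need to prepare the assets, a minimal sketch with `ffmpeg` (the input file names `source.jpg` and `source_video.mp4` are placeholders for your own media):

```bash
mkdir -p assets
# Convert a source photo to PNG for the avatar image
ffmpeg -i source.jpg assets/image.png
# Trim a short, muted clip with natural motion (e.g., blinking) as the idle reference
ffmpeg -i source_video.mp4 -t 10 -an assets/idle.mp4
```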

### Setup ENV
1. Create a `.env` file and copy the contents from `.env.template`:
```bash
cp .env.template .env
```
2. Modify `LLM_MODEL` in the `.env` file; refer to the [Ollama library](https://ollama.com/library) for available models (the default is `qwen2.5`). An example is shown below.
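
For reference, a minimal `.env` might look like this (only `LLM_MODEL` is documented here; take the full set of variables from `.env.template`):

```bash
# Ollama model tag; see https://ollama.com/library for alternatives
LLM_MODEL=qwen2.5
```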

### Build Docker Container
```bash
docker compose build
```

### Start Docker Container
```bash
docker compose up -d
```
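
To verify that all services started successfully:

```bash
# All services should report a "running" (and eventually "healthy") status
docker compose ps
```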

### Access the App
- Navigate to http://localhost
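
If the page does not load immediately, the backends may still be warming up. The LivePortrait service exposes a `/healthcheck` endpoint (see its Dockerfile below); assuming the port mapping from the table above, you can poll it directly:

```bash
# Returns HTTP 200 once the LivePortrait backend is ready
curl -sf http://localhost:8012/healthcheck && echo "LivePortrait is up"
```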

## Notes
### Device Workload Configurations
You can offload model inference to a specific device by modifying the corresponding environment variable in the `docker-compose.yml` file.

| Workload             | Environment Variable | Supported Device |
|----------------------|----------------------|------------------|
| LLM                  | -                    | GPU              |
| STT - Encoder Device | STT_ENCODED_DEVICE   | CPU, GPU, NPU    |
| STT - Decoder Device | STT_DECODED_DEVICE   | CPU, GPU         |
| TTS                  | TTS_DEVICE           | CPU              |
| Lipsync (Wav2Lip)    | DEVICE               | CPU, GPU         |

Example configuration:

* To run the lipsync (Wav2Lip) workload on `CPU`, set `DEVICE` in the `wav2lip` service:

```yaml
services:
  wav2lip:
    ...
    environment:
      ...
      - DEVICE=CPU
```
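
To offload the STT encoder workload to `NPU`, an analogous override would be the following sketch; the service name `stt_service` is an assumption, so check `docker-compose.yml` for the actual one:

```yaml
services:
  stt_service:  # hypothetical name; use the actual STT service from docker-compose.yml
    ...
    environment:
      ...
      - STT_ENCODED_DEVICE=NPU
```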

## FAQ
### 1. Update Render Group ID
1. Ensure the [Intel GPU driver](#prerequisite) is installed.
2. Check the group ID from `/etc/group`:
```bash
grep render /etc/group
```
3. The output will be something like:
```
render:x:110:user
```
4. The group ID is the number in the third field (e.g., `110` in the example above); a one-line shortcut is shown after this list.
5. Ensure the `RENDER_GROUP_ID` in the [docker-compose.yml](./docker-compose.yml) file matches the render group ID.
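
Steps 2-4 can be collapsed into a single command:

```bash
# Print only the numeric ID of the render group
getent group render | cut -d: -f3
```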

77 changes: 77 additions & 0 deletions usecases/ai/digital-avatar/backend/liveportrait/Dockerfile
@@ -0,0 +1,77 @@
FROM debian:12-slim

ARG DEBIAN_FRONTEND=noninteractive
ARG RENDER_GROUP_ID
RUN apt-get update \
&& apt-get upgrade -y \
&& apt-get install --no-install-recommends -y \
sudo \
wget \
ca-certificates \
ffmpeg \
libsm6 \
libxext6 \
curl \
git \
build-essential \
libssl-dev \
zlib1g-dev \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
llvm \
libncursesw5-dev \
xz-utils \
tk-dev \
libxml2-dev \
libxmlsec1-dev \
libffi-dev \
liblzma-dev \
&& addgroup --system intel --gid 1000 \
&& adduser --system --ingroup intel --uid 1000 --home /home/intel intel \
&& echo "intel ALL=(ALL:ALL) NOPASSWD:ALL" > /etc/sudoers.d/intel \
&& groupadd -g ${RENDER_GROUP_ID} render \
&& usermod -aG render intel \
&& rm -rf /var/lib/apt/lists/* \
&& mkdir -p /usr/src \
&& chown -R intel:intel /usr/src

# Intel GPU Driver
RUN apt-get update && apt-get install -y gnupg

RUN wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg && \
echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client" | \
tee /etc/apt/sources.list.d/intel-gpu-jammy.list && \
apt-get update && \
apt-get install -y --no-install-recommends libze1 intel-level-zero-gpu intel-opencl-icd clinfo libze-dev intel-ocloc

USER intel
WORKDIR /usr/src/app

# Set environment variables for pyenv
ENV PYENV_ROOT="/usr/src/app/.pyenv"
ENV PATH="$PYENV_ROOT/bin:$PYENV_ROOT/shims:$PATH"

# Install pyenv
RUN curl https://pyenv.run | bash \
&& echo 'export PYENV_ROOT="$PYENV_ROOT"' >> ~/.bashrc \
&& echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc \
&& echo 'eval "$(pyenv init --path)"' >> ~/.bashrc \
&& echo 'eval "$(pyenv init -)"' >> ~/.bashrc \
&& . ~/.bashrc \
&& pyenv install 3.10.15 \
&& pyenv global 3.10.15

RUN python3 -m pip install --upgrade pip \
&& python3 -m pip install virtualenv

RUN python3 -m venv /usr/src/.venv
ENV PATH="/usr/src/.venv/bin:$PATH"

COPY --chown=intel ./backend/liveportrait .
RUN python3 -m pip install -r requirements.txt \
&& huggingface-cli download KwaiVGI/LivePortrait --local-dir liveportrait/pretrained_weights --exclude "*.git*" "README.md" "docs"

HEALTHCHECK --interval=30s --timeout=180s --start-period=60s --retries=3 \
CMD sh -c 'PORT=${SERVER_PORT:-8012} && wget --no-verbose -O /dev/null --tries=1 http://localhost:$PORT/healthcheck || exit 1'
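
The image expects `RENDER_GROUP_ID` as a build argument so that the container's `render` group matches the host's. `docker compose build` supplies it via the compose file; for a standalone build, a sketch (the image tag is illustrative, and the paths assume you are in the repository root):

```bash
# Pass the host's render group ID so the container user can access the GPU
docker build \
  --build-arg RENDER_GROUP_ID=$(getent group render | cut -d: -f3) \
  -f usecases/ai/digital-avatar/backend/liveportrait/Dockerfile \
  -t digital-avatar-liveportrait \
  usecases/ai/digital-avatar
```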
@@ -0,0 +1,34 @@
# Byte-compiled / optimized / DLL files
__pycache__/
**/__pycache__/
*.py[cod]
**/*.py[cod]
*$py.class

# Model weights
**/*.pth
**/*.onnx

pretrained_weights/*.md
pretrained_weights/docs
pretrained_weights/liveportrait
pretrained_weights/liveportrait_animals

# Ipython notebook
*.ipynb

# Temporary files or benchmark resources
animations/*
tmp/*
.vscode/launch.json
**/*.DS_Store
gradio_temp/**

# Windows dependencies
ffmpeg/
LivePortrait_env/

# XPose build files
src/utils/dependencies/XPose/models/UniPose/ops/build
src/utils/dependencies/XPose/models/UniPose/ops/dist
src/utils/dependencies/XPose/models/UniPose/ops/MultiScaleDeformableAttention.egg-info
@@ -0,0 +1,30 @@
MIT License

Copyright (c) 2024 Kuaishou Visual Generation and Interaction Center

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

---

The code of InsightFace is released under the MIT License.
The models of InsightFace are for non-commercial research purposes only.

If you want to use the LivePortrait project for commercial purposes, you
should remove and replace InsightFace’s detection models to fully comply with
the MIT license.