
Deploying a New Model to NDIF

Michael Ripa edited this page Aug 29, 2024 · 1 revision


1. Clone NDIF and switch to the dev branch

git clone https://github.com/ndif-team/ndif.git && cd ndif && git checkout dev

2. Install Miniconda

Download and install Miniconda by following the instructions on the official Miniconda page for Linux.

3. Create and activate the service environment

conda env create -f ../ndif/services/ray_head/environment.yml -n prod
conda activate prod

4. Install NNSight

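To make sure the correct version of NNSight (the 0.3 branch) is installed, remove any existing copy and install it from source in editable mode. Check out the branch before installing so that pip installs the dependencies of the 0.3 code rather than those of the default branch, and return to your working directory afterwards:

pip uninstall -y nnsight
git clone https://github.com/ndif-team/nnsight.git
cd nnsight && git checkout 0.3 && cd ..
pip install -e nnsight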
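After the install, a quick check from Python can confirm which NNSight build the prod environment actually imports. This is a minimal sketch, assuming the 0.3 branch reports a version string in the `0.3.x` series (including dev builds such as `0.3.0.dev1`); adjust `series` if your checkout reports something different. The helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def in_series(installed: str, series: str) -> bool:
    """True if a version like "0.3.2" or "0.3.0.dev1" belongs to a series like "0.3"."""
    return installed == series or installed.startswith(series + ".")


def check_nnsight(series: str = "0.3") -> str:
    """Return the installed nnsight version, or raise if it is missing or from another series."""
    try:
        installed = version("nnsight")
    except PackageNotFoundError as exc:
        raise RuntimeError("nnsight is not installed in this environment") from exc
    if not in_series(installed, series):
        raise RuntimeError(f"expected nnsight {series}.x, found {installed}")
    return installed


# Usage, inside the prod environment:
#   print(check_nnsight())
```

The prefix test includes the trailing dot so that a hypothetical `0.30.1` is not mistaken for a `0.3` release.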

5. Install Ray Serve

Ray requires every node in the cluster to run the same version of Ray. We have been using 2.34 (the quotes stop shells such as zsh from treating the brackets as a glob pattern):

pip install "ray[serve]==2.34"
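Because a single node on a different Ray release can break the cluster, it can help to compare versions before joining a node. The sketch below assumes you have already collected each node's `ray.__version__` (for example with `python -c "import ray; print(ray.__version__)"` on each machine) into a dict; the node names and versions in the usage line are hypothetical. It compares plain numeric release strings the way a pip `==` pin does, so `2.34` and `2.34.0` count as equal but `2.34.1` does not.

```python
def release_tuple(v: str) -> tuple:
    """Parse a plain numeric version such as "2.34.0" and drop trailing zeros.

    Assumes simple X.Y[.Z] versions without pre-, post- or dev-release suffixes.
    """
    parts = [int(p) for p in v.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def mismatched_nodes(versions: dict, pinned: str = "2.34") -> list:
    """Return the (sorted) names of nodes whose Ray version differs from the pinned release."""
    want = release_tuple(pinned)
    return sorted(node for node, v in versions.items() if release_tuple(v) != want)


# Hypothetical cluster inventory
print(mismatched_nodes({"head": "2.34.0", "gpu-1": "2.34.0", "gpu-2": "2.9.3"}))  # ['gpu-2']
```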

6. Create a HF Cache

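Choose a location for your Hugging Face cache (skip this step if you already have one) and create it as a directory, since HF_HOME must point to a directory rather than a file:

mkdir -p .hf_config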
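If you would rather prepare the cache from Python, a small helper can create the directory and fail loudly if a regular file already occupies that path (which would otherwise break the cache later with a less obvious error). This is an illustrative sketch; `ensure_hf_cache` is a hypothetical name, not part of NDIF or `huggingface_hub`.

```python
from pathlib import Path


def ensure_hf_cache(path: str) -> Path:
    """Create the Hugging Face cache directory if needed and return its absolute path."""
    cache = Path(path).expanduser().resolve()
    if cache.exists() and not cache.is_dir():
        # HF_HOME has to be a directory; a plain file at this path breaks the cache
        raise NotADirectoryError(f"{cache} exists but is not a directory")
    cache.mkdir(parents=True, exist_ok=True)
    return cache


# Usage: the printed absolute path is the value to use for HF_HOME in env.sh
#   print(ensure_hf_cache(".hf_config"))
```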

7. Create env.sh Script

Create a script named env.sh in your working directory with the following content, then replace the placeholder values as described below:

#!/bin/bash

huggingface-cli login --token hf-token

export PYTHONPATH=/path/to/ndif/services/ray_worker
export HF_HOME=/path/to/.hf_config
export RAY_ADDRESS=head-node-ip:6379
export NCCL_IB_DISABLE=1
  • Replace hf-token with your actual Hugging Face token.
  • Replace /path/to/ndif with the path to your NDIF clone.
  • Replace /path/to/.hf_config with the actual path to the Hugging Face cache you created in step 6.
  • Replace head-node-ip with the IP address of your Ray head node.
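A forgotten placeholder in env.sh usually surfaces later as a confusing login or connection error, so a quick scan of the file can catch it early. This sketch only looks for the placeholder strings used in the template on this page (`hf-token`, `/path/to`, `head-node-ip`); the function names are hypothetical.

```python
from pathlib import Path

# Placeholder strings taken from the env.sh template on this page
PLACEHOLDERS = ("hf-token", "/path/to", "head-node-ip")


def unreplaced_placeholders(text: str) -> list:
    """Return (line number, placeholder) pairs that are still present in env.sh text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for placeholder in PLACEHOLDERS:
            if placeholder in line:
                hits.append((lineno, placeholder))
    return hits


def check_env_file(path: str = "env.sh") -> None:
    """Print one warning per leftover placeholder in the given file."""
    for lineno, placeholder in unreplaced_placeholders(Path(path).read_text()):
        print(f"{path}:{lineno}: replace '{placeholder}'")
```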

8. Source env.sh

Source the environment variables from env.sh:

source env.sh
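After sourcing, the variables should be visible to any process started from that shell, including Python. The check below is a sketch that reads them back and splits `RAY_ADDRESS` into host and port; it assumes the `host:port` form used in env.sh, and the helper names are illustrative.

```python
import os
from collections.abc import Mapping

# Variables exported by env.sh on this page
REQUIRED = ("PYTHONPATH", "HF_HOME", "RAY_ADDRESS", "NCCL_IB_DISABLE")


def missing_vars(env: Mapping = os.environ) -> list:
    """Names from env.sh that are unset or empty in `env`."""
    return [name for name in REQUIRED if not env.get(name)]


def parse_ray_address(address: str) -> tuple:
    """Split "host:port" (the form used in env.sh) into (host, port)."""
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


# Usage, from the shell where env.sh was sourced:
#   print(missing_vars())
#   print(parse_ray_address(os.environ["RAY_ADDRESS"]))
```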

9. Download the model weights

The easiest way to do this is to write a short Python script that uses NNSight to load the model:

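import nnsight

# dispatch=True loads the weights when the model is constructed, downloading them into HF_HOME if they are not cached yet
model = nnsight.LanguageModel('{model-checkpoint}', dispatch=True)

# Run a short trace on a dummy prompt to confirm the model loads and runs
with model.trace('ayy') as tracer:
  out = tracer.output.save()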

Save the script above as download.py and run python3 download.py. You can stop the script once the model weights have finished downloading. Make sure to replace {model-checkpoint} with the actual Hugging Face checkpoint name.
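To tell whether the weights have finished downloading without watching the script, you can inspect the Hugging Face cache directly. The sketch below relies on the usual `huggingface_hub` cache layout (`$HF_HOME/hub/models--<org>--<name>/`, with partially downloaded files ending in `.incomplete` under `blobs/`); treat that layout as an assumption to verify against your installed `huggingface_hub` version. The status strings are a heuristic: "no partial files" means nothing is mid-download, not that every file of the checkpoint is present.

```python
from pathlib import Path


def repo_cache_dir(hf_home: str, checkpoint: str) -> Path:
    """Cache folder for a model repo, e.g. "org/name" -> <hf_home>/hub/models--org--name."""
    return Path(hf_home) / "hub" / ("models--" + checkpoint.replace("/", "--"))


def download_status(hf_home: str, checkpoint: str) -> str:
    """Heuristic download state based on the cache folder contents."""
    repo = repo_cache_dir(hf_home, checkpoint)
    if not repo.is_dir():
        return "not started"
    if any((repo / "blobs").glob("*.incomplete")):
        return "in progress"
    if (repo / "snapshots").is_dir():
        return "no partial files"
    return "unknown"


# Usage ({model-checkpoint} is the same placeholder as in download.py):
#   import os
#   print(download_status(os.environ["HF_HOME"], "{model-checkpoint}"))
```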

10. Create start.sh script

Create a script named start.sh in your working directory with the following content:

#!/bin/bash

HOSTNAME=$(hostname)

source env.sh

resources=$(python -m src.ray.resources --name "$HOSTNAME")

ray start --resources "$resources" --address "$RAY_ADDRESS" --block
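`ray start --resources` expects a JSON object that maps custom resource names to quantities, so whatever `src.ray.resources` prints has to be in that shape. This page does not show that module's output, so the sketch below only validates a resources string before building the same `ray start` invocation as start.sh. The function names and the resource name in the usage line are hypothetical.

```python
import json


def validate_resources(raw: str) -> dict:
    """Parse a --resources value and check it is a JSON object of name -> non-negative number."""
    resources = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(resources, dict):
        raise ValueError("--resources must be a JSON object")
    for name, amount in resources.items():
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount < 0:
            raise ValueError(f"resource {name!r} has invalid amount {amount!r}")
    return resources


def ray_start_args(raw_resources: str, address: str) -> list:
    """argv equivalent to the ray start line in start.sh, built only after validation."""
    validate_resources(raw_resources)
    return ["ray", "start", "--resources", raw_resources, "--address", address, "--block"]


# Hypothetical resource name and head address, for illustration only
print(ray_start_args('{"my-model": 1}', "10.0.0.5:6379"))
```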

11. Run the script

This joins the node to the Ray cluster and starts the model deployment. Running it inside tmux keeps the deployment alive in the background even if your terminal session disconnects; detach with Ctrl-b d and reattach later with tmux attach.

tmux
conda activate prod
bash start.sh
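If you prefer to launch the node non-interactively, for example from another script, the same steps can be run in a detached tmux session. This sketch rests on two assumptions not stated on this page: that `conda run --no-capture-output -n prod` is an acceptable substitute for `conda activate prod`, and that the session name `ndif` is free. It only builds and prints the command; the comment at the end shows how to run it.

```python
import os
import shlex
from typing import Optional


def tmux_launch_cmd(session: str = "ndif", env_name: str = "prod",
                    script: str = "start.sh", workdir: Optional[str] = None) -> list:
    """argv for a detached tmux session that runs start.sh inside the conda env."""
    inner = shlex.join(["conda", "run", "--no-capture-output", "-n", env_name, "bash", script])
    # -c sets the session's start directory so start.sh can find env.sh
    return ["tmux", "new-session", "-d", "-s", session, "-c", workdir or os.getcwd(), inner]


cmd = tmux_launch_cmd()
print(shlex.join(cmd))
# To launch for real:  import subprocess; subprocess.run(cmd, check=True)
# Reattach later with: tmux attach -t ndif
```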