Rectik is a recommendation system project aimed at building, training, and deploying a multi-stage recommendation system for TikTok-style short videos. This project utilizes the NVIDIA Merlin ecosystem for efficient data processing, feature extraction, and model deployment, while leveraging Metaflow for workflow management.
Rectik’s workflow is divided into three main flows:
- Data Flow: Handles data preprocessing, feature extraction, and transformations.
- Train Flow: Defines and trains the recommendation models, including retrieval and reranking stages.
- Serve Flow: Combines models from the train flow to create an ensemble for deployment.
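The hand-off between the three flows can be sketched in plain Python. This is an illustrative outline only: the real project defines each stage as a Metaflow flow, and every function and dictionary key below is a hypothetical stand-in.

```python
# Hypothetical sketch of how the three flows pass artifacts along.
# The real project uses Metaflow FlowSpec classes, not plain functions.

def data_flow(raw_events):
    """Preprocess raw interactions and split them into train/test sets."""
    cleaned = [e for e in raw_events if e.get("video_id") is not None]
    cutoff = int(len(cleaned) * 0.8)
    return {"train": cleaned[:cutoff], "test": cleaned[cutoff:]}

def train_flow(datasets):
    """Train the retrieval and reranking models on the prepared data."""
    retrieval_model = {"type": "two-tower", "n_train": len(datasets["train"])}
    rerank_model = {"type": "dlrm", "n_train": len(datasets["train"])}
    return {"retrieval": retrieval_model, "rerank": rerank_model}

def serve_flow(models):
    """Bundle the trained models into a deployable ensemble."""
    return {"ensemble": [models["retrieval"], models["rerank"]]}

events = [{"video_id": i, "user_id": i % 3} for i in range(10)]
artifacts = serve_flow(train_flow(data_flow(events)))
print(len(artifacts["ensemble"]))  # → 2
```

Each flow consumes the previous flow's output, which is what lets the stages be developed and rerun independently.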
The Data Flow pipeline is responsible for:
- Data Preprocessing: Preparing raw data for modeling, including handling missing values, feature engineering, and data transformations.
- Feature Extraction: Extracting video features to serve as input for downstream models.
- Data Splitting: Splitting the data into training and testing sets.

Together, these steps ensure that the data is compatible with NVIDIA Merlin models.
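A pure-Python sketch of the preprocessing and splitting steps, under stated assumptions: the real flow runs on NVTabular/cuDF, and the column names used here (`user_id`, `video_id`, `watch_ratio`, `timestamp`) are illustrative, not taken from the project.

```python
# Minimal stand-in for the Data Flow steps; column names are assumptions.

def preprocess(rows):
    """Drop rows with missing IDs and clip the engagement label to [0, 5]."""
    out = []
    for r in rows:
        if r.get("user_id") is None or r.get("video_id") is None:
            continue
        r = dict(r)
        r["watch_ratio"] = min(max(r.get("watch_ratio", 0.0), 0.0), 5.0)
        out.append(r)
    return out

def time_split(rows, test_fraction=0.2):
    """Chronological split: the newest interactions become the test set."""
    rows = sorted(rows, key=lambda r: r["timestamp"])
    cutoff = int(len(rows) * (1 - test_fraction))
    return rows[:cutoff], rows[cutoff:]

rows = [
    {"user_id": 1, "video_id": 10, "watch_ratio": 7.0, "timestamp": 3},
    {"user_id": None, "video_id": 11, "watch_ratio": 1.0, "timestamp": 1},
    {"user_id": 2, "video_id": 12, "watch_ratio": 0.5, "timestamp": 2},
]
train, test = time_split(preprocess(rows), test_fraction=0.5)
```

A time-based split is shown because recommendation models are usually evaluated on interactions that happen after the training window, avoiding leakage from future behavior.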
The Train Flow pipeline defines and trains models for the recommendation system using a multi-stage approach:
- Retrieval (Two-Tower Model): This model narrows the full video catalog down to a manageable set of candidate recommendations.
- Reranking (DLRM): This model ranks the retrieved candidates to find the most relevant videos.

Tools Used:
- NVIDIA Merlin: For model building and training.
- FAISS: For vector similarity search, used to speed up the retrieval of candidates.
- Feast: As a feature store for managing and serving user and item features.
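The retrieve-then-rerank pattern can be illustrated in pure Python. This is a toy sketch: in the real system the item-tower embeddings are indexed with FAISS and the reranker is a trained DLRM, whereas here a brute-force inner-product search stands in for FAISS and the embedding vectors are made up.

```python
# Toy retrieve-then-rerank pipeline; embeddings and IDs are illustrative.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

item_embeddings = {
    "v1": [0.9, 0.1], "v2": [0.1, 0.9], "v3": [0.7, 0.3], "v4": [0.2, 0.8],
}

def retrieve(user_vec, k=2):
    """Top-k candidates by inner product (brute-force stand-in for FAISS)."""
    scored = sorted(item_embeddings.items(),
                    key=lambda kv: dot(user_vec, kv[1]), reverse=True)
    return [vid for vid, _ in scored[:k]]

def rerank(user_vec, candidates):
    """Rescore only the candidates; a DLRM would use richer cross features."""
    return sorted(candidates,
                  key=lambda v: dot(user_vec, item_embeddings[v]),
                  reverse=True)

user = [1.0, 0.0]
print(rerank(user, retrieve(user, k=2)))  # → ['v1', 'v3']
```

The point of the two stages is cost: the cheap retrieval step scans the whole catalog, while the expensive reranker only scores the short candidate list.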
The Serve Flow pipeline handles the following tasks:
- Ensemble Creation: Merging the retrieval and reranking models into a single ensemble.
- Deployment Setup: Preparing the model repository with metadata, workflows, and checkpoints for deployment on Triton Server for efficient inference.
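The deployment setup ultimately produces a Triton model repository: one directory per model, numbered version subdirectories, and a `config.pbtxt` for each. The sketch below builds that layout with the standard library; in the real flow the repository (including the ensemble wiring) is generated by Merlin, and the model names here are illustrative.

```python
# Sketch of the directory layout Triton Server loads models from.
# Model names and the (minimal) config contents are assumptions.
import pathlib
import tempfile

def make_repository(root, models):
    for name in models:
        version_dir = pathlib.Path(root) / name / "1"   # version "1"
        version_dir.mkdir(parents=True, exist_ok=True)
        (pathlib.Path(root) / name / "config.pbtxt").write_text(
            f'name: "{name}"\n')
    return sorted(p.name for p in pathlib.Path(root).iterdir())

repo = tempfile.mkdtemp()
print(make_repository(repo, ["retrieval", "rerank", "ensemble"]))
# → ['ensemble', 'rerank', 'retrieval']
```

Pointing Triton at this directory (`tritonserver --model-repository=<path>`) is what makes the ensemble servable.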
Tech Stack:
- Data Processing:
  - NVIDIA Merlin ecosystem for end-to-end recommendation workflows.
  - PyArrow, cuDF, and Dask for fast data manipulation.
  - DuckDB for SQL-based operations on Parquet files.
- Vector Store: FAISS for fast similarity search over candidate embeddings.
- Feature Store: Feast for managing and serving user and item features.
- Model Deployment: Triton Inference Server for efficient inference of model ensembles.