TikTok recommendation system: a project aiming to replicate the algorithm behind TikTok's recommender.

LongBaoCoder2/rectik

Rectik - Recommendation System for TikTok

Rectik is a project for building, training, and deploying a multi-stage recommendation system for TikTok-style short videos. It uses the NVIDIA Merlin ecosystem for data processing, feature extraction, and model deployment, and Metaflow for workflow management.


Pipeline

Project Structure

Rectik’s workflow is divided into three main flows:

  1. Data Flow: Handles data preprocessing, feature extraction, and transformations.
  2. Train Flow: Defines and trains the recommendation models, including retrieval and reranking stages.
  3. Serve Flow: Combines models from the train flow to create an ensemble for deployment.

1. Data Flow

The Data Flow pipeline is responsible for:

  • Data Preprocessing: Preparing raw data for modeling, including handling missing values, feature engineering, and data transformations.
  • Feature Extraction: Extracting video features to serve as input for downstream models.
  • Data Splitting: Splitting the data into training and testing sets.

These steps ensure that the data is compatible with NVIDIA Merlin models.
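
The preprocessing and splitting steps above can be sketched in plain Python. This is only an illustration, not the project's actual code, which presumably relies on Merlin tooling. The record fields (`user_id`, `video_id`, `watch_seconds`), the helper names, and the 80/20 split ratio are all assumptions made for the example.

```python
import random


def fill_missing(rows, defaults):
    """Return copies of `rows` with None values replaced by per-field defaults."""
    return [
        {key: (defaults.get(key) if value is None else value) for key, value in row.items()}
        for row in rows
    ]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` with a fixed seed and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical interaction log; some watch times are missing (None).
interactions = [
    {
        "user_id": u,
        "video_id": v,
        "watch_seconds": None if (u + v) % 5 == 0 else float(u + v),
    }
    for u in range(5)
    for v in range(4)
]

clean = fill_missing(interactions, {"watch_seconds": 0.0})
train, test = train_test_split(clean, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

A fixed seed keeps the split reproducible between runs. For interaction logs with timestamps, a chronological split is a common alternative to random shuffling.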

2. Train Flow

The Train Flow pipeline defines and trains models for the recommendation system using a multi-stage approach:

  • Retrieval (Two-Tower Model): This model selects a set of candidate videos from the full catalog, narrowing potential recommendations to a manageable number.
  • Reranking (DLRM): This model scores and orders the retrieved candidates to surface the most relevant videos.

Tools Used:

  • NVIDIA Merlin: For model building and training.
  • FAISS: For vector similarity search, used to speed up candidate retrieval.
  • Feast: As a feature store for managing and serving user and item features.
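
The retrieve-then-rerank idea can be illustrated with a minimal, dependency-free sketch. It does not use Merlin, FAISS, or Feast: the brute-force dot-product search stands in for a FAISS index, the embeddings stand in for two-tower outputs, and the `ranker_scores` lookup stands in for a trained DLRM. All names and values are hypothetical.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec, item_vecs, k):
    """Return the ids of the k items whose embeddings score highest against user_vec.

    This is an exhaustive search; an approximate index would replace it at scale.
    """
    scored = ((dot(user_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


def rerank(candidates, score_fn, n):
    """Order candidates by a second-stage score and keep the top n."""
    return sorted(candidates, key=score_fn, reverse=True)[:n]


# Hypothetical item embeddings (stand-ins for the item tower's output).
item_vecs = {
    "v1": [0.9, 0.1],
    "v2": [0.2, 0.8],
    "v3": [0.7, 0.3],
    "v4": [-0.5, 0.4],
}
user_vec = [1.0, 0.0]  # stand-in for the user tower's output

# Hypothetical second-stage scores (stand-ins for a ranking model's predictions).
ranker_scores = {"v1": 0.35, "v2": 0.10, "v3": 0.80, "v4": 0.05}

candidates = retrieve(user_vec, item_vecs, k=3)
top = rerank(candidates, ranker_scores.get, n=2)
print(candidates, top)
```

The first stage is cheap and recall-oriented, so it can scan many items. The second stage is more expensive per item, which is why it only sees the short candidate list.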

3. Serve Flow

The Serve Flow pipeline handles the following tasks:

  • Ensemble Creation: Merging the retrieval and reranking models into a single ensemble.
  • Deployment Setup: Preparing the model repository with metadata, workflows, and checkpoints for deployment on Triton Inference Server.
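
As a rough illustration of the deployment setup, the sketch below builds the directory layout that Triton Inference Server expects from a model repository: one directory per model, containing a `config.pbtxt` and a numbered version subdirectory. The model names are hypothetical, and the config files are placeholders; a real config also declares the backend or platform, inputs, and outputs, and the project may generate all of this through Merlin tooling rather than by hand.

```python
import tempfile
from pathlib import Path


def build_model_repository(root, model_names, version=1):
    """Create <root>/<model>/config.pbtxt and <root>/<model>/<version>/ for each model."""
    root = Path(root)
    for name in model_names:
        # Versioned subdirectory that would hold the model artifacts.
        (root / name / str(version)).mkdir(parents=True, exist_ok=True)
        # Placeholder config; a real one needs far more than the model name.
        (root / name / "config.pbtxt").write_text(f'name: "{name}"\n')
    return root


# Hypothetical stage names, chosen only for this example.
STAGES = ["retrieval_two_tower", "rerank_dlrm", "ensemble_model"]

repo = build_model_repository(tempfile.mkdtemp(), STAGES)
layout = sorted(p.relative_to(repo).as_posix() for p in repo.rglob("*"))
for entry in layout:
    print(entry)
```

The server would then be pointed at the repository root, and the ensemble entry would describe how requests flow from the retrieval stage to the reranking stage.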

Technology Stack

  • Data Processing:
    • NVIDIA Merlin Ecosystem for end-to-end recommendation workflows.
    • PyArrow, cuDF, and Dask for fast data manipulation.
    • DuckDB for SQL-based operations on Parquet files.
  • Vector Store: FAISS
  • Feature Store: Feast
  • Model Deployment: Triton Inference Server for efficient inference of model ensembles.
