## MLOps Lifecycle
1. Problem Definition and Requirement Gathering
2. EDA
3. Feature Engineering
4. Feature Selection
5. Model Creation
6. Model Hyperparameter Tuning
7. Model Deployment
8. Retraining Approaches
- Clone the repository and define the project template
touch template.py
python3 template.py
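A minimal sketch of what template.py could contain; the exact file list is an assumption and should be adjusted to the project's actual layout:

```python
# template.py -- illustrative project skeleton (file list is an assumption)
from pathlib import Path

FILES = [
    "src/__init__.py",
    "src/utils.py",
    "src/stage_01_data_ingeston.py",
    "params.yaml",
    "dvc.yaml",
    "requirements.txt",
    "setup.py",
]

for filepath in map(Path, FILES):
    filepath.parent.mkdir(parents=True, exist_ok=True)  # create parent folders
    filepath.touch(exist_ok=True)                       # create empty file if missing
```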
- Define setup.py (setup.py is the build script used to package and distribute a Python project; it holds the package metadata)
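A hedged sketch of such a setup.py; the name, version, and description are placeholders rather than the project's real metadata:

```python
# setup.py -- illustrative only; metadata values are placeholders
from setuptools import setup, find_packages

setup(
    name="mlops-project",       # placeholder package name
    version="0.0.1",
    description="End-to-end MLOps pipeline",
    packages=find_packages(),   # discover packages (e.g. src/) automatically
)
```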
- Create the environment and install dependencies
conda create -n mlops-env python=3.9 -y
conda activate mlops-env
pip install -r requirements.txt
- Define a custom exception and a logger (logging is a means of tracking events that occur while the software runs)
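One possible shape for the logger and the custom exception; the file paths and log format are assumptions:

```python
# logger and custom exception -- a minimal sketch (paths/format are assumptions)
import logging
import os
import sys

os.makedirs("logs", exist_ok=True)
logging.basicConfig(
    filename=os.path.join("logs", "running_logs.log"),
    format="[%(asctime)s] %(levelname)s: %(message)s",
    level=logging.INFO,
)
logger = logging.getLogger(__name__)

class CustomException(Exception):
    """Adds the file name and line number of the original error to its message."""
    def __init__(self, error: Exception):
        _, _, tb = sys.exc_info()  # populated when constructed inside an except block
        if tb is not None:
            msg = f"{error} [file: {tb.tb_frame.f_code.co_filename}, line: {tb.tb_lineno}]"
        else:
            msg = str(error)
        super().__init__(msg)
```

Raise it as `raise CustomException(e)` from inside an `except` block so the traceback details are available.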
- Define utils (utils.py collects helper functions for tasks that recur across the pipeline stages)
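Typical helpers such a utils.py might hold; `read_yaml` and `create_directories` are assumed names, not confirmed from the repository:

```python
# utils.py -- assumed helpers for configs and directories
import os
import yaml

def read_yaml(path: str) -> dict:
    """Load a YAML file (e.g. params.yaml) into a dict."""
    with open(path) as f:
        return yaml.safe_load(f)

def create_directories(paths: list[str]) -> None:
    """Create each directory if it does not already exist."""
    for p in paths:
        os.makedirs(p, exist_ok=True)
```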
- Data Ingestion Section
- constants added
- params.yaml defined
- 01_data_ingeston.ipynb created
- stage_01_data_ingeston.py created (see the sketch after the DVC steps below)
- data downloaded
- DVC Section
- run dvc init
- define data_ingeston stage in dvc.yaml
- run dvc repro
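A rough sketch of what stage_01_data_ingeston.py might do; the params.yaml keys and the download URL are assumptions:

```python
# stage_01_data_ingeston.py -- illustrative ingestion stage (keys/URL are assumptions)
import argparse
import urllib.request

from src.utils import read_yaml, create_directories  # assumed helpers from utils.py

def data_ingestion(params_path: str) -> None:
    params = read_yaml(params_path)
    source_url = params["data_source"]["url"]      # assumed key in params.yaml
    raw_dir = params["artifacts"]["raw_data_dir"]  # assumed key in params.yaml
    create_directories([raw_dir])
    urllib.request.urlretrieve(source_url, f"{raw_dir}/data.csv")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--params", default="params.yaml")
    args = parser.parse_args()
    data_ingestion(args.params)
```

`dvc repro` then re-runs this stage only when its dependencies (the script or params.yaml) change.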
- EDA (the checks below are shown as a pandas sketch after this list)
- load the dataset
- statistical summary
- check the number of unique values in each column
- check data types
- check duplicate values
- check null values
- check the class balance of the dataset
- check outliers
- visualization
- check correlations
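The checklist above translated into pandas calls; the CSV path and the target column name (`stroke`) are assumptions:

```python
# eda.py -- the EDA checks above as pandas one-liners (path/column are assumptions)
import pandas as pd

df = pd.read_csv("artifacts/raw/data.csv")   # assumed location of the ingested data

print(df.describe())                  # statistical summary
print(df.nunique())                   # unique values per column
print(df.dtypes)                      # data types
print(df.duplicated().sum())          # number of duplicate rows
print(df.isnull().sum())              # null values per column
print(df["stroke"].value_counts())    # class balance (assumed target column)
print(df.corr(numeric_only=True))     # correlations between numeric columns
# outliers and distributions are typically inspected visually (box plots, histograms)
```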
- Data Preprocessing Section (see the sketch after this list)
- params.yaml defined
- 02_data_preprocessing.ipynb added
- stage_02_data_preprocessing.py created
- data_preprocessing stage added to dvc.yaml
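What stage_02_data_preprocessing.py plausibly does given the EDA checks; the concrete transforms (dropping duplicates/nulls, one-hot encoding, the split ratio) are assumptions:

```python
# stage_02_data_preprocessing.py -- assumed preprocessing steps
import os

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("artifacts/raw/data.csv")   # assumed path
df = df.drop_duplicates()
df = df.dropna()                             # or impute, depending on the EDA findings
df = pd.get_dummies(df, drop_first=True)     # encode categorical columns

train, test = train_test_split(df, test_size=0.2, random_state=42)
os.makedirs("artifacts/processed", exist_ok=True)
train.to_csv("artifacts/processed/train.csv", index=False)
test.to_csv("artifacts/processed/test.csv", index=False)
```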
- Model Training Section
- define params.yaml
- created 03_model_training.ipynb
- mlflow added to the project
mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root ./artifacts \
--host 0.0.0.0 -p 1234
- stage_03_training_and_evaluation.py created
- model_train_evaluation stage added to the DVC pipeline (see the sketch below)
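How stage_03_training_and_evaluation.py might train and log to the MLflow server started above; the model type, metric, and column names are assumptions, and the tracking URI matches the port used above:

```python
# stage_03_training_and_evaluation.py -- assumed training/evaluation with MLflow logging
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

mlflow.set_tracking_uri("http://localhost:1234")   # the server started above
mlflow.set_experiment("stroke_predictor")          # assumed experiment name

train = pd.read_csv("artifacts/processed/train.csv")   # assumed paths
test = pd.read_csv("artifacts/processed/test.csv")
X_train, y_train = train.drop(columns=["stroke"]), train["stroke"]  # assumed target
X_test, y_test = test.drop(columns=["stroke"]), test["stroke"]

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, random_state=42)  # assumed model
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", acc)
    mlflow.sklearn.log_model(model, artifact_path="model",
                             registered_model_name="stroke_predictor")
```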
- Log Production Model Section
- params.yaml defined
- 04_log_production_model.ipynb created
- model_register_name added to stage_03_training_and_evaluation.py
- stage_04_log_production_model.py created
- log_production_model stage added to dvc.yaml
- the model with the best accuracy is loaded from MLflow and saved
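A plausible shape for stage_04_log_production_model.py: query the experiment's runs, pick the one with the highest accuracy, and save its model locally. The experiment name and destination path are assumptions:

```python
# stage_04_log_production_model.py -- assumed best-model selection via the MLflow client
import os

import joblib
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("http://localhost:1234")
client = MlflowClient()

experiment = client.get_experiment_by_name("stroke_predictor")   # assumed name
runs = mlflow.search_runs(experiment_ids=[experiment.experiment_id])

# Pick the run with the highest logged accuracy
best_run_id = runs.sort_values("metrics.accuracy", ascending=False).iloc[0]["run_id"]

# Load that run's sklearn model and save it where the Flask app can find it
model = mlflow.sklearn.load_model(f"runs:/{best_run_id}/model")
os.makedirs("models", exist_ok=True)
joblib.dump(model, "models/model.joblib")   # assumed destination path
```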
- Flask App
- add the required HTML and CSS to the project
- define app.py (see the sketch after this list)
- run the app
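A minimal sketch of app.py; the template name, form handling, and model path are assumptions (the form fields must match the columns the model was trained on):

```python
# app.py -- assumed minimal Flask app serving the saved model
import joblib
import pandas as pd
from flask import Flask, render_template, request

app = Flask(__name__)
model = joblib.load("models/model.joblib")   # assumed path written by stage_04

@app.route("/", methods=["GET", "POST"])
def index():
    prediction = None
    if request.method == "POST":
        # One-row frame from the form; field names/encoding must mirror training
        data = pd.DataFrame([request.form.to_dict()]).astype(float)
        prediction = int(model.predict(data)[0])
    return render_template("index.html", prediction=prediction)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```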
- Docker and CI/CD
- define the Dockerfile
- create .github/workflows/docker_cicd.yaml and define it
- add the Docker Hub credentials as repository secrets:
DOCKER_USERNAME=?
DOCKER_PASSWORD=?
- push the changes
- AWS Section
Note: we will deploy to AWS via CI/CD using GitHub Actions
# With specific access
1. ECR: Elastic Container Registry, used to store your Docker image in AWS
2. EC2: a virtual machine used to run the container
# Description: about the deployment
1. Build the Docker image of the source code
2. Push the Docker image to ECR
3. Launch your EC2 instance
4. Pull the image from ECR onto EC2
5. Launch the Docker image on EC2
# IAM policies:
1. AmazonEC2ContainerRegistryFullAccess
2. AmazonEC2FullAccess
- Save the URI: 060145207853.dkr.ecr.ap-south-1.amazonaws.com/stroke_predictor
# optional: update the instance packages
sudo apt-get update
sudo apt-get upgrade -y
# required: install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
newgrp docker
Settings --> Actions --> Runners --> New self-hosted runner --> choose the OS --> copy each command and run it in EC2 Instance Connect
Finally, add the following GitHub repository secrets:
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION = ap-south-1
AWS_ECR_LOGIN_URI = 060145207853.dkr.ecr.ap-south-1.amazonaws.com
ECR_REPOSITORY_NAME = stroke_predictor