Using AWS SageMaker to fine-tune a pretrained ResNet50 model for image classification, using SageMaker profiling, the debugger, and hyperparameter tuning. The following tasks are performed:
- Use the pretrained ResNet50 model from the PyTorch torchvision library (https://pytorch.org/vision/master/generated/torchvision.models.resnet50.html)
- Fine-tune the model with hyperparameter tuning
- Use SageMaker profiling and the debugger
- Deploy the model and perform inference
Enter AWS through the gateway in the course and open SageMaker Studio. Download the starter files and make the dataset available. You can use this link to get the starter files.
Udacity's Dog Breed Classification dataset is used; it can be downloaded here.
Upload the data to an S3 bucket through the AWS Gateway so that SageMaker has access to it.
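The upload step above can be sketched with boto3. This is an illustrative sketch: the bucket name, prefix, and local folder in the usage comment are assumptions, not the project's actual values.

```python
import os

def s3_uri(bucket, prefix):
    """Build the s3:// URI that the SageMaker estimators will read from."""
    return f"s3://{bucket}/{prefix}"

def upload_dir(local_dir, bucket, prefix):
    """Recursively upload a local dataset folder to the given S3 bucket."""
    import boto3  # deferred so the module imports without AWS credentials
    s3 = boto3.client("s3")
    for root, _, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            # use forward slashes so the S3 keys are portable
            key = prefix + "/" + os.path.relpath(path, local_dir).replace(os.sep, "/")
            s3.upload_file(path, bucket, key)

# usage (bucket/folder names are assumptions):
# upload_dir("dogImages", "udacity-dog-breed-data", "dogImages")
# print(s3_uri("udacity-dog-breed-data", "dogImages"))
```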
- train_and_deploy.ipynb: contains all the steps: uploading the data to S3, tuning hyperparameters, retrieving the best model's hyperparameters, training and testing the best-performing model, running SageMaker profiling and the debugger, and finally deploying the model and making an inference.
- hpo.py: the Python script used to train and test all candidate models during the hyperparameter-tuning step.
- train.py: the Python script used to train and test the best-performing model.
- inference: the script used to deploy the model on AWS and make predictions.
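A hedged sketch of how hpo.py is typically driven from the notebook with SageMaker's HyperparameterTuner. The role, instance type, framework version, metric regex, and range values below are assumptions for illustration, not the project's exact configuration.

```python
# Plain-dict view of the search space (values are illustrative assumptions).
HYPERPARAM_SPACE = {
    "lr": (0.001, 0.1),           # continuous range
    "batch-size": [32, 64, 128],  # categorical values
    "epochs": [2, 3, 4],          # categorical values
}

def make_tuner(role):
    # deferred imports: only needed when actually launching the tuning job
    from sagemaker.pytorch import PyTorch
    from sagemaker.tuner import (
        HyperparameterTuner, ContinuousParameter, CategoricalParameter,
    )

    estimator = PyTorch(
        entry_point="hpo.py",          # the tuning script described above
        role=role,
        framework_version="1.8",       # assumed version
        py_version="py36",
        instance_count=1,
        instance_type="ml.m5.xlarge",  # assumed instance type
    )
    ranges = {
        "lr": ContinuousParameter(*HYPERPARAM_SPACE["lr"]),
        "batch-size": CategoricalParameter(HYPERPARAM_SPACE["batch-size"]),
        "epochs": CategoricalParameter(HYPERPARAM_SPACE["epochs"]),
    }
    return HyperparameterTuner(
        estimator,
        objective_metric_name="average test loss",
        objective_type="Minimize",
        metric_definitions=[{"Name": "average test loss",
                             "Regex": "Test set: Average loss: ([0-9\\.]+)"}],
        hyperparameter_ranges=ranges,
        max_jobs=4,
        max_parallel_jobs=2,
    )

# usage (requires AWS credentials and the data already in S3):
# tuner = make_tuner(role)
# tuner.fit({"training": "s3://<bucket>/dogImages"})
```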
- ResNet50 is used to learn the data because it was pretrained on a large dataset, so its convolutional layers already extract general-purpose features.
- One fully connected layer is added on top of ResNet50 to predict the 133 dog breeds.
- Batch size, epochs, and learning rate are the hyperparameters searched in the tuning step.
The graphical representation of the cross-entropy loss:
The profiler report can be found here.
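One possible shape of the debugger and profiler setup that produces such a report, using the `sagemaker.debugger` API. The specific rules and interval values below are assumptions for illustration.

```python
# Names of the debugger rules assumed in this sketch.
RULE_NAMES = [
    "vanishing_gradient",
    "overfit",
    "overtraining",
    "poor_weight_initialization",
]

def debug_profiler_config():
    # deferred imports: only needed when configuring the training estimator
    from sagemaker.debugger import (
        Rule, ProfilerRule, rule_configs,
        ProfilerConfig, FrameworkProfile, DebuggerHookConfig,
    )

    rules = [
        Rule.sagemaker(rule_configs.vanishing_gradient()),
        Rule.sagemaker(rule_configs.overfit()),
        Rule.sagemaker(rule_configs.overtraining()),
        Rule.sagemaker(rule_configs.poor_weight_initialization()),
        ProfilerRule.sagemaker(rule_configs.ProfilerReport()),  # produces the report
    ]
    profiler_config = ProfilerConfig(
        system_monitor_interval_millis=500,                    # assumed interval
        framework_profile_params=FrameworkProfile(num_steps=10),
    )
    hook_config = DebuggerHookConfig(
        hook_parameters={"train.save_interval": "100",         # assumed intervals
                         "eval.save_interval": "10"}
    )
    return rules, profiler_config, hook_config

# usage: pass rules=..., profiler_config=..., debugger_hook_config=...
# to the PyTorch estimator that runs train.py.
```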
- The model was deployed to an "ml.m5.large" instance type, and the "endpoint_inference.py" script is used to set up and deploy the working endpoint.
- For testing purposes, test images are stored in the "images" folder.
- These images are fed to the endpoint for inference.
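Feeding a test image to the endpoint can be sketched with the SageMaker runtime client. The endpoint name and image path in the usage comment are assumptions.

```python
def request_params(endpoint_name, image_bytes):
    """Build the invoke_endpoint arguments for a JPEG payload."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "image/jpeg",
        "Body": image_bytes,
    }

def classify(endpoint_name, image_path):
    """Send an image file to the deployed endpoint and return its response."""
    import boto3  # deferred: needs AWS credentials at call time
    runtime = boto3.client("sagemaker-runtime")
    with open(image_path, "rb") as f:
        params = request_params(endpoint_name, f.read())
    response = runtime.invoke_endpoint(**params)
    return response["Body"].read()

# usage (endpoint name and path are assumptions):
# print(classify("dog-breed-endpoint", "images/sample_dog.jpg"))
```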
- For security, the role is granted only the specific permissions needed to perform this task rather than full SageMaker access, limiting what the Lambda function can do and reducing security vulnerabilities.
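As an illustration of that least-privilege role, a minimal IAM policy statement for the Lambda function might allow only `sagemaker:InvokeEndpoint` on the one endpoint; the endpoint name in the ARN below is an assumption.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sagemaker:InvokeEndpoint",
      "Resource": "arn:aws:sagemaker:*:*:endpoint/dog-breed-endpoint"
    }
  ]
}
```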
- For concurrency, provisioned concurrency was chosen so that instances are always warm, avoiding start-up wait times and achieving low latency under high traffic. Three instances are provisioned, which costs $4.19 in addition to the pricing for duration and requests.
- For auto-scaling, to handle bursts of requests from the Lambda function, the maximum instance count is set to 3. The scale-out cooldown is configured so that a new instance starts after 10 seconds when more than 15 simultaneous requests arrive; if fewer than 15 simultaneous requests are received for 1 minute, the additional instance that was handling the high throughput is shut down (scale-in).
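The auto-scaling settings above can be sketched with the Application Auto Scaling API. The endpoint, variant, and policy names are assumptions; the target of 15 invocations per instance, the 10-second scale-out cooldown, the 60-second scale-in window, and the maximum of 3 instances follow the description above.

```python
def scaling_policy_config():
    """Target-tracking configuration mirroring the thresholds described above."""
    return {
        "TargetValue": 15.0,  # invocations per instance before scaling out
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
        "ScaleOutCooldown": 10,  # seconds before launching another instance
        "ScaleInCooldown": 60,   # seconds of low traffic before scaling in
    }

def enable_autoscaling(endpoint_name, variant_name="AllTraffic", max_capacity=3):
    import boto3  # deferred: requires AWS credentials at call time
    client = boto3.client("application-autoscaling")
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=1,
        MaxCapacity=max_capacity,  # at most 3 instances, per the note above
    )
    client.put_scaling_policy(
        PolicyName="dog-breed-invocations-scaling",  # assumed name
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=scaling_policy_config(),
    )

# usage (endpoint name is an assumption):
# enable_autoscaling("dog-breed-endpoint")
```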