Machine learning model development for a transport company; the objective is to predict whether an order will arrive on time or not.
We are part of a logistics company that works for a major e-commerce portal, and our team leader has given us the task of implementing a model that predicts whether a shipment will arrive on time, based on the information contained in the dataset.
The main dataset is a version of the Kaggle E-Commerce Shipping Data and contains the following fields (a short loading sketch follows the list below):
- ID: Customer ID number.
- Warehouse block: The company has a large warehouse divided into blocks such as A, B, C, D and E.
- Mode of shipment: The company ships products in multiple ways: Ship, Flight and Road.
- Customer care calls: The number of calls made to customer care to enquire about the shipment.
- Customer rating: The rating given by each customer; 1 is the lowest (worst), 5 is the highest (best).
- Cost of the product: Cost of the product in US dollars.
- Prior purchases: The number of prior purchases.
- Product importance: The company has categorized products into importance levels: low, medium and high.
- Gender: Male or Female.
- Discount offered: Discount offered on that specific product.
- Weight in gms: The weight of the product in grams.
- Reached on time: The target variable, where 1 indicates that the product did NOT arrive on time and 0 indicates that it did.
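As a quick orientation, the snippet below is a minimal sketch of loading the dataset with pandas and checking the target balance. The file name `Train.csv` and the column name `Reached.on.Time_Y.N` are assumptions based on the original Kaggle dataset and may differ in this repository.

```python
import pandas as pd

# Load the dataset; "Train.csv" is an assumed file name from the Kaggle version
df = pd.read_csv("Train.csv")

# Quick sanity checks: dimensions, column types and class balance of the target
print(df.shape)
print(df.dtypes)
# "Reached.on.Time_Y.N" is the Kaggle column name; adjust if your copy differs
print(df["Reached.on.Time_Y.N"].value_counts(normalize=True))
```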
The recall of the confusion matrix will be used as the method for evaluating model performance. Our main interest is to find those shipments that will not arrive on time. Recall answers the question: what percentage of the shipments that do not arrive on time are we able to identify?
$$Recall = \frac{TP}{TP + FN}$$

where $TP$ (true positives) is the number of late shipments correctly identified and $FN$ (false negatives) is the number of late shipments wrongly classified as on time.
Accuracy is another metric based on the confusion matrix. In this case we use it to evaluate classification performance for both class 1 and class 0 of our target variable. Note that in this exercise the positive class is class 1, i.e. the shipments that do not arrive on time.
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

where $TN$ (true negatives) and $FP$ (false positives) are the on-time shipments classified correctly and incorrectly, respectively.
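As a minimal sketch of how both metrics can be computed with scikit-learn, the labels below are placeholders for illustration only:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Placeholder labels: 1 = did NOT arrive on time, 0 = arrived on time
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Recall of the positive class (1): share of late shipments we actually catch
print(recall_score(y_true, y_pred, pos_label=1))  # TP / (TP + FN)

# Accuracy: share of all shipments, in both classes, classified correctly
print(accuracy_score(y_true, y_pred))             # (TP + TN) / total

# Full confusion matrix for reference
print(confusion_matrix(y_true, y_pred))
```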
- Exploratory Data Analysis (EDA)
- Data Preprocessing
- First Modeling Batch (Working with raw data)
- Second Modeling Batch (Applying One-Hot Encoding; a combined pipeline sketch follows this list)
- Third Modeling Batch (Evaluating StandardScaler)
- Fourth Modeling Batch (Evaluating Dimension Reduction using PCA)
- Final model selection and searching for best hyperparameters with GridSearchCV
- Conclusions
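To make the modeling batches above concrete, here is a hedged sketch that chains One-Hot Encoding, StandardScaler and PCA into a single scikit-learn pipeline and tunes it with GridSearchCV scored on recall, our primary metric. The column names, parameter grid and the LogisticRegression estimator are illustrative assumptions, not necessarily the notebook's actual choices:

```python
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column names based on the dataset description above
categorical = ["Warehouse_block", "Mode_of_Shipment", "Product_importance", "Gender"]
numeric = ["Customer_care_calls", "Customer_rating", "Cost_of_the_Product",
           "Prior_purchases", "Discount_offered", "Weight_in_gms"]

# Second and third batches: one-hot encode categoricals, scale numerics
# (sparse_output requires scikit-learn >= 1.2)
preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore", sparse_output=False), categorical),
    ("scale", StandardScaler(), numeric),
])

# Fourth batch: PCA on the preprocessed features; LogisticRegression is a
# placeholder estimator, not the model necessarily chosen in the notebook
pipe = Pipeline([
    ("preprocess", preprocess),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Final step: hyperparameter search, scoring on recall of class 1
param_grid = {
    "pca__n_components": [5, 10, 15],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, scoring="recall", cv=5)
# search.fit(X_train, y_train)  # X_train, y_train come from a prior train/test split
# print(search.best_params_, search.best_score_)
```

Wrapping the preprocessing inside the pipeline means each GridSearchCV fold fits the encoder, scaler and PCA on its own training split, which avoids data leakage during the search.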
For more in-depth information, please don't hesitate to open main.ipynb.
- Scikit-Learn Documentation
- StandardScaler vs MinMaxScaler
- Video: Scaling, Normalization and Standardization (Spanish)
- Video: How to implement One Hot Encoding
Regards, Jean Paul Fabra Ruiz: jeanfabra11@gmail.com
LinkedIn: https://www.linkedin.com/in/jeanfabra/