Data Science Projects

11. Sparkify - Detection of User Churn using PySpark

Predicting churn (attrition) is a common and difficult problem for data scientists and analysts in customer-facing businesses. This project uses Spark to handle a large dataset, a skill that is in high demand in the data field. It also aims to present the findings in a way that company shareholders can understand.

Check out the jupyter notebook
Check out the Medium blog post to understand the project in detail.

10. Personalized Real Estate Agent

Imagine you are a developer at "Future Homes Realty," an innovative real estate firm. In an industry where personalisation is crucial to customer satisfaction, the company wants to transform how clients interact with real estate listings. The goal is to build an application called HomeMatch that uses large language models (LLMs) and vector databases to turn conventional real estate listings into customised narratives that match the preferences and requirements of prospective buyers.

Check out the jupyter notebook

9. ChatBOT using Retrieval Augmented Generation (RAG)

A custom chatbot project built around a fashion dataset that powers a fashion-focused chat interface. The dataset describes developments in modern fashion seen in 2023, including popular colour schemes, fabric choices, and other fashion insights. It is well suited to building a chatbot that meets the specific needs of fashion enthusiasts and people who work in the industry.
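
As a rough illustration of the retrieval step in a RAG chatbot, the sketch below ranks a few fashion snippets against a user question and places the best matches into a prompt. It is not taken from the project notebook: it swaps learned embeddings for a plain bag-of-words cosine similarity from the standard library, and the sample texts and the names `tokenize`, `cosine`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for rows of the fashion dataset.
DOCUMENTS = [
    "Pastel colour palettes dominate spring 2023 collections.",
    "Recycled denim is a popular fabric choice in 2023.",
    "Oversized blazers remain a staple of office wear.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Assemble a prompt that grounds the model in the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only this context:\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


print(build_prompt("Which fabric is popular in 2023?", DOCUMENTS))
```

In a full pipeline the resulting prompt would be sent to an LLM, and the similarity function would normally be replaced by an embedding model and a vector store.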

Check out the jupyter notebook

8. Landmark Classification (using CNN) & Tagging for Social Media

Check out the app
Check out the jupyter notebook

7. Classification of Handwritten Digit using MNIST Data

The task is to classify a given image of a handwritten digit into one of 10 classes representing integer values from 0 to 9, inclusive.

MNIST is a standard dataset used in computer vision and deep learning. The acronym stands for Modified National Institute of Standards and Technology. Its training set has 60,000 small square 28×28 pixel grayscale images of handwritten single digits between 0 and 9, and a separate test set holds 10,000 more.
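
To make the input and output shapes concrete, here is a minimal sketch of how one 28×28 image and its label could be prepared for a classifier. It uses a synthetic image instead of real MNIST data, and the helper names `flatten_and_scale` and `one_hot` are hypothetical, not from the notebook.

```python
IMAGE_SIZE = 28   # MNIST images are 28x28 pixels
NUM_CLASSES = 10  # digits 0 through 9


def flatten_and_scale(image):
    """Turn a 28x28 grid of 0-255 pixel values into 784 floats in [0, 1]."""
    if len(image) != IMAGE_SIZE or any(len(row) != IMAGE_SIZE for row in image):
        raise ValueError("expected a 28x28 image")
    return [pixel / 255.0 for row in image for pixel in row]


def one_hot(label):
    """Encode a digit label as a 10-element indicator vector."""
    if not 0 <= label < NUM_CLASSES:
        raise ValueError(f"label must be in 0..{NUM_CLASSES - 1}, got {label}")
    return [1.0 if i == label else 0.0 for i in range(NUM_CLASSES)]


# A synthetic all-black image with one white pixel, standing in for a real sample.
image = [[0] * IMAGE_SIZE for _ in range(IMAGE_SIZE)]
image[14][14] = 255

features = flatten_and_scale(image)
print(len(features))  # 784 input features per image
print(one_hot(3))
```

A convolutional model would keep the 28×28 layout instead of flattening it; the flattened form is what a simple fully connected network would consume.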

Check out the jupyter notebook

6. Disaster Response System

I analyzed disaster data and applied my data engineering skills to build a model for an API that classifies disaster messages. I created an ML pipeline to categorize real messages sent during disaster events so that each message can be routed to an appropriate disaster relief agency. The project includes a web app where an emergency worker can enter a new message and get classification results in several categories. The web app also displays visualizations of the data.
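
The sketch below illustrates only the shape of the output described above: one message can be flagged in several categories at once. It is not the project's pipeline. A keyword lookup stands in for the trained model, and the category names, keywords, and `classify_message` are all hypothetical.

```python
import re

# Hypothetical category keywords; a trained model would learn these
# associations from labelled messages instead of a fixed lookup.
CATEGORY_KEYWORDS = {
    "water": {"water", "thirsty", "drink"},
    "food": {"food", "hungry", "rice"},
    "medical_help": {"doctor", "injured", "medicine"},
}


def classify_message(message):
    """Return a 0/1 flag per category; several flags may be set at once."""
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    return {
        category: int(bool(tokens & keywords))
        for category, keywords in CATEGORY_KEYWORDS.items()
    }


print(classify_message("We are hungry and need clean water"))
```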

5. Real World Object Detection using COCO dataset

Detects objects with 65% confidence using a pre-trained MobileNet-SSD v3 model and 183 labels (classes) from the COCO 2017 dataset. Users can also detect objects in their surroundings through a webcam by running objectDetectionWebCam.py
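
Assuming the 65% figure is a confidence threshold applied to the detector's output (the source does not say so explicitly), the filtering step could look like the sketch below. The detection records and `filter_detections` are hypothetical and independent of any particular detection library.

```python
CONFIDENCE_THRESHOLD = 0.65  # assumed meaning of the 65% figure


def filter_detections(detections, threshold=CONFIDENCE_THRESHOLD):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d["confidence"] >= threshold]


# Hypothetical raw detector output: label, confidence, and (x, y, w, h) box.
raw = [
    {"label": "person", "confidence": 0.91, "box": (34, 20, 120, 260)},
    {"label": "dog", "confidence": 0.48, "box": (200, 180, 90, 70)},
    {"label": "bicycle", "confidence": 0.65, "box": (150, 90, 140, 110)},
]

for det in filter_detections(raw):
    print(f'{det["label"]}: {det["confidence"]:.0%}')
```

Raising the threshold trades missed objects for fewer false positives, so the value is usually tuned per use case.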

4. Sip & Script

I ran an exploratory data analysis on the Wine Reviews dataset from Kaggle, which contains roughly 130k Wine Enthusiast reviews. I used this project as an opportunity to analyse the data and explain my results in a Medium blog post that answers the questions posed.

Check out the jupyter notebook

3. Multi-Class Dog Breed Classification

The aim is to create a classifier capable of predicting a dog's breed from a photo. In a real-world scenario, someone could take a photo of a dog and use this model to find out its breed. The dataset contains 20,000+ images of dogs across 120 breeds (120 classes).

Check out the jupyter notebook

2. Multi Class Image Classifier

Data - CIFAR10

1. Heart Disease Prediction using Regression

Question. Can the presence of heart disease in a patient be predicted from their clinical parameters?

The dataset contains 76 attributes, but all published experiments use a subset of 14 of them. It is part of the Cleveland database, which ML researchers have used to this date, and it originated from the UCI Machine Learning Repository.

Environment Setup

Open your favourite terminal or cmd and create an environment with the dependencies listed in envALL.yml

conda env create -f envALL.yml

PyTorch Installation (with GPU)

0. Guide to local setup using GPU: https://pytorch.org/get-started/locally/

1. Nvidia CUDA Toolkit Setup

1.1 Open `cmd` to check whether the NVIDIA CUDA Toolkit is installed

C:\>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

1.2 Otherwise, follow these steps

Go to the CUDA Toolkit download page and download the toolkit version (11.7) supported by PyTorch.
Install the CUDA Toolkit, then run nvcc --version in cmd to check the CUDA version.

2. Installing PyTorch

2.1 Create a virtual environment on a local drive

(base) D:\workspace_Data_Science>conda create -n env_torch

(base) D:\workspace_Data_Science>conda activate env_torch

2.2 Install PyTorch according to the guide

(env_torch) D:\workspace_Data_Science>conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

## Package Plan ##

  environment location: C:\Anaconda_2021\envs\env_torch

  added / updated specs:
    - pytorch
    - pytorch-cuda=11.7
    - torchaudio
    - torchvision


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    brotlipy-0.7.0             |py310h2bbff1b_1002         335 KB
    ca-certificates-2023.01.10 |       haa95532_0         121 KB
    certifi-2022.12.7          |  py310haa95532_0         149 KB
    cffi-1.15.1                |  py310h2bbff1b_3         239 KB
    cryptography-39.0.1        |  py310h21b164f_0         1.0 MB
    cuda-demo-suite-12.1.55    |                0         4.7 MB  nvidia
    cuda-documentation-12.1.55 |                0          89 KB  nvidia
    cuda-nsight-compute-12.1.0 |                0           1 KB  nvidia
    cuda-nvdisasm-12.1.55      |                0        48.0 MB  nvidia
    cuda-nvprof-12.1.55        |                0         1.6 MB  nvidia
    cuda-nvvp-12.1.55          |                0       113.6 MB  nvidia
    cuda-sanitizer-api-12.1.55 |                0        12.9 MB  nvidia
    giflib-5.2.1               |       h8cc25b3_3          88 KB
    idna-3.4                   |  py310haa95532_0          97 KB
    jpeg-9e                    |       h2bbff1b_1         320 KB
    libcurand-10.3.2.56        |                0           3 KB  nvidia
    libcurand-dev-10.3.2.56    |                0        50.0 MB  nvidia
    .
    .
    . and many more

2.3 Check that PyTorch installed successfully

(env_torch) D:\workspace_Data_Science>python
Python 3.10.9 | packaged by Anaconda, Inc. | (main, Mar  8 2023, 10:42:25) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>>
>>>
>>> print(torch.rand(2,4))
tensor([[0.1220, 0.2692, 0.8196, 0.2800],
        [0.3619, 0.8364, 0.9870, 0.7860]])
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name()
'NVIDIA GeForce RTX 2060'
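
The interactive checks above can also be wrapped in a small script, which is convenient when setting up several machines. This is a sketch: `torch_gpu_report` is a hypothetical helper, and it only relies on the `torch.cuda` calls already used in the session above. It reports a missing install instead of raising an error.

```python
import importlib.util


def torch_gpu_report():
    """Summarise the PyTorch install without failing when torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "cuda_available": False, "devices": []}

    import torch

    cuda_ok = torch.cuda.is_available()
    devices = []
    if cuda_ok:
        # One entry per visible GPU, e.g. the card name shown in the session above.
        devices = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_available": cuda_ok,
        "devices": devices,
    }


print(torch_gpu_report())
```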

2.4 Install required packages

(env_torch) D:\workspace_Data_Science>conda install pandas matplotlib seaborn scikit-learn

Additional Data

Arthropod Taxonomy Orders Object Detection Dataset: Invertebrate animal (arthropod) images annotated for object detection
Create Object Detection Video: In this notebook we will run object detection on a video, frame by frame, using a model trained in another notebook. Then we will render a new video with the detections.
