
I am upskilling myself by completing online courses and their capstone projects offered by Coursera, edX, Udemy, and Udacity.


aghoshpro/myProjects


Data Science Projects

11. Sparkify - Detection of User Churn using PySpark

Forecasting churn, or attrition, is a common and difficult problem that data scientists and analysts face in customer-facing businesses. Handling large datasets efficiently with Spark is also one of the most sought-after skills in the data field. A further goal of the project is to communicate the findings to company shareholders in a way they can understand.

10. Personalized Real Estate Agent

Imagine you are a developer at "Future Homes Realty," an innovative real estate firm. In an industry where personalisation is crucial for customer satisfaction, the company wants to change how clients interact with real estate listings. The goal is to create an application called HomeMatch that uses large language models (LLMs) and vector databases to turn conventional real estate listings into customised narratives that match the preferences and requirements of prospective buyers.

9. ChatBOT using Retrieval Augmented Generation (RAG)

This project builds a custom chatbot on top of a fashion dataset so that a fashion-focused chat interface can draw on it. The dataset captures the changes happening in modern fashion, including popular colour schemes, fabric choices, and other fashion insights seen in 2023. That makes it well suited to a chatbot that serves fashion fans and people who work in the industry.
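The retrieval step of RAG can be illustrated with a minimal sketch. It assumes a plain bag-of-words cosine similarity in place of the embedding model the project may actually use, and the sample rows and the function names `retrieve` and `build_prompt` are hypothetical, not taken from the project.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 2) -> str:
    """Prepend the retrieved context to the question before calling an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical sample rows, not taken from the project's dataset.
docs = [
    "Pastel colour palettes dominated spring collections.",
    "Recycled denim was a popular fabric choice.",
    "Oversized blazers remained a staple for office wear.",
]
print(build_prompt("Which fabric was popular?", docs, k=1))
```

The resulting prompt would then be sent to the language model, so the answer is grounded in the retrieved rows rather than in the model's general knowledge alone.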

8. Landmark Classification (using CNN) & Tagging for Social Media

7. Classification of Handwritten Digit using MNIST Data

The task is to classify a given image of a handwritten digit into one of 10 classes representing the integer values from 0 to 9, inclusive.

MNIST is a standard dataset used in computer vision and deep learning. The acronym stands for the Modified National Institute of Standards and Technology dataset. Its training set has 60,000 small square 28×28 pixel grayscale images of handwritten single digits between 0 and 9, and a separate test set holds another 10,000.
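The shapes involved can be made concrete with a small sketch. It only shows how a 28×28 image becomes 784 input features and how a prediction is read off 10 class scores; the scores below are hypothetical stand-ins for a trained model's output, and the helper names are illustrative.

```python
IMAGE_SIZE = 28   # MNIST images are 28x28 pixels
NUM_CLASSES = 10  # digits 0 through 9


def flatten(image):
    """Turn a 28x28 nested list into a flat list of 784 values scaled to [0, 1]."""
    assert len(image) == IMAGE_SIZE and all(len(row) == IMAGE_SIZE for row in image)
    return [pixel / 255.0 for row in image for pixel in row]


def predict_digit(scores):
    """Pick the class index with the highest score."""
    assert len(scores) == NUM_CLASSES
    return max(range(NUM_CLASSES), key=lambda i: scores[i])


# An all-black image stands in for a real sample.
blank = [[0] * IMAGE_SIZE for _ in range(IMAGE_SIZE)]
features = flatten(blank)
print(len(features))  # 784

# Hypothetical scores standing in for a trained model's output.
example_scores = [0.1, 0.0, 0.2, 0.05, 0.0, 0.0, 0.0, 2.3, 0.1, 0.4]
print(predict_digit(example_scores))  # 7
```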

6. Disaster Response System

I analyzed disaster data to build a model for an API that classifies disaster messages, applying my data engineering skills along the way. I created an ML pipeline to categorize real messages that were sent during disaster events so that each message can be routed to an appropriate disaster relief agency. The project includes a web app where an emergency worker can enter a new message and get classification results in several categories. The web app also displays visualizations of the data.

5. Real World Object Detection using COCO dataset

Detects objects with 65% confidence using a pre-trained MobileNet-SSD v3 model and 183 labels (classes) from the COCO 2017 dataset. Users can also detect objects in their surroundings through a webcam by running objectDetectionWebCam.py
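A minimal sketch of how a confidence cut-off like this is typically applied, assuming 65% is used as a minimum score and that the comparison is inclusive. The `Detection` type and the sample detections are hypothetical and do not come from objectDetectionWebCam.py.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    label: str
    confidence: float  # model score in [0, 1]
    box: tuple         # (x, y, width, height) in pixels


CONF_THRESHOLD = 0.65  # assumed to match the 65% figure quoted above


def keep_confident(detections, threshold=CONF_THRESHOLD):
    """Drop detections whose score falls below the threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Hypothetical raw detections for a single frame.
raw = [
    Detection("person", 0.91, (34, 20, 120, 260)),
    Detection("chair", 0.48, (200, 150, 80, 110)),
    Detection("laptop", 0.65, (310, 180, 140, 90)),
]
for d in keep_confident(raw):
    print(f"{d.label}: {d.confidence:.0%}")
```

Raising the threshold trades missed objects for fewer false positives, so the value is worth tuning per use case.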

4. Sip & Script

I ran an exploratory data analysis on the Wine Reviews dataset from Kaggle, which contains roughly 130k Wine Enthusiast reviews. I took this project as an opportunity to analyse the data and explain my results in a Medium blog post that answers the questions posed.

3. Multi-Class Dog Breed Classification

The aim is to create a classifier capable of predicting a dog's breed from a photo. In a real-world scenario, someone who takes a photo of a dog and wants to know its breed could use this model. The dataset contains 20,000+ images of dogs from 120 breeds (120 classes).

2. Multi Class Image Classifier

Data - CIFAR10

1. Heart Disease Prediction using Regression

Question. Can the presence of heart disease in the patient be predicted based on their clinical parameters?

The dataset contains 76 attributes, but all published experiments refer to using a subset of 14 of them. It is part of the Cleveland database, which ML researchers have used to this date, and it originated from the UCI Machine Learning Repository.




Environment Setup

Open your favourite terminal or cmd and create an environment with the dependencies listed in envALL.yml

conda env create -f envALL.yml

Pytorch Installation (with GPU)

0. Guide to local setup using GPU: https://pytorch.org/get-started/locally/

1. Nvidia CUDA Toolkit Setup

1.1 Open cmd and check whether the CUDA toolkit is already available on the machine

C:\>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

1.2. Otherwise, follow these steps

  • Go to CUDA Toolkit and download the toolkit version (11.7) supported by PyTorch.

  • Install the CUDA Toolkit, then run nvcc --version in CMD to check the CUDA version.
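The version check above can also be scripted. The following is a sketch, not part of the repository: it assumes `nvcc` prints a line containing `release X.Y`, as in the transcript shown earlier, and the function names are illustrative.

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output: str):
    """Extract the release number (e.g. '11.7') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Return the toolkit release reported by nvcc, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found; install the CUDA Toolkit first")
    else:
        print(f"CUDA Toolkit release {release}")
```

Comparing the reported release against the version listed in the PyTorch guide before installing helps avoid a mismatched setup.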

2. Installing PyTorch

2.1. Create a virtual environment on the local drive

(base) D:\workspace_Data_Science>conda create -n env_torch
(base) D:\workspace_Data_Science>conda activate env_torch

2.2 Install PyTorch according to the guide

(env_torch) D:\workspace_Data_Science>conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
## Package Plan ##

  environment location: C:\Anaconda_2021\envs\env_torch

  added / updated specs:
    - pytorch
    - pytorch-cuda=11.7
    - torchaudio
    - torchvision


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    brotlipy-0.7.0             |py310h2bbff1b_1002         335 KB
    ca-certificates-2023.01.10 |       haa95532_0         121 KB
    certifi-2022.12.7          |  py310haa95532_0         149 KB
    cffi-1.15.1                |  py310h2bbff1b_3         239 KB
    cryptography-39.0.1        |  py310h21b164f_0         1.0 MB
    cuda-demo-suite-12.1.55    |                0         4.7 MB  nvidia
    cuda-documentation-12.1.55 |                0          89 KB  nvidia
    cuda-nsight-compute-12.1.0 |                0           1 KB  nvidia
    cuda-nvdisasm-12.1.55      |                0        48.0 MB  nvidia
    cuda-nvprof-12.1.55        |                0         1.6 MB  nvidia
    cuda-nvvp-12.1.55          |                0       113.6 MB  nvidia
    cuda-sanitizer-api-12.1.55 |                0        12.9 MB  nvidia
    giflib-5.2.1               |       h8cc25b3_3          88 KB
    idna-3.4                   |  py310haa95532_0          97 KB
    jpeg-9e                    |       h2bbff1b_1         320 KB
    libcurand-10.3.2.56        |                0           3 KB  nvidia
    libcurand-dev-10.3.2.56    |                0        50.0 MB  nvidia
    .
    .
    . and many more

2.3 Check that PyTorch installed successfully

(env_torch) D:\workspace_Data_Science>python
Python 3.10.9 | packaged by Anaconda, Inc. | (main, Mar  8 2023, 10:42:25) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>>
>>>
>>> print(torch.rand(2,4))
tensor([[0.1220, 0.2692, 0.8196, 0.2800],
        [0.3619, 0.8364, 0.9870, 0.7860]])
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name()
'NVIDIA GeForce RTX 2060'
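In project code, the same check is commonly wrapped in a small helper so that scripts fall back to the CPU when no GPU is visible. This is a sketch under that assumption; `pick_device` is an illustrative name and is not defined in the repository.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

Models and tensors can then be moved with `.to(device)`, which keeps the rest of the code identical on GPU and CPU machines.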

2.4 Install required packages

(env_torch) D:\workspace_Data_Science>conda install pandas matplotlib seaborn scikit-learn

Additional Data
