About

Hi, I'm Shuo Wang. I have 2 years of experience in AI/ML, including large language models (LLMs), natural language processing (NLP), deep learning, and model development and deployment. I also have 8 years of experience leading teams and managing communication and projects. I am currently seeking to transition into a Machine Learning Engineer or Data Science role.

Featured Projects

Very Intelligent Portfolio (Capstone Project)

  • Description: Develop an open-source Transformer-based tool to optimize portfolio weights and recommend hedging strategies.
  • Technologies Used: Python (sklearn, pandas, numpy, seaborn), Amazon SageMaker, S3, Kubernetes, PyTorch, Azure, JupyterLab
  • Link: https://very-intelligent-portfolio.github.io

NLP Model for Toxic Comment Detection

  • Description: Utilize advanced NLP techniques to build transformer-based models (including BART, BERT, DistilBERT, ALBERT, and RoBERTa) for toxic comment detection.
  • Technologies Used: Google Colab, TensorFlow, Neural Networks, Keras, Scikit-Learn
  • Link: https://github.com/Shuo-Wang-UCBerkeley/Text-Classification

DistilBERT API Deployment

  • Description: Design and implement a FastAPI application that serves sentiment-analysis predictions from a HuggingFace DistilBERT model. Orchestrate the deployment of the application on Azure Kubernetes Service (AKS).
  • Technologies Used: FastAPI, Docker, Azure, Kubernetes, Redis, Poetry, Istio, Kustomize, Grafana, CI/CD
  • Link: https://github.com/Shuo-Wang-UCBerkeley/DistilBERT-API-Deployment

Predicting Flight Delays

  • Description: Predict, 2 hours before takeoff, whether a flight's departure will be delayed by more than 15 minutes.
  • Technologies Used: PySpark (ml, sql), Python (matplotlib, pandas, numpy, seaborn, datetime), Databricks, MapReduce
  • Link: https://github.com/Shuo-Wang-UCBerkeley/FlightDelay

Spotify Song Genre Prediction

Fashion MNIST Multi-Class Classification

  • Description: Develop a neural network model for multi-class classification of Fashion MNIST images.

    • Processed 60K data points from the Fashion MNIST dataset, each with 784 features (a 28×28 greyscale image) and a label from one of 10 classes. Visualized summary statistics of the features using Python Seaborn.
    • Developed a neural network model with three hidden layers and tuned it with various activation functions (e.g., Tanh, ReLU) and optimizers (e.g., Adam, SGD) using Python TensorFlow, achieving 99% testing accuracy.
  • Technologies Used: Python, TensorFlow, Neural Networks

Data Analysis and Engineering for a Hypothetical Restaurant

  • Description: Increase Acme Gourmet Meals' brand awareness beyond the local Berkeley neighborhood by identifying top customers, calculating the nearest BART station, and deploying a pickup/delivery service.
  • Technologies Used: SQL, Neo4j, Relational Databases
  • Link: https://github.com/Shuo-Wang-UCBerkeley/AGM-Brand

CO2 Emissions Analysis

  • Description: Car Transmission vs CO2 Emissions

    • Conducted an extensive analysis of the impact of car transmission type on CO2 emissions using Vehicle Certification Agency (VCA) data (2000-2013).
    • Analyzed over 7,000 entries after data cleaning to establish correlations between transmission types, fuel types, and engine capacities with CO2 emissions. Consolidated data by car models to remove bias due to repetition.
    • Employed linear regression to demonstrate that diesel-fueled and manual transmission vehicles emit significantly less CO2.
    • Identified potential model limitations including collinearity and omitted variables such as the drag coefficient and improvements in transmission technology.
    • Explored the impact of regulatory and technological changes over time, highlighting potential shifts in the relationship between transmission type and emissions in recent years.
  • Technologies Used: R (tidyverse, tsibble, forecast, ggplot), Python

Voting Difficulty Analysis Project

  • Description: Evaluate which party's voters experienced more difficulty voting in the 2020 election.

    • Led a statistical investigation into voter difficulties between Democratic and Republican voters during the 2020 U.S. election, utilizing the American National Election Studies data consisting of over 8,000 survey responses.
    • Applied a non-parametric Wilcoxon rank-sum test to rigorously compare voting difficulties.
    • Defined a novel metric for voter difficulty that included both voters and those who intended but failed to vote, enhancing the breadth of the analysis.
    • Identified a meaningful difference in difficulty levels that may influence election outcomes, advocating for targeted political strategies to alleviate voting barriers.
  • Technologies Used: R (tidyverse, tsibble, forecast, ggplot), Python, Statistical Analysis, Wilcoxon Test
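
For readers unfamiliar with the Wilcoxon rank-sum test mentioned above, here is a minimal sketch in plain Python using the normal approximation with a tie correction. It is an illustration only, not the project's actual code (the analysis itself was done in R). The function names `midranks` and `rank_sum_test` and the difficulty scores in the demo are hypothetical and are not taken from the ANES data.

```python
import math
from statistics import NormalDist


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (W, z, p), where W is the sum of the ranks of sample x.
    """
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)
    ranks = midranks(pooled)
    w = sum(ranks[:n1])
    mean_w = n1 * (n + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        raise ValueError("all observations are identical; test is undefined")
    z = (w - mean_w) / math.sqrt(var_w)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


if __name__ == "__main__":
    # Hypothetical ordinal difficulty scores (1 = none, 5 = a lot) for two groups.
    group_a = [1, 1, 2, 1, 3, 1, 2]
    group_b = [1, 2, 2, 3, 1, 4, 2]
    w, z, p = rank_sum_test(group_a, group_b)
    print(f"W = {w}, z = {z:.3f}, p = {p:.3f}")
```

A rank-based test suits this kind of data because survey difficulty scores are ordinal, so comparing ranks avoids assuming that the gaps between response levels are equal.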

Contact

Feel free to reach out to me via email at wshuo87@gmail.com or connect with me on LinkedIn (https://www.linkedin.com/in/Shuo-Wang-PE).

Resume

Check out my resume for an overview of my skills, work experience, and education.

Popular repositories

  1. DistilBERT-API-Deployment (Python)

  2. Lab_1

  3. lab_2

  4. Text-Classification (Jupyter Notebook)

  5. FlightDelay (HTML)

  6. Song-Genre (Jupyter Notebook)