A command-line tool for launching Apache Spark clusters.
Updated Jul 5, 2024 - Python
A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
This project contains customizations such as custom data sources and plugins written for distributed systems like Apache Spark, Apache Ignite, etc.
This package contains the code for calculating external clustering validity indices in Spark. The package includes Chi Index among others.
Implementations of the Markov Cluster Algorithm (MCL) and Regularized Markov Cluster Algorithm (R-MCL) in Apache Spark
A machine learning model built with PySpark that classifies whether a bank customer will churn, with an accuracy of more than 86% on the test set.
Apache Spark cluster lab.
This project performs distributed analytics on clusters using a spatial dataset. Our goal was to analyze the impact of single-family residential zoning in the US and correlate it with quality-of-life measures, in an effort to discourage the segregation of zoning types and promote inclusivity.
Analysis performed on data from the Steam platform using Apache Spark and cloud services such as Amazon Web Services.
Data engineering project: visualize the number of visas issued from Japan by country and time