# apache-spark-cluster

Here are 10 public repositories matching this topic...

nchammas / flintrock

A command-line tool for launching Apache Spark clusters.

apache-spark ec2 orchestration apache-spark-cluster spark-ec2

Updated Jul 5, 2024
Python

PiercingDan / spark-Jupyter-AWS

A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support

aws spark apache-spark ec2 jupyter aws-s3 jupyter-notebook spark-clusters ebs-volumes aws-ec2 ec2-instance apache-spark-cluster

Updated Nov 3, 2017
Jupyter Notebook

aamargajbhiye / big-data-projects

This project contains customizations such as custom data sources and plugins written for distributed systems like Apache Spark and Apache Ignite.

apache-spark spark-java apache-ignite apache-spark-cluster igfs

Updated Oct 6, 2023
Java

josemarialuna / ExternalValidity

This package contains the code for calculating external clustering validity indices in Spark. The package includes Chi Index among others.

scala apache-spark clustering-evaluation spark-mllib apache-spark-cluster clustering-validation cvi spark-ml

Updated Mar 8, 2024
Scala

akaltsikis / Markov_Cluster_Algorithm

Implementations of the Markov Cluster Algorithm (MCL) and the Regularized Markov Cluster Algorithm (R-MCL) in Apache Spark

big-data spark apache-spark distributed-computing clustering-algorithm sparse-matrices cluster-computing mcl apache-spark-cluster markov-cluster-algorithm

Updated Jul 18, 2017
Scala

SayamAlt / Bank-Customer-Churn-Prediction-using-PySpark

A machine learning model built with PySpark that classifies whether a bank customer will churn, with an accuracy of more than 86% on the test set.

machine-learning apache-spark cross-validation data-visualization pyspark classification feature-engineering hyperparameter-tuning binary-classification feature-transformation apache-spark-cluster spark-ml azure-databricks data-processing-pipelines model-training-and-evaluation data-exploration-and-preprocessing

Updated Aug 4, 2024
Jupyter Notebook

savvydatainsights / spark

Apache Spark cluster lab.

ansible vagrant apache-spark apache-spark-cluster

Updated Apr 27, 2023
Java

ashsProjects / Distributed_Analytics_of_US_Residential_Zoning

This project performs distributed analytics on a spatial dataset using clusters. Our goal was to analyze the impact of single-family residential zoning in the US and correlate it with quality-of-life measures, in an effort to dissuade the segregation of zoning types and promote inclusivity.

distributed-systems machine-learning apache-spark distributed-computing hdfs spark-sql apache-spark-cluster

Updated May 4, 2024
Jupyter Notebook

arturobp3 / Steam_Analysis_For_Gamers

Analysis performed on data from the Steam platform using Apache Spark and Cloud services such as Amazon Web Services.

python steam apache-spark amazon-web-services data-processing apache-spark-cluster

Updated Dec 11, 2019
Python

erjan / data_engineering_japan_visas_pyspark

Data engineering project: visualize visa numbers by country and time issued from Japan

project pyspark data-engineering aws-ec2 ec2-instance apache-spark-cluster

Updated Nov 22, 2023
HTML
