Implementation of Spark code in Jupyter Notebook. Topics include: RDDs and DataFrames, exploratory data analysis (EDA), handling multiple DataFrames, visualization, and machine learning.
Updated Aug 26, 2020 - Jupyter Notebook
A big data project to develop a real-time data pipeline for analyzing the popularity and sentiment of trending topics on Twitter.
Efficiently tackle large datasets and perform big data analysis with Spark and Python
MapReduce job development, RDD programming, medical data management, sales analysis, and efficient data integration for big data analysis. Spark: big data processing, Sqoop integration, and Spark Structured Streaming for real-time data.
PySpark studies.
Spark, RDD, and MapReduce applications related to the BigData @polito course (2019-2020). A set of personal notes is already provided.
Analysis of Clinical Trial Dataset using PySpark RDD implementation.
Project on MapReduce for the Μ111 - Big Data Management course, NKUA, Spring 2023.
This assignment was part of an IoT motion-sensor app running on a watch that predicts the wearer's actions from their arm movements. It is one of a series of data pipeline coding challenges in the IBM course Scalable Data Science.
Here I play with the services offered by Apache Spark and try to learn them in more depth.
📈📊 Big Data Notebooks. ▫️ Large-scale data analysis with PySpark. ▫️ Data ingestion. ▫️ Machine learning algorithms on massive data. ▫️ Real-time message processing with Kafka.