Comparison of DataFrame libraries for parallel processing of large tabular files on CPU and GPU.
Updated Jun 27, 2024 - Jupyter Notebook
Useful helper functions for PySpark DataFrame operations.
Uses PySpark and Spark SQL to run SQL queries against a temporary view created from a DataFrame, then reruns the queries on cached and partitioned data to compare runtimes.
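The workflow described above could be sketched roughly as follows. This is a minimal illustration and not code from the listed repository: the helper names `timed`, `compare_runtimes`, and `demo`, the view name `records`, the column `bucket`, and the synthetic data are all assumptions made for the example. It uses only standard PySpark calls (`createOrReplaceTempView`, `spark.sql`, `cache`, `unpersist`, `repartition`).

```python
import time


def timed(action):
    """Call action() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def compare_runtimes(spark, df, query, view_name, partition_col):
    """Time one SQL query against plain, cached, and repartitioned views of df.

    Hypothetical helper: `query` is expected to select from `view_name`.
    """
    timings = {}

    # Baseline: register the DataFrame as a temporary view and query it.
    df.createOrReplaceTempView(view_name)
    _, timings["baseline"] = timed(lambda: spark.sql(query).collect())

    # Cached: mark the DataFrame for caching and materialize it with an action,
    # so the timed query reads from the cache instead of filling it.
    cached = df.cache()
    cached.count()
    cached.createOrReplaceTempView(view_name)
    _, timings["cached"] = timed(lambda: spark.sql(query).collect())
    cached.unpersist()

    # Repartitioned: hash-partition rows by the chosen column, then query.
    # The shuffle happens lazily, so its cost is included in this timing.
    df.repartition(partition_col).createOrReplaceTempView(view_name)
    _, timings["repartitioned"] = timed(lambda: spark.sql(query).collect())

    return timings


def demo():
    """Run the comparison on synthetic data (requires a local Spark install)."""
    # Imported here so the helpers above can be loaded without Spark installed.
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("runtime-comparison")
        .getOrCreate()
    )
    # Assumed sample data: one million rows spread over ten buckets.
    df = spark.range(1_000_000).withColumn("bucket", F.col("id") % 10)
    query = "SELECT bucket, COUNT(*) AS n FROM records GROUP BY bucket"

    timings = compare_runtimes(spark, df, query, "records", "bucket")
    for label, seconds in timings.items():
        print(f"{label}: {seconds:.3f}s")
    spark.stop()
```

Calling `demo()` prints one timing per variant. Single-run wall-clock numbers on a local session are noisy, so repeating each query several times and comparing medians would likely give a fairer picture.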