PySpark

All I know about PySpark

For all the code examples in this repository, put the following code at the beginning:

```python
import sys
import os

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

python_path = sys.executable
os.environ['PYSPARK_PYTHON'] = python_path
os.environ['JAVA_HOME'] = r'Java path'  # replace with the Java installation path on your machine

conf = (SparkConf()
        .setAppName("pyspark")
        .setMaster("local[*]")
        .set("spark.driver.host", "localhost")
        .set("spark.default.parallelism", "1"))
sc = SparkContext(conf=conf)
spark = SparkSession.builder.getOrCreate()
```
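As a quick, optional sanity check (a minimal sketch that is not part of the repository's examples; the sample values and column names below are made up for illustration), you can confirm that both the `sc` and `spark` objects created above work:

```python
# Verify the SparkContext by creating, transforming, and collecting a small RDD
rdd = sc.parallelize([1, 2, 3, 4, 5])
print(rdd.map(lambda x: x * 2).collect())   # expected: [2, 4, 6, 8, 10]

# Verify the SparkSession by creating and displaying a small DataFrame
df = spark.createDataFrame([("alice", 30), ("bob", 25)], ["name", "age"])
df.show()
```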

If you truly want to learn Big Data tools like Hadoop, Sqoop, Hive, and Spark, along with their cloud counterparts, please ping me on WhatsApp at 9490716829.