staticSchema error in chapter 3 in s #62

LeilaGhods · 2021-01-19T19:00:08Z

I get the following error when running this code from Chapter 3 (Structured Streaming):

in Python

streamingDataFrame = spark.readStream\
    .schema(staticSchema)\
    .option("maxFilesPerTrigger", 1)\
    .format("csv")\
    .option("header", "true")\
    .load("/data/retail-data/by-day/*.csv")

NameError: name 'staticSchema' is not defined

NameError Traceback (most recent call last)
in
2 #How many files read together is identified by maxFilesPerTrigger
3 streamingDataFrame = spark.readStream\
----> 4 .schema(staticSchema)\
5 .option("maxFilesPerTrigger", 1)\
6 .format("csv")\

NameError: name 'staticSchema' is not defined

Can anyone help me with this? I am running the code on a Databricks community cluster.

Thanks,

Silverlight42 · 2021-01-26T16:58:18Z

I think you are missing this:

staticDataFrame = spark.read.format("csv")\
    .option("header", "true")\
    .option("inferSchema", "true")\
    .load("/data/retail-data/by-day/*.csv")

staticSchema = staticDataFrame.schema

For reference:

Chambers, Bill; Zaharia, Matei. Spark: The Definitive Guide: Big Data Processing Made Simple (p. 63). O'Reilly Media. Kindle Edition.
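
The NameError only means that the cell defining staticSchema has not been run in the current session, for example after a cluster restart or when the streaming cell is run on its own. A minimal sketch of a guard for this is below. Note that require_defined is a hypothetical helper, not part of Spark or the book, and the plain dict stands in for the notebook's namespace; in a real notebook you would pass globals() instead.

```python
def require_defined(name, namespace):
    """Return namespace[name], or raise a descriptive NameError if it is unbound."""
    if name not in namespace:
        raise NameError(
            f"{name!r} is not defined; run the cell that creates it first"
        )
    return namespace[name]


# Simulate a fresh session in which the static-read cell was skipped.
# (Hypothetical stand-in for the notebook's globals().)
fresh_session = {}
try:
    require_defined("staticSchema", fresh_session)
except NameError as err:
    message = str(err)

# Once the static-read cell has run, the name is bound and the check passes.
# The string is only a placeholder for staticDataFrame.schema.
fresh_session["staticSchema"] = "placeholder for staticDataFrame.schema"
schema = require_defined("staticSchema", fresh_session)
```

In a notebook, calling require_defined("staticSchema", globals()) at the top of the streaming cell would fail early with a message that points at the missing step.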
