Python will return:
StructType(List(StructField(DEST_COUNTRY_NAME,StringType,true),
StructField(ORIGIN_COUNTRY_NAME,StringType,true),
StructField(count,LongType,true)))
A schema is a StructType made up of a number of fields, StructFields, each of which has a name, a type, and a Boolean flag specifying whether that column can contain missing or null values. Schemas can also contain other StructTypes (Spark’s complex types). We will see this in the next chapter when we discuss working with complex
types.
Here’s how to create and enforce a specific schema on a DataFrame. If the types in the data do not match the schema at runtime, Spark will throw an error.
%scala
import org.apache.spark.sql.types.{StructField, StructType, StringType, LongType}
val myManualSchema = new StructType(Array(
  new StructField("DEST_COUNTRY_NAME", StringType, true),
  new StructField("ORIGIN_COUNTRY_NAME", StringType, true),
  new StructField("count", LongType, false) // just to illustrate flipping this flag
))
val df = spark.read.format("json")
  .schema(myManualSchema)
  .load("/mnt/defg/flight-data/json/2015-summary.json")
Here’s how to do the same in Python.
%python
from pyspark.sql.types import StructField, StructType, StringType, LongType