# Compatibility

All Spark data types are defined in the `pyspark.sql.types` package in PySpark.
The table below shows how Spark data types are mapped to Python types and Arrow data types.
| Spark Data Type | PySpark API | Python Type | Arrow Data Type |
|---|---|---|---|
| NullType | NullType() | - | Null |
| BooleanType | BooleanType() | bool | Boolean |
| ByteType | ByteType() | int | Int8 |
| ShortType | ShortType() | int | Int16 |
| IntegerType | IntegerType() | int | Int32 |
| LongType | LongType() | int | Int64 |
| - | - | - | UInt8 UInt16 UInt32 UInt64 |
| - | - | - | Float16 |
| FloatType | FloatType() | float | Float32 |
| DoubleType | DoubleType() | float | Float64 |
| - | - | - | Decimal32 Decimal64 |
| DecimalType | DecimalType() | decimal.Decimal | Decimal128 Decimal256 |
| StringType | StringType() | str | Utf8 LargeUtf8 |
| CharType(n) | CharType(length: int) | str | Utf8 LargeUtf8 |
| VarcharType(n) | VarcharType(length: int) | str | Utf8 LargeUtf8 |
| - | - | - | Utf8View |
| BinaryType | BinaryType() | bytearray | Binary LargeBinary |
| - | - | - | FixedSizeBinary BinaryView |
| TimestampType | TimestampType() | datetime.datetime | Timestamp(Microsecond, TimeZone(_)) |
| TimestampNTZType | TimestampNTZType() | datetime.datetime | Timestamp(Microsecond, NoTimeZone) |
| - | - | - | Timestamp(Second, _) Timestamp(Millisecond, _) Timestamp(Nanosecond, _) |
| DateType | DateType() | datetime.date | Date32 |
| - | - | - | Date64 |
| TimeType | TimeType(precision: int = 6) | datetime.time | Time32(Second) Time32(Millisecond) Time64(Microsecond) |
| YearMonthIntervalType | YearMonthIntervalType() | - | Interval(YearMonth) |
| DayTimeIntervalType | DayTimeIntervalType() | datetime.timedelta | Duration(Microsecond) |
| CalendarIntervalType | CalendarIntervalType() | - | Interval(MonthDayNano) |
| - | - | - | Interval(DayTime) |
| - | - | - | Duration(Second) Duration(Millisecond) Duration(Nanosecond) |
| ArrayType | ArrayType(elementType, containsNull: bool = True) | list, tuple | List |
| - | - | - | LargeList FixedSizeList ListView LargeListView |
| MapType | MapType(keyType, valueType, valueContainsNull: bool = True) | dict | Map |
| StructType | StructType(fields) | list, tuple | Struct |
| - | - | - | Union |
| - | - | - | Dictionary |
| - | - | - | RunEndEncoded |
## Notes

- `DayTimeIntervalType` in Spark has microsecond precision and is mapped to the `Duration(Microsecond)` Arrow type. It is not mapped to the `Interval(DayTime)` Arrow type, which only has millisecond precision.
- `YearMonthIntervalType` and `CalendarIntervalType` in Spark are not supported in Python, so calling the `.collect()` method will raise an error for a DataFrame that contains these types.
- `StringType`, `CharType(n)`, and `VarcharType(n)` in Spark are mapped to either the `Utf8` or `LargeUtf8` type in Arrow, depending on the `spark.sql.execution.arrow.useLargeVarTypes` configuration option.
- `BinaryType` in Spark is mapped to either the `Binary` or `LargeBinary` type in Arrow, depending on the `spark.sql.execution.arrow.useLargeVarTypes` configuration option.
- `CalendarIntervalType` in Spark has microsecond precision, while the `Interval(MonthDayNano)` Arrow type has nanosecond precision, so the supported data range for calendar intervals differs between JVM Spark and Arrow.
- `TimeType` represents time-of-day values without a time zone. The `precision` parameter specifies the number of decimal digits following the decimal point in the seconds field. Spark 4.0 supports precision values `0`, `3`, and `6` (second, millisecond, and microsecond). The default precision is `6` (microsecond). Precision `0` and `3` map to `Time32` in Arrow, while precision `6` maps to `Time64(Microsecond)` in Arrow. Precision `9` (nanosecond) is not supported by Spark 4.0.
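The microsecond limit on day-time intervals is visible directly in the standard library, since `DayTimeIntervalType` values surface in Python as `datetime.timedelta` (a stdlib-only sketch; no Spark session involved):

```python
from datetime import timedelta

# timedelta, like Spark's DayTimeIntervalType and Arrow's
# Duration(Microsecond), has microsecond resolution.
assert timedelta.resolution == timedelta(microseconds=1)

# An interval normalizes losslessly at microsecond granularity,
# so nanosecond-precision Arrow durations cannot round-trip
# through timedelta without loss.
d = timedelta(days=1, seconds=2, microseconds=3)
assert d == timedelta(microseconds=86_402_000_003)
```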
