# Data Types
Sail supports Arrow data types, which form a superset of the data types available in Spark SQL. For more background, refer to the Data Types guide for the Spark DataFrame API.
The following table shows the SQL type syntax along with the corresponding Spark and Arrow data types. Many data types have aliases that are not supported in JVM Spark; these aliases are Sail extensions (see the example after the table).
Many Arrow data types have no corresponding SQL type syntax, but they are still supported in Sail. You can work with these types in Python UDFs or data sources.
| SQL Type Syntax | Spark Data Type | Arrow Data Type |
| --- | --- | --- |
| `NULL`, `VOID` | `NullType` | `Null` |
| `BOOLEAN`, `BOOL` | `BooleanType` | `Boolean` |
| `BYTE`, `TINYINT`, `INT8` | `ByteType` | `Int8` |
| `SHORT`, `SMALLINT`, `INT16` | `ShortType` | `Int16` |
| `INTEGER`, `INT`, `INT32` | `IntegerType` | `Int32` |
| `LONG`, `BIGINT`, `INT64` | `LongType` | `Int64` |
| `UNSIGNED BYTE`, `UNSIGNED TINYINT`, `UINT8` | - | `UInt8` |
| `UNSIGNED SHORT`, `UNSIGNED SMALLINT`, `UINT16` | - | `UInt16` |
| `UNSIGNED INTEGER`, `UNSIGNED INT`, `UINT32` | - | `UInt32` |
| `UNSIGNED LONG`, `UNSIGNED BIGINT`, `UINT64` | - | `UInt64` |
| - | - | `Float16` |
| `FLOAT`, `REAL`, `FLOAT32` | `FloatType` | `Float32` |
| `DOUBLE`, `FLOAT64` | `DoubleType` | `Float64` |
| `DATE`, `DATE32` | `DateType` | `Date32` |
| `DATE64` | - | `Date64` |
| - | - | `Time32(Second)`, `Time32(Millisecond)`, `Time64(Microsecond)`, `Time64(Nanosecond)` |
| `TIMESTAMP[(p)]` | `TimestampType`, `TimestampNTZType` | `Timestamp(_, _)` |
| `TIMESTAMP_LTZ[(p)]`, `TIMESTAMP[(p)] WITH [LOCAL ]TIME ZONE` | `TimestampType` | `Timestamp(_, TimeZone(_))` |
| `TIMESTAMP_NTZ[(p)]`, `TIMESTAMP[(p)] WITHOUT TIME ZONE` | `TimestampNTZType` | `Timestamp(_, NoTimeZone)` |
| `STRING` | `StringType` | `Utf8`, `LargeUtf8` |
| `TEXT` | - | `LargeUtf8` |
| `CHAR(n)`, `CHARACTER(n)` | `CharType(n)` | `Utf8`, `LargeUtf8` |
| `VARCHAR(n)` | `VarcharType(n)` | `Utf8`, `LargeUtf8` |
| - | - | `Utf8View` |
| `BINARY`, `BYTEA` | `BinaryType` | `Binary`, `LargeBinary` |
| - | - | `FixedSizeBinary`, `BinaryView` |
| - | - | `Decimal32`, `Decimal64` |
| `DECIMAL[(p[, s])]`, `DEC[(p[, s])]`, `NUMERIC[(p[, s])]` | `DecimalType` | `Decimal128`, `Decimal256` |
| - | - | `Duration(Second)`, `Duration(Millisecond)`, `Duration(Nanosecond)` |
| `INTERVAL YEAR`, `INTERVAL YEAR TO MONTH`, `INTERVAL MONTH` | `YearMonthIntervalType` | `Interval(YearMonth)` |
| - | - | `Interval(DayTime)` |
| `INTERVAL DAY`, `INTERVAL DAY TO HOUR`, `INTERVAL DAY TO MINUTE`, `INTERVAL DAY TO SECOND`, `INTERVAL HOUR`, `INTERVAL HOUR TO MINUTE`, `INTERVAL HOUR TO SECOND`, `INTERVAL MINUTE`, `INTERVAL MINUTE TO SECOND`, `INTERVAL SECOND` | `DayTimeIntervalType` | `Duration(Microsecond)` |
| `INTERVAL` | `CalendarIntervalType` | `Interval(MonthDayNano)` |
| `ARRAY<type>` | `ArrayType` | `List` |
| - | - | `LargeList`, `FixedSizeList`, `ListView`, `LargeListView` |
| `MAP<key-type, value-type>` | `MapType` | `Map` |
| `STRUCT<name[:] type(, name[:] type)*>` | `StructType` | `Struct` |
| - | - | `Union` |
| - | - | `Dictionary` |
| - | - | `RunEndEncoded` |
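As a usage sketch, the SQL type syntax above can appear anywhere a type is expected, such as in `CAST` expressions. The snippet below assumes a Sail server reachable over Spark Connect; the `sc://localhost:50051` URL is an assumption to adjust for your deployment.

```python
from pyspark.sql import SparkSession

# Connect to a Sail server over Spark Connect. The URL is an assumption;
# change it to match your deployment.
spark = SparkSession.builder.remote("sc://localhost:50051").getOrCreate()

# `UINT8` and `TEXT` are Sail extensions from the table above; JVM Spark
# would reject this query.
df = spark.sql("SELECT CAST(200 AS UINT8) AS u, CAST('hello' AS TEXT) AS t")
df.printSchema()
```

Types that exist only in Arrow (such as `Float16` or `Utf8View`) cannot be named in SQL; they surface through Python UDFs or data sources that carry Arrow schemas.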
## Notes
- The SQL string types (except `TEXT`) are mapped to either the `Utf8` or `LargeUtf8` type in Arrow, depending on the `spark.sql.execution.arrow.useLargeVarTypes` configuration option (see the configuration sketch below).
- The SQL binary types are mapped to either the `Binary` or `LargeBinary` type in Arrow, depending on the `spark.sql.execution.arrow.useLargeVarTypes` configuration option.
- The SQL `TIMESTAMP` type can represent either timestamps with local time zone (`TIMESTAMP_LTZ`, the default) or timestamps without time zone (`TIMESTAMP_NTZ`), depending on the `spark.sql.timestampType` configuration option.
- For the SQL timestamp types, the optional `p` parameter specifies the precision of the timestamp. A value of `0`, `3`, `6`, or `9` represents second, millisecond, microsecond, or nanosecond precision respectively. The default is `6` (microsecond precision). Note that only the microsecond precision timestamp is compatible with Spark.
- For the SQL decimal types, the optional `p` and `s` parameters specify the precision and scale of the decimal number respectively. The default precision is `10` and the default scale is `0`. The decimal type maps to either the `Decimal128` or `Decimal256` type in Arrow, depending on the specified precision (see the decimal sketch below).
- The SQL `INTERVAL` type is mapped to the `Interval(MonthDayNano)` Arrow type, which has nanosecond precision. `CalendarIntervalType` in Spark has microsecond precision, so the supported value ranges differ (see the interval sketch below).
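The two configuration options mentioned above can be illustrated with a short sketch. This assumes the `spark` session from the earlier example and that both options may be changed at runtime (an assumption; they may also need to be set at session creation):

```python
# Controls whether a plain TIMESTAMP means TIMESTAMP_LTZ or TIMESTAMP_NTZ.
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")
# Controls whether STRING/BINARY map to Utf8/Binary or LargeUtf8/LargeBinary.
spark.conf.set("spark.sql.execution.arrow.useLargeVarTypes", "true")

df = spark.sql("""
    SELECT
        CAST('2024-01-01 00:00:00' AS TIMESTAMP)        AS ts_default, -- follows spark.sql.timestampType
        CAST('2024-01-01 00:00:00' AS TIMESTAMP_LTZ)    AS ts_ltz,     -- Timestamp(_, TimeZone(_))
        CAST('2024-01-01 00:00:00' AS TIMESTAMP_NTZ(9)) AS ts_nanos    -- nanosecond precision; not Spark-compatible
""")
df.printSchema()
```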
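Decimal precision and scale can be exercised the same way. This sketch assumes that Sail accepts precisions above 38 in SQL, which is inferred from the `Decimal256` mapping in the table; in Arrow, `Decimal128` holds at most 38 digits:

```python
df = spark.sql("""
    SELECT
        CAST('12345.678' AS DECIMAL)        AS d_default, -- DECIMAL(10, 0): scale 0 rounds to 12346
        CAST('12345.678' AS DECIMAL(20, 3)) AS d_narrow,  -- fits in Decimal128 (up to 38 digits)
        CAST('12345.678' AS DECIMAL(40, 3)) AS d_wide     -- precision beyond 38 digits needs Decimal256
""")
df.printSchema()
```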
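Finally, interval literals show the distinct interval mappings. This sketch uses standard Spark SQL interval syntax with the same assumed session:

```python
df = spark.sql("""
    SELECT
        INTERVAL '1-2' YEAR TO MONTH        AS ym, -- YearMonthIntervalType -> Interval(YearMonth)
        INTERVAL '3 04:05:06' DAY TO SECOND AS dt  -- DayTimeIntervalType -> Duration(Microsecond)
""")
df.printSchema()
```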