# Compatibility
Sail supports Arrow data types, which are a superset of the data types available in Spark SQL. For more background, refer to the Data Types guide for the Spark DataFrame API.

The following table shows the SQL type syntax along with the corresponding Spark and Arrow data types. Many data types have aliases that are not supported in JVM Spark; these aliases are Sail extensions.

Many Arrow data types have no corresponding SQL type syntax, but they are still supported in Sail. You can work with these types in Python UDFs or data sources.
| SQL Type Syntax | Spark Data Type | Arrow Data Type |
|---|---|---|
| `NULL`<br>`VOID` | NullType | Null |
| `BOOLEAN`<br>`BOOL` | BooleanType | Boolean |
| `BYTE`<br>`TINYINT`<br>`INT8` | ByteType | Int8 |
| `SHORT`<br>`SMALLINT`<br>`INT16` | ShortType | Int16 |
| `INTEGER`<br>`INT`<br>`INT32` | IntegerType | Int32 |
| `LONG`<br>`BIGINT`<br>`INT64` | LongType | Int64 |
| `UNSIGNED BYTE`<br>`UNSIGNED TINYINT`<br>`UINT8` | - | UInt8 |
| `UNSIGNED SHORT`<br>`UNSIGNED SMALLINT`<br>`UINT16` | - | UInt16 |
| `UNSIGNED INTEGER`<br>`UNSIGNED INT`<br>`UINT32` | - | UInt32 |
| `UNSIGNED LONG`<br>`UNSIGNED BIGINT`<br>`UINT64` | - | UInt64 |
| - | - | Float16 |
| `FLOAT`<br>`REAL`<br>`FLOAT32` | FloatType | Float32 |
| `DOUBLE`<br>`FLOAT64` | DoubleType | Float64 |
| `DATE`<br>`DATE32` | DateType | Date32 |
| `DATE64` | - | Date64 |
| - | - | Time32(Second)<br>Time32(Millisecond)<br>Time64(Microsecond)<br>Time64(Nanosecond) |
| `TIMESTAMP[(p)]` | TimestampType<br>TimestampNTZType | Timestamp(_, _) |
| `TIMESTAMP_LTZ[(p)]`<br>`TIMESTAMP[(p)] WITH [LOCAL ]TIME ZONE` | TimestampType | Timestamp(_, TimeZone(_)) |
| `TIMESTAMP_NTZ[(p)]`<br>`TIMESTAMP[(p)] WITHOUT TIME ZONE` | TimestampNTZType | Timestamp(_, NoTimeZone) |
| `STRING` | StringType | Utf8<br>LargeUtf8 |
| `TEXT` | - | LargeUtf8 |
| `CHAR(n)`<br>`CHARACTER(n)` | CharType(n) | Utf8<br>LargeUtf8 |
| `VARCHAR(n)` | VarcharType(n) | Utf8<br>LargeUtf8 |
| - | - | Utf8View |
| `BINARY`<br>`BYTEA` | BinaryType | Binary<br>LargeBinary |
| - | - | FixedSizeBinary<br>BinaryView |
| - | - | Decimal32<br>Decimal64 |
| `DECIMAL[(p[, s])]`<br>`DEC[(p[, s])]`<br>`NUMERIC[(p[, s])]` | DecimalType | Decimal128<br>Decimal256 |
| - | - | Duration(Second)<br>Duration(Millisecond)<br>Duration(Nanosecond) |
| `INTERVAL YEAR`<br>`INTERVAL YEAR TO MONTH`<br>`INTERVAL MONTH` | YearMonthIntervalType | Interval(YearMonth) |
| - | - | Interval(DayTime) |
| `INTERVAL DAY`<br>`INTERVAL DAY TO HOUR`<br>`INTERVAL DAY TO MINUTE`<br>`INTERVAL DAY TO SECOND`<br>`INTERVAL HOUR`<br>`INTERVAL HOUR TO MINUTE`<br>`INTERVAL HOUR TO SECOND`<br>`INTERVAL MINUTE`<br>`INTERVAL MINUTE TO SECOND`<br>`INTERVAL SECOND` | DayTimeIntervalType | Duration(Microsecond) |
| `INTERVAL` | CalendarIntervalType | Interval(MonthDayNano) |
| `ARRAY<type>` | ArrayType | List |
| - | - | LargeList<br>FixedSizeList<br>ListView<br>LargeListView |
| `MAP<key-type, value-type>` | MapType | Map |
| `STRUCT<name[:] type(, name[:] type)*>` | StructType | Struct |
| - | - | Union |
| - | - | Dictionary |
| - | - | RunEndEncoded |
## Notes

- The SQL string types (except `TEXT`) are mapped to either the Utf8 or LargeUtf8 type in Arrow, depending on the `spark.sql.execution.arrow.useLargeVarTypes` configuration option.
- The SQL binary types are mapped to either the Binary or LargeBinary type in Arrow, depending on the `spark.sql.execution.arrow.useLargeVarTypes` configuration option.
- The SQL `TIMESTAMP` type can represent either timestamps with local time zone (`TIMESTAMP_LTZ`, the default) or timestamps without time zone (`TIMESTAMP_NTZ`), depending on the `spark.sql.timestampType` configuration option.
- For the SQL timestamp types, the optional `p` parameter specifies the precision of the timestamp. A value of `0`, `3`, `6`, or `9` represents second, millisecond, microsecond, or nanosecond precision, respectively. The default is `6` (microsecond precision). Note that only the microsecond-precision timestamp is compatible with Spark.
- For the SQL decimal types, the optional `p` and `s` parameters specify the precision and scale of the decimal number, respectively. The default precision is `10` and the default scale is `0`. The decimal type maps to either the Decimal128 or Decimal256 type in Arrow, depending on the specified precision.
- The SQL `INTERVAL` type is mapped to the Interval(MonthDayNano) Arrow type, which has nanosecond precision. CalendarIntervalType in Spark has microsecond precision, so the supported data range is different.
