Supported Features
Here is a high-level overview of the Spark DataFrame API features supported by Sail. The list covers the most common use cases of the DataFrame API and is not meant to be complete.
Feature | Supported |
---|---|
I/O - Reading (SparkSession.read) | ✅ |
I/O - Writing (DataFrame.write) | ✅ |
Structured Streaming (SparkSession.readStream) | 🚧 |
Result collection (DataFrame.show, DataFrame.collect, and DataFrame.count) | ✅ |
Schema display (DataFrame.printSchema) | ✅ |
Query - Projection (DataFrame.select and DataFrame.selectExpr) | ✅ |
Query - Column operations (e.g. DataFrame.withColumn, DataFrame.replace, and DataFrame.drop) | ✅ |
Query - Filtering (DataFrame.filter) | ✅ |
Query - Aggregation (DataFrame.agg and DataFrame.groupBy) | ✅ |
Query - Join (DataFrame.join) | ✅ |
Query - Set operations (e.g. DataFrame.union, DataFrame.intersect, and DataFrame.exceptAll) | ✅ |
Query - Limit (DataFrame.offset and DataFrame.limit) | ✅ |
Query - Sorting (DataFrame.sort and DataFrame.orderBy) | ✅ |
NA functions (DataFrame.na) | ✅ |
Statistics functions (DataFrame.stat) | 🚧 |
View management (e.g. DataFrame.createOrReplaceTempView) | ✅ |
RDD Access (DataFrame.rdd) | ❌ |
PySpark UDFs | ✅ |
PySpark UDTFs | ✅ |
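
As an illustration of how several of the features marked ✅ above fit together, here is a minimal PySpark sketch. It is not taken from the Sail documentation: the function name `run_demo`, the sample data, and the column names are all hypothetical, and the function assumes you already have a `SparkSession` connected to your server.

```python
def run_demo(spark):
    """Run a few DataFrame operations from the table and return the collected rows.

    `spark` is an existing SparkSession (an assumption: how it is created
    depends on your deployment and is not shown here).
    """
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import functions as F

    # Hypothetical in-memory data; explicit schemas avoid type inference on None.
    people = spark.createDataFrame(
        [(1, "Alice", 34), (2, "Bob", 45), (3, "Carol", None)],
        schema="id INT, name STRING, age INT",
    )
    orders = spark.createDataFrame(
        [(1, 20.0), (1, 5.5), (2, 12.0)],
        schema="person_id INT, amount DOUBLE",
    )

    adults = (
        people.na.fill({"age": 0})                      # NA functions
        .withColumn("is_senior", F.col("age") >= 40)    # column operations
        .filter(F.col("age") > 30)                      # filtering
        .select("id", "name", "is_senior")              # projection
    )

    totals = (
        adults.join(orders, adults["id"] == orders["person_id"], "left")  # join
        .groupBy("name")                                # aggregation
        .agg(F.sum("amount").alias("total"))
        .orderBy("name")                                # sorting
        .limit(10)                                      # limit
    )

    totals.printSchema()     # schema display
    totals.show()            # result collection (prints a table)
    return totals.collect()  # result collection (returns a list of Row objects)
```

With PySpark's Spark Connect client, a session could be obtained through `SparkSession.builder.remote(...)`, where the server address depends on your setup, and then passed to `run_demo(spark)`. The sketch deliberately avoids `DataFrame.rdd`, `DataFrame.stat`, and streaming, since the table marks those as unsupported or in progress.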