Supported Features
Here is a high-level overview of the Spark DataFrame API features supported by Sail. The list covers the most common use cases of the DataFrame API and is not exhaustive.

| Feature | Supported |
|---|---|
| I/O - Reading (SparkSession.read) | ✅ |
| I/O - Writing (DataFrame.write and DataFrame.writeTo()) | ✅ |
| Structured Streaming (SparkSession.readStream) | 🚧 |
| Result collection (DataFrame.show(), DataFrame.collect(), and DataFrame.count()) | ✅ |
| Schema display (DataFrame.printSchema()) | ✅ |
| Query - Projection (DataFrame.select() and DataFrame.selectExpr()) | ✅ |
| Query - Column operations (e.g. DataFrame.withColumn(), DataFrame.replace(), and DataFrame.drop()) | ✅ |
| Query - Filtering (DataFrame.filter()) | ✅ |
| Query - Aggregation (DataFrame.agg() and DataFrame.groupBy()) | ✅ |
| Query - Join (DataFrame.join()) | ✅ |
| Query - Set operations (e.g. DataFrame.union(), DataFrame.intersect(), and DataFrame.exceptAll()) | ✅ |
| Query - Limit (DataFrame.offset() and DataFrame.limit()) | ✅ |
| Query - Sorting (DataFrame.sort() and DataFrame.orderBy()) | ✅ |
| NA functions (DataFrame.na) | ✅ |
| Statistics functions (DataFrame.stat) | 🚧 |
| View management (e.g. DataFrame.createOrReplaceTempView()) | ✅ |
| RDD Access (DataFrame.rdd) | ❌ |
| PySpark UDFs | ✅ |
| PySpark UDTFs | ✅ |
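
As a rough illustration, the sketch below chains several of the operations marked ✅ in the table: filtering, column operations, aggregation, sorting, and limit. The function name `summarize_orders`, the sample rows, and the column names are hypothetical and chosen only for this example. It also assumes that `spark` is an existing `SparkSession` connected to Sail, and it uses `createDataFrame()`, which the table does not list.

```python
def summarize_orders(spark):
    """Chain a few DataFrame operations listed as supported in the table above.

    `spark` is assumed to be an existing SparkSession connected to Sail.
    """
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import functions as F

    # Hypothetical sample data; any small DataFrame would do.
    orders = spark.createDataFrame(
        [("alice", "book", 12.0), ("bob", "pen", 1.5), ("alice", "pen", 3.0)],
        schema="customer string, item string, amount double",
    )

    return (
        orders.filter(F.col("amount") > 2.0)                 # Query - Filtering
        .withColumn("amount_cents", F.col("amount") * 100)   # Query - Column operations
        .groupBy("customer")                                 # Query - Aggregation
        .agg(F.sum("amount_cents").alias("total_cents"))
        .orderBy("customer")                                 # Query - Sorting
        .limit(10)                                           # Query - Limit
    )


# Possible usage, given a session named `spark`:
#   result = summarize_orders(spark)
#   result.printSchema()   # Schema display
#   result.show()          # Result collection
```

The function only builds the query; nothing is executed until a result collection method such as `show()` or `collect()` is called on the returned DataFrame.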
