Supported Features

This page gives a high-level overview of the Spark DataFrame API features that Sail supports. The list covers the most common DataFrame use cases and is not exhaustive.

Feature	Supported
I/O - Reading (`SparkSession.read`)	✅
I/O - Writing (`DataFrame.write`)	✅
Structured Streaming (`SparkSession.readStream`)	🚧
Result collection (`DataFrame.show`, `DataFrame.collect`, and `DataFrame.count`)	✅
Schema display (`DataFrame.printSchema`)	✅
Query - Projection (`DataFrame.select` and `DataFrame.selectExpr`)	✅
Query - Column operations (e.g. `DataFrame.withColumn`, `DataFrame.replace`, and `DataFrame.drop`)	✅
Query - Filtering (`DataFrame.filter`)	✅
Query - Aggregation (`DataFrame.agg` and `DataFrame.groupBy`)	✅
Query - Join (`DataFrame.join`)	✅
Query - Set operations (e.g. `DataFrame.union`, `DataFrame.intersect`, and `DataFrame.exceptAll`)	✅
Query - Limit (`DataFrame.offset` and `DataFrame.limit`)	✅
Query - Sorting (`DataFrame.sort` and `DataFrame.orderBy`)	✅
NA functions (`DataFrame.na`)	✅
Statistics functions (`DataFrame.stat`)	🚧
View management (e.g. `DataFrame.createOrReplaceTempView`)	✅
RDD access (`DataFrame.rdd`)	❌
PySpark UDFs	✅
PySpark UDTFs	✅