User-Defined Functions
Sail provides performant support for PySpark user-defined functions (UDFs) and user-defined table functions (UDTFs). You can use UDFs and UDTFs in the PySpark DataFrame API. You can also register UDFs and UDTFs and then use them in Spark SQL queries.
INFO
Spark Java or Scala UDFs are not supported in Sail.
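The two usage paths described above (calling a UDF from the DataFrame API, and registering it for use in Spark SQL) can be sketched as follows. This is a minimal illustration rather than an official example: the function name to_upper, the view name people, and the Spark Connect address sc://localhost:50051 are all assumptions chosen for the sketch, and the address in particular depends on how your Sail server is started.

```python
def to_upper(s):
    """Plain Python logic wrapped by the UDF; passes NULL (None) through."""
    return s.upper() if s is not None else None


def run_udf_demo(remote="sc://localhost:50051"):
    # The address above is an assumption; point it at your own Sail server.
    # PySpark is imported here so that to_upper stays usable without it.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.remote(remote).getOrCreate()
    df = spark.createDataFrame([("alice",), ("bob",)], ["name"])

    # 1. Use the UDF in the DataFrame API.
    upper_udf = udf(to_upper, returnType=StringType())
    df.select(upper_udf(col("name")).alias("upper_name")).show()

    # 2. Register the same UDF and call it from a Spark SQL query.
    spark.udf.register("to_upper", upper_udf)
    df.createOrReplaceTempView("people")
    spark.sql("SELECT to_upper(name) AS upper_name FROM people").show()
```

Calling run_udf_demo() with a server listening at the given address should print the same uppercased column twice, once per usage path.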
Supported APIs
Here is a list of supported (✅) and unsupported (🚧) PySpark APIs for UDFs and UDTFs.
INFO
The PySpark library uses different logic for input and output conversion, depending on whether Arrow optimization is enabled. Arrow optimization is controlled by the useArrow argument of the udf() and udtf() wrappers, and by the spark.sql.execution.pythonUDTF.arrow.enabled configuration. Sail respects these settings for input and output conversion. Note, however, that Sail uses Arrow for query execution regardless of whether Arrow is enabled in PySpark.
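As a rough sketch of the settings named in this note, the following shows the useArrow argument on both wrappers and the session-level UDTF configuration. The class SplitWords, the function add_one, and the server address sc://localhost:50051 are hypothetical names introduced only for illustration; the sketch also assumes a PySpark version that provides udtf() and the useArrow argument.

```python
class SplitWords:
    """UDTF handler: emits one output row per whitespace-separated word."""

    def eval(self, text: str):
        for word in text.split():
            yield (word,)


def add_one(x):
    """Plain Python logic for the scalar UDF; passes NULL (None) through."""
    return x + 1 if x is not None else None


def run_arrow_demo(remote="sc://localhost:50051"):
    # The address above is an assumption; point it at your own Sail server.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, lit, udf, udtf
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.remote(remote).getOrCreate()

    # Per-function control: request Arrow-based conversion explicitly.
    add_one_udf = udf(add_one, returnType=IntegerType(), useArrow=True)
    spark.range(3).select(add_one_udf(col("id").cast("int")).alias("n")).show()

    split_words = udtf(SplitWords, returnType="word: string", useArrow=True)
    split_words(lit("hello sail world")).show()

    # Session-level control for UDTFs that do not pass useArrow themselves.
    spark.conf.set("spark.sql.execution.pythonUDTF.arrow.enabled", "true")
    default_split = udtf(SplitWords, returnType="word: string")
    spark.udtf.register("split_words", default_split)
    spark.sql("SELECT * FROM split_words('hello sail world')").show()
```

In both cases the setting only selects which PySpark conversion logic applies to the function's inputs and outputs; as stated above, query execution in Sail uses Arrow either way.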