Built-in Spark Window Functions
INFO
Unless otherwise specified, the functions listed here are available in both PySpark (the pyspark.sql.functions module) and Spark SQL.
Window Functions
| Function | Supported | Note |
|---|---|---|
| cume_dist | ✅ | |
| dense_rank | ✅ | |
| lag | ✅ | |
| lead | ✅ | |
| nth_value | ✅ | |
| ntile | ✅ | |
| percent_rank | ✅ | |
| rank | ✅ | |
| row_number | ✅ | |
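
The ranking and offset functions in the table above are easiest to tell apart on a small example. The following is a plain-Python sketch of what `row_number`, `rank`, `dense_rank`, `lag`, and `lead` compute over a single, already-defined partition ordered ascending. It does not use Spark. The helper names `ranking_columns`, `lag_column`, and `lead_column` are hypothetical and exist only for illustration, and the offset is assumed to be non-negative.

```python
from typing import List, Optional, Tuple


def ranking_columns(values: List[int]) -> List[Tuple[int, int, int, int]]:
    """Return (value, row_number, rank, dense_rank) for one ordered partition."""
    ordered = sorted(values)
    rows = []
    rank = 0
    dense_rank = 0
    previous: Optional[int] = None
    for position, value in enumerate(ordered, start=1):
        if value != previous:
            rank = position   # rank skips ahead after a group of ties
            dense_rank += 1   # dense_rank advances by exactly one
            previous = value
        # row_number is simply the 1-based position in the ordering
        rows.append((value, position, rank, dense_rank))
    return rows


def lag_column(values: List[int], offset: int = 1, default=None) -> list:
    """Value `offset` rows before each row, or `default` when there is none."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


def lead_column(values: List[int], offset: int = 1, default=None) -> list:
    """Value `offset` rows after each row, or `default` when there is none."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]


scores = [10, 20, 20, 30]
for row in ranking_columns(scores):
    print(row)
print(lag_column(scores))
print(lead_column(scores))
```

On tied values, `rank` leaves a gap after the tie while `dense_rank` does not, and `row_number` stays unique for every row.
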
Aggregate Window Functions
| Function | Supported | Note |
|---|---|---|
| any | ✅ | This is Spark SQL only (not available in PySpark). |
| any_value | ✅ | |
| approx_count_distinct | ✅ | |
| approx_percentile | ✅ | |
| array_agg | ✅ | |
| avg | ✅ | |
| bit_and | ✅ | |
| bit_or | ✅ | |
| bit_xor | ✅ | |
| bool_and | ✅ | |
| bool_or | ✅ | |
| collect_list | ✅ | |
| collect_set | ✅ (partial) | |
| corr | ✅ | |
| count | ✅ | |
| count_distinct | ✅ | |
| count_if | ✅ | |
| covar_pop | ✅ | |
| covar_samp | ✅ | |
| every | ✅ | |
| first | ✅ | |
| first_value | ✅ | |
| grouping | ✅ | |
| kurtosis | ✅ | |
| last | ✅ | |
| last_value | ✅ | |
| listagg | ✅ | |
| listagg_distinct | ✅ | |
| max | ✅ | |
| max_by | ✅ | |
| mean | ✅ | |
| median | ✅ | |
| min | ✅ | |
| min_by | ✅ | |
| mode | ✅ | |
| percentile | ✅ | |
| percentile_approx | ✅ | |
| percentile_cont | ✅ | This is Spark SQL only (not available in PySpark). |
| percentile_disc | ✅ | This is Spark SQL only (not available in PySpark). |
| regr_avgx | ✅ | |
| regr_avgy | ✅ | |
| regr_count | ✅ | |
| regr_intercept | ✅ | |
| regr_r2 | ✅ | |
| regr_slope | ✅ | |
| regr_sxx | ✅ | |
| regr_sxy | ✅ | |
| regr_syy | ✅ | |
| skewness | ✅ | |
| some | ✅ | |
| std | ✅ | |
| stddev | ✅ | |
| stddev_pop | ✅ | |
| stddev_samp | ✅ | |
| string_agg | ✅ | |
| string_agg_distinct | ✅ | |
| sum | ✅ | |
| sum_distinct | ✅ | |
| try_avg | ✅ | |
| try_sum | ✅ | |
| var_pop | ✅ | |
| var_samp | ✅ | |
| variance | ✅ | |
| bitmap_construct_agg | 🚧 | |
| bitmap_or_agg | 🚧 | |
| count_min_sketch | 🚧 | |
| grouping_id | 🚧 | |
| histogram_numeric | 🚧 | |
| hll_sketch_agg | 🚧 | |
| hll_union_agg | 🚧 | |
| product | 🚧 | |
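
When an aggregate such as `sum` is used as a window function, every input row is kept and receives the aggregate computed over its window, rather than being collapsed into one row per group. The sketch below illustrates that behaviour in plain Python for two common cases: a whole-partition total, and a running total with a frame of all rows up to and including the current one. It does not use Spark. The helper names `partition_sum` and `running_sum` and the sample data are hypothetical, and the output order shown is a choice made by this sketch.

```python
from collections import defaultdict
from itertools import accumulate
from typing import List, Tuple

Row = Tuple[str, int]  # (key, amount): hypothetical sample schema


def partition_sum(rows: List[Row]) -> List[Tuple[str, int, int]]:
    """Roughly: sum(amount) OVER (PARTITION BY key)."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    # Every input row survives and carries its partition's total.
    return [(key, amount, totals[key]) for key, amount in rows]


def running_sum(rows: List[Row]) -> List[Tuple[str, int, int]]:
    """Roughly: sum(amount) OVER (PARTITION BY key ORDER BY amount
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)."""
    partitions = defaultdict(list)
    for key, amount in rows:
        partitions[key].append(amount)
    result = []
    for key in sorted(partitions):
        ordered = sorted(partitions[key])
        # accumulate yields the cumulative sum up to each row
        for amount, total in zip(ordered, accumulate(ordered)):
            result.append((key, amount, total))
    return result


rows = [("a", 1), ("a", 2), ("b", 5), ("a", 3)]
print(partition_sum(rows))
print(running_sum(rows))
```
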
