Changelog
0.2.1
January 15, 2025
- Supported SQL table functions and lateral views (#326 and #327).
- Supported PySpark UDTFs (#329).
- Improved literal and data type support (#317, #328, #330, and #339).
- Supported ANTI JOIN and SEMI JOIN (#337).
- Fixed a few PySpark UDF issues (#343).
- Supported nested fields in SQL (#340).
- Supported more queries in the derived TPC-DS benchmark (#346).
- Supported more datetime functions (#349).
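For readers less familiar with the SEMI JOIN and ANTI JOIN types mentioned in this release, the following is a minimal plain-Python sketch of their usual semantics. It is not how Sail implements them. The sample rows and the helper names `left_semi_join` and `left_anti_join` are hypothetical, and NULL-key handling is deliberately left out.

```python
# Illustrative model of LEFT SEMI / LEFT ANTI join semantics in plain Python.
# The data and helper names are hypothetical and are not part of Sail.

def left_semi_join(left, right, key):
    """Keep each left row that has at least one match in right.

    Only left-side columns are returned, and a left row appears once
    even when several right rows match it.
    """
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]


def left_anti_join(left, right, key):
    """Keep each left row that has no match in right."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]


orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": 20},
    {"order_id": 3, "customer_id": 30},
]
customers = [
    {"customer_id": 10, "name": "Ada"},
    {"customer_id": 10, "name": "Ada (duplicate)"},
    {"customer_id": 30, "name": "Grace"},
]

semi = left_semi_join(orders, customers, "customer_id")
anti = left_anti_join(orders, customers, "customer_id")

print([row["order_id"] for row in semi])  # orders with a known customer
print([row["order_id"] for row in anti])  # orders without a known customer
```

Unlike an inner join, the semi join does not duplicate order 1 even though two customer rows match it.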
0.2.0
December 3, 2024
We are excited to announce the first Sail release with the distributed processing capability. Spark SQL and DataFrame queries can now run on Kubernetes, powered by the Sail distributed compute engine. We also introduced a new Sail CLI and a configuration mechanism that will serve as the entrypoint for all Sail features moving forward.
We continued extending coverage for Spark SQL functions and the Spark DataFrame API. The changes are listed below.
- Supported the following DataFrame and SQL functions (#278 and #305).
  - DataFrame.crosstab
  - DataFrame.replace
  - DataFrame.to
  - reverse
  - aes_decrypt
  - aes_encrypt
  - try_aes_decrypt
  - base64
  - unbase64
  - weekofyear
- Supported mapInPandas() and mapInArrow() for Spark DataFrame (#310).
- Supported applyInPandas() for grouped and co-grouped Spark DataFrame (#313).
Breaking Changes
This release comes with the new Sail CLI, and the way to launch the Spark Connect server and PySpark shell is different from the 0.1.x versions. Please refer to the Getting Started page for the updated instructions.
0.1.7
November 1, 2024
- Expanded support for Spark DataFrame functions (#268 and #261). Added full parity and coverage for the following DataFrame and SQL functions.
  - DataFrame.summary
  - DataFrame.describe
  - DataFrame.corr
  - DataFrame.cov
  - DataFrame.stat
  - DataFrame.drop
  - corr
  - regr_avgx
- Fixed most issues with ORDER BY in the derived TPC-DS benchmark, bringing total coverage to 74 out of the 99 queries (#261).
We also made significant changes to the Sail internals to support distributed processing. We are targeting the 0.2.0 release in the next few weeks for an MVP (minimum viable product) of this exciting feature. Please stay tuned! If you are interested in the ongoing work, you can follow #246 in our GitHub repository to get the latest updates!
0.1.6
October 23, 2024
0.1.5
October 17, 2024
- Expanded support for Spark SQL syntax and functions (#239 and #247). Added full parity and coverage for the following SQL functions.
  - current_catalog
  - current_database
  - current_schema
  - hash
  - hex
  - unhex
  - xxhash64
  - unix_timestamp
- Fixed a few issues with JOIN (#250).
0.1.4
October 03, 2024
- Enabled Avro in DataFusion (#234).
- Expanded support for Spark SQL syntax and functions (#213 and #207). Added full parity and coverage for the following SQL functions.
  - array
  - date_format
  - get_json_object
  - json_array_length
  - overlay
  - replace
  - split_part
  - to_date
  - any_value
  - approx_count_distinct
  - current_timezone
  - first_value
  - greatest
  - last
  - last_value
  - least
  - map_contains_key
  - map_keys
  - map_values
  - min_by
  - substr
  - sum_distinct
- Supported HDFS (#196).
- Supported parsing value prefixes followed by whitespace (#218 and lakehq/sqlparser-rs#6).
- Added basic support for Python UDAF (#214).
Contributors
Huge thanks to our first community contributor, @skewballfox, for adding support for HDFS!
0.1.3
September 18, 2024
- Supported column positions in GROUP BY and ORDER BY (#205).
- Expanded support for INSERT statements (#195).
- Fixed issues with Spark configuration (#192).
- Expanded support for CREATE and REPLACE statements (#183).
- Supported GROUPING SETS aggregation (#184).
- Integrated fastrace for more performant logging and tracing (#166).
- Enabled gzip and zstd compression in Tonic (#166).
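To illustrate what "column positions" in GROUP BY and ORDER BY means, the sketch below runs a positional query against an in-memory SQLite database from the Python standard library. SQLite is only a stand-in here, chosen because it needs no setup and also accepts positional references. The `sales` table and its rows are hypothetical, and the example does not exercise Sail or Spark.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 5), ("east", 7), ("west", 1)],
)

# "GROUP BY 1" refers to the first select item (region);
# "ORDER BY 2 DESC" refers to the second select item (total).
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY 1 ORDER BY 2 DESC"
).fetchall()
conn.close()

for region, total in rows:
    print(region, total)
```

The positional form is equivalent to writing `GROUP BY region ORDER BY total DESC`; it is mostly a convenience when the select items are long expressions.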
0.1.2
September 10, 2024
- Fixed issues with aggregation queries.
- Extended support for SQL functions.
- Added support for temporary views and global temporary views.
0.1.1
September 03, 2024
- Extended support for SQL statements and SQL functions.
- Fixed a performance issue for the PySpark DataFrame show() method.
0.1.0
August 29, 2024
This is the first Sail release.