Delta Lake
You can use the delta format in Sail to work with Delta Lake. Delta tables can be read and written with either the Spark DataFrame API or Spark SQL.
Examples
INFO
In the code below, spark refers to a Spark client session connected to the Sail server. Refer to the Getting Started guide for how to set this up.
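For example, here is a minimal sketch of creating such a session over Spark Connect, assuming a Sail server is listening on localhost at port 50051 (the actual address depends on your deployment):

from pyspark.sql import SparkSession

# Connect to the Sail server via Spark Connect.
# The URL here is an assumption for illustration; use your own deployment's address.
spark = SparkSession.builder.remote("sc://localhost:50051").getOrCreate()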
Basic Usage
path = "file:///tmp/sail/users"
df = spark.createDataFrame(
[(1, "Alice"), (2, "Bob")],
schema="id INT, name STRING",
)
# This creates a new table or overwrites an existing one.
df.write.format("delta").mode("overwrite").save(path)
# This appends data to an existing table.
df.write.format("delta").mode("append").save(path)
df = spark.read.format("delta").load(path)
df.show()

Equivalently, with Spark SQL:

CREATE TABLE users (id INT, name STRING)
USING delta
LOCATION 'file:///tmp/sail/users';
INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');
SELECT * FROM users;

Data Partitioning
You can work with partitioned Delta tables using the Spark DataFrame API or Spark SQL. Partitioned Delta tables organize data into directories based on the values of one or more columns. This improves query performance by skipping data files that do not match the filter conditions.
path = "file:///tmp/sail/metrics"
df = spark.createDataFrame(
    [(2024, 1.0), (2025, 2.0)],
    schema="year INT, value FLOAT",
)
df.write.format("delta").mode("overwrite").partitionBy("year").save(path)
df = spark.read.format("delta").load(path).filter("year > 2024")
df.show()

Equivalently, with Spark SQL:

CREATE TABLE metrics (year INT, value FLOAT)
USING delta
LOCATION 'file:///tmp/sail/metrics'
PARTITIONED BY (year);
INSERT INTO metrics VALUES (2024, 1.0), (2025, 2.0);
SELECT * FROM metrics WHERE year > 2024;

Schema Evolution
Delta Lake handles schema evolution gracefully. By default, writing data whose schema differs from that of the existing Delta table results in an error. You can enable schema evolution by setting the mergeSchema option to true when writing data. In this case, if you change the data type of an existing column to a compatible type, or add a new column, Delta Lake automatically updates the schema of the table.
df.write.format("delta").mode("append").option("mergeSchema", "true").save(path)You can also use the overwriteSchema option to overwrite the schema of an existing Delta table. But this works only if you set the write mode to overwrite.
df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save(path)Time Travel
Time Travel

You can use the time travel feature to query historical versions of a Delta table.
df = spark.read.format("delta").option("versionAsOf", "0").load(path)
df = spark.read.format("delta").option("timestampAsOf", "2025-01-02T03:04:05.678").load(path)Time travel is not available for Spark SQL in Sail yet, but we plan to support it soon.
Column Mapping
You can write Delta tables with column mapping enabled. The supported column mapping modes are name and id. You must write to a new Delta table to enable column mapping.
df.write.format("delta").option("columnMappingMode", "name").save(path)
df.write.format("delta").option("columnMappingMode", "id").save(path)Existing Delta tables with column mapping can be read as usual.
More Features
We will continue adding more examples for advanced Delta Lake features as they become available in Sail. In the meantime, feel free to reach out to us on Slack or GitHub Discussions if you have questions!
