Benchmark Results

We ran a benchmark derived from TPC-H to compare the performance and resource efficiency of Sail and Apache Spark. The benchmark consists of 22 queries that cover a wide range of SQL operations, including filters, joins, aggregations, and subqueries.

Setup

Dataset Size: Scale factor 100 (100 GB raw data)
Dataset Format: Parquet
Host: AWS EC2 r8g.4xlarge (16 vCPU, 128 GB RAM)
Disk: Separate EBS volumes for data and Spark temporary files (4,000 IOPS, 1,000 MB/s throughput)

Key Findings

Metric	Spark	Sail
Total Query Time	387.36 seconds	102.75 seconds
Query Speed-Up	0% (baseline)	43% - 727%
Peak Memory Usage	54 GB (constant)	22 GB (1 second)
Disk Write (Shuffle Spill)	> 110 GB	0 GB

The results show that Sail completes the workload nearly 4x faster than Spark (102.75 seconds versus 387.36 seconds). Sail can also run on an instance 1/4 the size, leading to a cost reduction of up to 94%. As a result, Sail can handle larger datasets on the same hardware or achieve similar performance on smaller, cheaper infrastructure.
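The "nearly 4x" figure can be checked directly from the total query times in the table. The sketch below does that arithmetic. It also expresses the same ratio as a percentage over the Spark baseline, which assumes that the "Query Speed-Up" row uses the definition (Spark time / Sail time - 1); the page does not state the formula, so treat that part as an assumption. The function names are illustrative only.

```python
# Totals copied from the Key Findings table (seconds).
SPARK_TOTAL_S = 387.36
SAIL_TOTAL_S = 102.75


def speedup_ratio(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def speedup_percent(baseline_s: float, candidate_s: float) -> float:
    """Speed-up as a percentage over the baseline.

    Assumed definition: (baseline / candidate - 1) * 100.
    """
    return (baseline_s / candidate_s - 1.0) * 100.0


ratio = speedup_ratio(SPARK_TOTAL_S, SAIL_TOTAL_S)
percent = speedup_percent(SPARK_TOTAL_S, SAIL_TOTAL_S)
print(f"{ratio:.2f}x faster, {percent:.0f}% speed-up")
```

For the whole workload this gives a ratio of about 3.77, which is the basis for "nearly 4x".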

Detailed Results

Query Time

The following figure compares the query times of Sail and Spark for individual queries.

The following figure shows the relative improvement of Sail over Spark for each query, sorted by magnitude.

Resource Utilization

We analyzed memory and disk usage during query execution, using AWS CloudWatch metrics with 1-second resolution.

The following figure shows that Spark consumed about 54 GB of memory during query execution and spilled to disk for shuffle operations. Despite abundant available memory, Spark wrote over 110 GB of temporary data, peaking at over 46 GB in a rolling minute.

In contrast, the following figure shows that Sail's resource consumption is drastically different. At peak, Sail used approximately 22 GB of memory, but this usage lasted for only one second. Sail released memory after executing each query and had zero disk usage, relying solely on available memory for computation.
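A "rolling minute" peak like the one quoted for Spark's disk writes can be derived from per-second samples by summing every window of 60 consecutive seconds and taking the maximum. The following is a minimal sketch of that calculation, not the tooling used for this benchmark. The function name and the sample values are hypothetical and are not measured data.

```python
from collections import deque
from typing import Iterable


def peak_rolling_sum(samples: Iterable[float], window: int = 60) -> float:
    """Return the largest sum over any `window` consecutive samples."""
    recent = deque()   # samples currently inside the window
    running = 0.0      # sum of the samples in `recent`
    peak = 0.0
    for value in samples:
        recent.append(value)
        running += value
        if len(recent) > window:
            # Drop the oldest sample once the window is full.
            running -= recent.popleft()
        peak = max(peak, running)
    return peak


# Hypothetical per-second write volumes in GB (illustrative only).
writes_gb = [0.0] * 30 + [0.5] * 90 + [0.0] * 30
print(peak_rolling_sum(writes_gb))
```

With 1-second samples, a window of 60 corresponds to one minute. A different sampling resolution would need a different window length.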
