Catalog

Sail supports various catalog providers to manage your datasets as external tables. Catalogs organize and maintain metadata about your data, so that you can refer to datasets by table name in your SQL queries.

By default, Sail uses a memory catalog provider that stores table metadata in memory for the duration of your session. To persist table metadata across sessions, you can configure remote catalog providers through the Sail configuration options.

For example, you can configure multiple catalogs using the catalog.list option and set the default catalog using the catalog.default_catalog option. Both options can be set via environment variables before starting the Sail server. The example below defines a memory catalog named c1 and an Iceberg REST catalog named c2, and makes c1 the default.

bash

export SAIL_CATALOG__LIST='[{name="c1", type="memory", initial_database=["default"]}, {name="c2", type="iceberg-rest", uri="https://catalog.example.com"}]'
export SAIL_CATALOG__DEFAULT_CATALOG="c1"
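
If the Sail server is launched from a Python script, the same values can be assembled programmatically instead of being written by hand. The following is a minimal sketch, assuming the inline syntax shown in the export above (key=value pairs inside braces, strings in double quotes). render_value is a hypothetical helper written for this illustration and is not part of Sail. Note that os.environ only affects the current process and the child processes it starts, so the server would have to be started from the same script afterwards.

```python
import os


def render_value(value):
    """Render a Python value in the inline syntax used by the example above.

    Hypothetical helper: it only handles strings, lists, and dicts.
    """
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(render_value(v) for v in value) + "]"
    if isinstance(value, dict):
        pairs = (f"{key}={render_value(v)}" for key, v in value.items())
        return "{" + ", ".join(pairs) + "}"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


# The same two catalogs as in the shell example
catalogs = [
    {"name": "c1", "type": "memory", "initial_database": ["default"]},
    {"name": "c2", "type": "iceberg-rest", "uri": "https://catalog.example.com"},
]

# Set the variables before the Sail server is started from this process
os.environ["SAIL_CATALOG__LIST"] = render_value(catalogs)
os.environ["SAIL_CATALOG__DEFAULT_CATALOG"] = "c1"
```

Keeping the catalog definitions as ordinary Python data makes it easier to add or remove a catalog without hand-editing a long quoted string.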

Then you can interact with the catalogs using the Spark API.

INFO

In the code below, spark refers to a Spark client session connected to the Sail server. You can refer to the Getting Started guide for how it works.

python

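# List all configured catalogs
spark.catalog.listCatalogs()
# Show the current catalog
spark.catalog.currentCatalog()
# List the tables in the current catalog
spark.catalog.listTables()
# Switch the session to another catalog
spark.catalog.setCurrentCatalog("c2")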
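
Because setCurrentCatalog changes the state of the whole session, it can be convenient to switch catalogs only for the duration of one operation. Here is a small sketch of that idea built on the calls shown above. use_catalog and table_names are hypothetical helper names chosen for this example, and spark is assumed to be a session connected to the Sail server.

```python
from contextlib import contextmanager


@contextmanager
def use_catalog(spark, name):
    """Temporarily switch the session's current catalog, then restore it."""
    previous = spark.catalog.currentCatalog()
    spark.catalog.setCurrentCatalog(name)
    try:
        yield spark
    finally:
        # Restore the previous catalog even if the body raised an error
        spark.catalog.setCurrentCatalog(previous)


def table_names(spark, catalog):
    """Return the names of the tables listed while `catalog` is current."""
    with use_catalog(spark, catalog):
        return [table.name for table in spark.catalog.listTables()]
```

With the configuration from the earlier example, table_names(spark, "c2") would list the tables of c2 and leave c1 as the current catalog afterwards. Whether listing succeeds depends on the remote catalog being reachable and on the current database existing in it.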

You can also interact with catalogs using SQL statements.

sql

-- show the current catalog
SELECT current_catalog();
-- show tables in the current catalog
SHOW TABLES;
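
The same statements can be issued from the Python client through spark.sql. The sketch below wraps them in two hypothetical helper functions (current_catalog and show_tables are names chosen for this example) and assumes that spark is a session connected to the Sail server. It makes no assumption about the column names returned by SHOW TABLES and simply converts each row to a dictionary.

```python
def current_catalog(spark):
    """Return the name of the current catalog using a SQL query."""
    # The query returns a single row with a single column
    return spark.sql("SELECT current_catalog()").first()[0]


def show_tables(spark):
    """Return the rows of SHOW TABLES as plain dictionaries."""
    return [row.asDict() for row in spark.sql("SHOW TABLES").collect()]
```

For example, print(current_catalog(spark)) prints the name of the catalog the session is currently using, and show_tables(spark) returns one dictionary per table in that catalog.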

In the next few pages, we will explore the different catalog providers supported by Sail and how to configure them.

Support Matrix

Here is a list of the catalog providers that are supported (✅) and those that are planned on our roadmap (🚧).

Catalog Provider	Supported
Memory	✅
Iceberg REST	✅
Unity Catalog	✅
AWS Glue	🚧
Hive Metastore	🚧
