Hive Metastore
The Hive Metastore catalog provider in Sail allows you to connect to an external Hive Metastore service over Thrift.
Support Status
Sail's HMS integration is currently aimed at metadata interoperability with Apache Hive Metastore deployments.
The following areas are supported:
- Plain HMS connections over Thrift.
- Kerberos-secured HMS connections over Thrift SASL.
- HMS high-availability URI lists with endpoint failover.
- Flat database namespaces.
- Database, table, and view metadata stored in HMS.
- Generic Hive storage formats: parquet, csv, textfile, json, orc, avro, and delta (with the alias deltalake).
The following areas are not implemented yet:
- Hive ACID or transactional HMS APIs such as transaction heartbeats, locks, or write ID allocation.
- Iceberg-in-HMS behavior.
- Delegation-token authentication.
Hive Metastore can be configured using the following options:
- type (required): The string hive_metastore or the alias hms.
- name (required): The name of the catalog.
- uris (required): A list of HMS endpoints. Each entry accepts either host:port or thrift://host:port. Entries may also include comma-separated endpoint lists.
- thrift_transport (optional): The Thrift transport mode. Valid values are buffered and framed. The default is buffered.
- auth (optional): The HMS authentication mode. Valid values are none and kerberos. The default is none.
- kerberos_service_principal (optional): Required when auth = "kerberos". Use the HMS service principal in the form service/_HOST@REALM, for example hive-metastore/_HOST@EXAMPLE.COM.
- sasl_qop_min (optional): Minimum Kerberos SASL QOP when auth = "kerberos". Valid values are auth, auth_int, and auth_conf. The default is auth.
- connect_timeout_secs (optional): Per-endpoint connect timeout in seconds. The default is 5.
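The accepted forms of a uris entry can be illustrated with a small sketch. The helper below is hypothetical and is not Sail's parser: the name normalize_hms_uris and the rule that every endpoint must carry an explicit port are assumptions made for illustration.

```python
from typing import List, Tuple


def normalize_hms_uris(uris: List[str]) -> List[Tuple[str, int]]:
    """Flatten uris entries into (host, port) pairs, preserving order."""
    endpoints = []
    for entry in uris:
        # An entry may itself hold a comma-separated list of endpoints.
        for part in entry.split(","):
            part = part.strip()
            if not part:
                continue
            # Accept both "host:port" and "thrift://host:port".
            if part.startswith("thrift://"):
                part = part[len("thrift://"):]
            host, sep, port = part.rpartition(":")
            # Assumption: a port is always required in this sketch.
            if not sep or not host or not port.isdigit():
                raise ValueError(f"invalid HMS endpoint: {part!r}")
            endpoints.append((host, int(port)))
    return endpoints
```

For example, normalize_hms_uris(["hms1.internal:9083", "thrift://hms2.internal:9083"]) yields two (host, port) pairs in the configured order.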
Failover behavior:
- Sail attempts endpoints in configured order.
- New connections re-resolve DNS for the selected endpoint instead of pinning the initial startup address forever.
- Retryable transport/Thrift failures rotate to the next endpoint.
- A retried create or drop normalizes AlreadyExists and NotFound responses when the prior attempt likely succeeded but the response was lost.
- Per-endpoint connect timeout defaults to 5s and can be overridden with connect_timeout_secs.
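The failover rules above can be approximated in a few lines of Python. This is a simplified sketch under stated assumptions, not Sail's implementation: connect_with_failover, create_with_retry, and the AlreadyExists class are hypothetical names, any OSError is treated as a retryable transport failure, and a plain TCP socket stands in for a Thrift connection.

```python
import socket
from typing import Tuple

Endpoint = Tuple[str, int]


class AlreadyExists(Exception):
    """Hypothetical stand-in for an HMS AlreadyExists response."""


def _tcp_connect(endpoint: Endpoint, timeout: float) -> socket.socket:
    # create_connection resolves the hostname on every call, so each new
    # connection sees current DNS records rather than a pinned address.
    return socket.create_connection(endpoint, timeout=timeout)


def connect_with_failover(endpoints, connect_timeout_secs=5.0, connect=_tcp_connect):
    """Try endpoints in configured order; return (endpoint, connection)."""
    last_error = None
    for endpoint in endpoints:
        try:
            return endpoint, connect(endpoint, connect_timeout_secs)
        except OSError as err:
            # Assumed retryable: remember the error and rotate onward.
            last_error = err
    raise ConnectionError(f"all HMS endpoints failed: {last_error}")


def create_with_retry(create, max_attempts=2):
    """Retry a create; AlreadyExists on a retry is normalized to success."""
    for attempt in range(max_attempts):
        try:
            return create()
        except AlreadyExists:
            if attempt == 0:
                raise  # first attempt: the object existed beforehand
            return None  # retry: the earlier attempt likely succeeded
        except OSError:
            if attempt == max_attempts - 1:
                raise
```

A drop would mirror create_with_retry, treating a NotFound response on a retry as success.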
Kerberos Authentication
INFO
Kerberos authentication for Hive Metastore is supported and uses the same operator model as Sail's HDFS support.
Prerequisites
- A Kerberos-enabled Hive Metastore service.
- A valid krb5.conf file on the Sail server host.
- A valid Kerberos ticket cache for the Sail server process.
- Kerberos runtime libraries on the Sail server host. On Linux, Sail loads libgssapi_krb5.so.2 at runtime. On macOS, install Kerberos libraries, for example with brew install krb5.
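Some of these prerequisites can be checked before the server starts. The following is a minimal sketch, assuming MIT Kerberos tooling where klist -s exits with a non-zero status when no valid ticket cache exists. The function name kerberos_preflight and the default path /etc/krb5.conf are assumptions for illustration; adjust the path for your host.

```python
import os
import shutil
import subprocess
from typing import List


def kerberos_preflight(krb5_conf: str = "/etc/krb5.conf") -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    # Assumption: krb5.conf lives at the given path on this host.
    if not os.path.isfile(krb5_conf):
        problems.append(f"krb5.conf not found at {krb5_conf}")
    if shutil.which("klist") is None:
        problems.append("klist not found on PATH")
    elif subprocess.run(["klist", "-s"]).returncode != 0:
        # `klist -s` runs silently and reports the cache state via exit status.
        problems.append("no valid Kerberos ticket cache")
    return problems
```

The sketch does not verify that the Kerberos runtime libraries can be loaded; that still has to be confirmed separately.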
Starting the Sail Server
Authenticate with Kerberos before starting the Sail server.
import subprocess
from pysail.spark import SparkConnectServer
# authenticate with Kerberos
subprocess.run([
"kinit", "-kt",
"/path/to/user.keytab",
"username@YOUR.REALM"
], check=True)
# start the Sail server
server = SparkConnectServer(ip="0.0.0.0", port=50051)
server.start(background=False)
TIP
The Sail server uses the process ticket cache created by kinit.
If you run Sail in a distributed environment, each worker needs its own Kerberos credentials.
Kerberos HMS Catalog Configuration
When auth = "kerberos" is enabled, Sail expands _HOST in kerberos_service_principal from the hostname of the endpoint selected for that connection attempt.
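The _HOST substitution can be pictured with a short sketch. The function expand_service_principal is hypothetical and only illustrates the substitution described above; whether Sail normalizes the hostname (for example, by lowercasing it) is not covered here.

```python
def expand_service_principal(principal: str, endpoint_host: str) -> str:
    """Replace a _HOST instance component with the endpoint hostname."""
    primary, slash, rest = principal.partition("/")
    instance, at, realm = rest.partition("@")
    if not slash or not at:
        raise ValueError(f"expected service/_HOST@REALM, got {principal!r}")
    # Only the instance component is substituted; other values pass through.
    if instance == "_HOST":
        instance = endpoint_host
    return f"{primary}/{instance}@{realm}"
```

With the catalog configuration that follows, a connection attempt against hms2.internal would use hive-metastore/hms2.internal@EXAMPLE.COM as the service principal.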
export SAIL_CATALOG__LIST='[{type="hms", name="sail", uris=["hms1.internal:9083","thrift://hms2.internal:9083"], auth="kerberos", kerberos_service_principal="hive-metastore/_HOST@EXAMPLE.COM"}]'
Security Guarantees
- Downgrade fail-fast: if sasl_qop_min cannot be satisfied by the server-advertised SASL layers, connection setup fails immediately.
- Session-wide protection: once a wrapped QOP (auth_int or auth_conf) is negotiated, every Thrift frame for that connection is wrapped/unwrapped through the Kerberos SASL security layer.
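The downgrade fail-fast rule amounts to an ordered comparison of QOP levels. The sketch below is illustrative only: select_qop is a hypothetical name, and choosing the strongest acceptable layer is an assumption, since the text above specifies only that the minimum must be met.

```python
from typing import Set

# QOP levels ordered from weakest to strongest protection.
QOP_STRENGTH = {"auth": 0, "auth_int": 1, "auth_conf": 2}


def select_qop(sasl_qop_min: str, server_offered: Set[str]) -> str:
    """Pick a server-offered QOP that meets the minimum, or fail fast."""
    minimum = QOP_STRENGTH[sasl_qop_min]
    acceptable = [
        qop for qop in server_offered
        if QOP_STRENGTH.get(qop, -1) >= minimum
    ]
    if not acceptable:
        # No downgrade: refuse the connection during setup.
        raise ConnectionError(
            f"server offers {sorted(server_offered)}, below minimum {sasl_qop_min}"
        )
    # Assumption: prefer the strongest layer among the acceptable ones.
    return max(acceptable, key=QOP_STRENGTH.__getitem__)
```

With sasl_qop_min set to auth_conf and a server that advertises only auth, the call raises instead of silently falling back to an unprotected session.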
Current Limitations
- Sail uses an existing Kerberos ticket cache. It does not run kinit or manage keytabs internally.
- Delegation-token authentication is not supported.
- Transactional Hive Metastore APIs are not used yet. Sail currently targets metadata CRUD rather than Hive ACID write coordination.
Examples
export SAIL_CATALOG__LIST='[{type="hive_metastore", name="sail", uris=["127.0.0.1:9083"]}]'
export SAIL_CATALOG__LIST='[{type="hms", name="sail", uris=["hms1.internal:9083","hms2.internal:9083"], thrift_transport="framed", connect_timeout_secs=10}]'
export SAIL_CATALOG__LIST='[{type="hms", name="sail", uris=["hms.internal:9083"], auth="kerberos", kerberos_service_principal="hive-metastore/_HOST@EXAMPLE.COM", sasl_qop_min="auth_int", thrift_transport="framed"}]'