Azure
Sail supports reading from and writing to Azure storage services.
URI Formats
Sail supports the following URI formats for Azure storage services:
- Azure protocol
azure://container/path
- AZ protocol
az://container/path
- Azure Blob File System (ABFS) and secure ABFS (ABFSS) protocols
abfs[s]://container/path (fsspec convention)
abfs[s]://container@account.dfs.core.windows.net/path (Hadoop driver convention)
abfs[s]://container@account.dfs.fabric.microsoft.com/path (Hadoop driver convention with Microsoft Fabric)
- ADL protocol
adl://container/path
- HTTPS endpoints
https://account.dfs.core.windows.net/container/path (Azure Data Lake Storage Gen2)
https://account.blob.core.windows.net/container/path (Azure Blob Storage)
https://account.dfs.fabric.microsoft.com/workspace/path
https://account.blob.fabric.microsoft.com/workspace/path
https://onelake.dfs.fabric.microsoft.com/workspace/item.item-type/path
https://onelake.dfs.fabric.microsoft.com/workspace-guid/item-guid/path
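For example, the following URIs could all refer to the same object, assuming a storage account named myaccount and a container named my-container (both placeholders). The short forms rely on the storage account name being supplied via configuration, as described in the next section.

```python
# The account "myaccount" and container "my-container" are hypothetical
# placeholders. The short forms rely on AZURE_STORAGE_ACCOUNT_NAME (see
# the configuration section below) to identify the storage account.
short_form = "az://my-container/path/to/data"
abfss_form = "abfss://my-container@myaccount.dfs.core.windows.net/path/to/data"
https_form = "https://myaccount.blob.core.windows.net/my-container/path/to/data"
```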
Configuration
You can use environment variables to configure Azure storage services in Sail. Some options can be set via more than one environment variable; the alternative names are listed together below.
WARNING
The environment variables to configure Azure storage services are experimental and may change in future versions of Sail.
Security Configuration
- AZURE_STORAGE_ACCOUNT_NAME: The name of the Azure storage account, if not specified in the URI.
- AZURE_STORAGE_ACCOUNT_KEY, AZURE_STORAGE_ACCESS_KEY, AZURE_STORAGE_MASTER_KEY: The master key for accessing the storage account.
- AZURE_STORAGE_CLIENT_ID, AZURE_CLIENT_ID: The service principal client ID for authorizing requests.
- AZURE_STORAGE_CLIENT_SECRET, AZURE_CLIENT_SECRET: The service principal client secret for authorizing requests.
- AZURE_STORAGE_TENANT_ID, AZURE_STORAGE_AUTHORITY_ID, AZURE_TENANT_ID, AZURE_AUTHORITY_ID: The tenant ID used in OAuth flows.
- AZURE_STORAGE_AUTHORITY_HOST, AZURE_AUTHORITY_HOST: The authority host used in OAuth flows.
- AZURE_STORAGE_SAS_KEY, AZURE_STORAGE_SAS_TOKEN: The shared access signature (SAS) key or token. The signature should be percent-encoded.
- AZURE_STORAGE_TOKEN: The bearer token for authorizing requests.
- AZURE_MSI_ENDPOINT, AZURE_IDENTITY_ENDPOINT: The endpoint for acquiring a managed identity token.
- AZURE_OBJECT_ID: The object ID for use with managed identity authentication.
- AZURE_MSI_RESOURCE_ID: The MSI resource ID for use with managed identity authentication.
- AZURE_FEDERATED_TOKEN_FILE: The file containing a token for Azure AD workload identity federation.
- AZURE_USE_AZURE_CLI: Whether to use the Azure CLI for acquiring an access token.
- AZURE_SKIP_SIGNATURE: Whether to skip signing requests.
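As a minimal sketch, service principal (OAuth client credentials) authentication could be configured as follows. All values are hypothetical placeholders, and the variables must be set in the environment of the Sail server process.

```python
import os

# A minimal sketch of service principal authentication; all values are
# hypothetical placeholders. Set these before starting the Sail server
# so that they are visible to the server process.
os.environ["AZURE_STORAGE_ACCOUNT_NAME"] = "myaccount"
os.environ["AZURE_CLIENT_ID"] = "00000000-0000-0000-0000-000000000000"
os.environ["AZURE_CLIENT_SECRET"] = "<client-secret>"
os.environ["AZURE_TENANT_ID"] = "11111111-1111-1111-1111-111111111111"
```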
Storage Configuration
- AZURE_CONTAINER_NAME: The name of the Azure storage container, if not specified in the URI.
- AZURE_STORAGE_ENDPOINT, AZURE_ENDPOINT: The endpoint used to communicate with blob storage. This overrides the default endpoint.
- AZURE_USE_FABRIC_ENDPOINT: Whether to use the Microsoft Fabric URL scheme.
- AZURE_DISABLE_TAGGING: Whether to disable tagging objects.
- AZURE_STORAGE_USE_EMULATOR, AZURE_USE_EMULATOR: Whether to use the Azurite storage emulator.
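For local development, a sketch of an Azurite emulator setup might look like the following. The account name and key are Azurite's publicly documented development defaults, and the endpoint assumes Azurite's default blob port on localhost.

```python
import os

# A sketch for local development against the Azurite storage emulator.
# The account name and key are Azurite's well-known development defaults;
# the endpoint assumes Azurite's default blob port on localhost.
os.environ["AZURE_USE_EMULATOR"] = "true"
os.environ["AZURE_STORAGE_ACCOUNT_NAME"] = "devstoreaccount1"
os.environ["AZURE_STORAGE_ACCOUNT_KEY"] = (
    "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="
)
os.environ["AZURE_STORAGE_ENDPOINT"] = "http://127.0.0.1:10000/devstoreaccount1"
```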
Microsoft Fabric Configuration
- AZURE_FABRIC_TOKEN_SERVICE_URL: The URL for the Fabric token service.
- AZURE_FABRIC_WORKLOAD_HOST: The host for the Fabric workload.
- AZURE_FABRIC_SESSION_TOKEN: The session token for Fabric.
- AZURE_FABRIC_CLUSTER_IDENTIFIER: The cluster identifier for Fabric.
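These options are usually supplied by the Microsoft Fabric runtime rather than set by hand. Purely as an illustration, with every value a hypothetical placeholder:

```python
import os

# Illustration only; all values are hypothetical placeholders. In practice
# the Fabric runtime typically provides these variables.
os.environ["AZURE_FABRIC_TOKEN_SERVICE_URL"] = "https://example.fabric.microsoft.com/token"
os.environ["AZURE_FABRIC_WORKLOAD_HOST"] = "workload.example.fabric.microsoft.com"
os.environ["AZURE_FABRIC_SESSION_TOKEN"] = "<session-token>"
os.environ["AZURE_FABRIC_CLUSTER_IDENTIFIER"] = "<cluster-id>"
```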
INFO
For configuration options that accept boolean values, you can specify 1, true, on, yes, or y for a true value, and 0, false, off, no, or n for a false value. Boolean values are case-insensitive.
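For example, the following assignments all enable the same option:

```python
import os

# Equivalent spellings of a true value; boolean values are case-insensitive.
os.environ["AZURE_SKIP_SIGNATURE"] = "true"
os.environ["AZURE_SKIP_SIGNATURE"] = "Yes"
os.environ["AZURE_SKIP_SIGNATURE"] = "1"
```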
Examples
INFO
In the code below, spark refers to a Spark session connected to the Sail server. See the Getting Started guide for how to set this up.
Spark DataFrame API
```python
# You can use any valid URI format for Azure storage services
# to specify the path to read or write data.
path = "azure://my-container/path/to/data"

df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], schema="id INT, name STRING")
df.write.parquet(path)

df = spark.read.parquet(path)
df.show()
```

Spark SQL
```python
# You can use any valid URI format for Azure storage services
# to specify the location of the table.
sql = """
CREATE TABLE my_table (id INT, name STRING)
USING parquet
LOCATION 'azure://my-container/path/to/data'
"""
spark.sql(sql)

spark.sql("SELECT * FROM my_table").show()
spark.sql("INSERT INTO my_table VALUES (3, 'Charlie'), (4, 'David')")
spark.sql("SELECT * FROM my_table").show()
```