Sessions
A session is a live, warm connection to a cluster — the runtime that interactive queries execute inside. Sessions expose Spark Connect over gRPC, so any Spark Connect client can connect: PySpark, the Scala client, and community Go and Rust clients. This page covers when to use a session, how to connect, and how to keep one healthy.
Prerequisites
- A cluster in `active` status. See Set up a cluster.
- A catalog to bind the session to (recommended — sessions take a default catalog so queries don't have to qualify table paths). See Connect a catalog.
- A Spark Connect client installed locally (PySpark 3.4+, the Scala client jar, or `spark-connect-go`/`spark-connect-rs`).
When to use a session
Sessions are the right fit when you want an interactive, warm connection to a cluster:
- Ad-hoc exploration. Open a local Python script or notebook, connect via PySpark, and poke at your catalog tables. Fast iteration without committing anything to a job definition.
- Debugging a failed job. A batch run landed in `failed` or got stuck in `waiting_for_sail`. Attach an interactive session and re-run the same logic to see what the data actually looks like — much faster than editing and redeploying the job.
- Authoring pipelines. Write and test transforms interactively in PySpark, then copy the working code into a job definition once it's stable. The session is your scratch space; the job is the reproducible output.
- Programmatic consumers. A Go or Python service (e.g. a dashboard backend, a data-quality check, an on-demand report generator) connects via spark-connect-go or PySpark and queries on request.
When not to use a session:
- Scheduled analytics or recurring dashboards — use a job. Sessions are connection-driven and don't carry the reproducibility, versioning, or run history a schedule needs.
- Enterprise BI tools — the mainstream options don't natively speak Spark Connect today. Materialize output into a warehouse with a scheduled job and point the BI tool at that instead.
The connection shape
Every session connects the same way:
- Endpoint — `sc://<platform-host>:8085` (a shared gRPC proxy; the specific session is identified by a metadata header, not the URL).
- Auth — a short-lived signed JWT bearer token scoped to one session.
- Session header — `x-session-id: <session-id>` passed as gRPC metadata on every request.
Tokens expire. Plan to regenerate, not to hardcode.
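Concretely, those three pieces fold into a single Spark Connect URI. A minimal sketch, assuming the helper name and placeholder values are ours (not part of the platform):

```python
def build_connect_uri(host: str, token: str, session_id: str) -> str:
    """Assemble a Spark Connect URI for this platform.

    Spark Connect clients forward the token= and x-session-id=
    parameters to the server as gRPC metadata on every request.
    """
    return f"sc://{host}:8085/;token={token};x-session-id={session_id}"

# Regenerate the token when it expires and rebuild the URI; don't hardcode it.
uri = build_connect_uri("<platform-host>", "<jwt>", "<session-id>")
```

Keeping the URI assembly in one place makes token rotation a single-line change rather than a hunt through connection strings.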
1. Create the session
- Open Sessions in the sidebar and click Create session.
- Pick the Cluster and the Catalog the session should default to.
- Click Create. The session moves `pending → active`.
Copy the Session ID from the detail page — you'll pass it as `x-session-id`.
2. Issue a token
From the session's detail page, click Generate token. The response includes the JWT and the gRPC endpoint. Tokens are shown once; if you lose one, generate a new token (the old one keeps working until it expires).
Only the session owner (or an org admin) can issue tokens for a session.
3. Connect from a Spark Connect client
PySpark
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote(
    "sc://<platform-host>:8085/;token=<jwt>;x-session-id=<session-id>"
).getOrCreate()

spark.sql("SHOW TABLES").show()
```

Requires PySpark 3.4+ (`pip install pyspark`). The `token=` and `x-session-id=` URI parameters are threaded through as gRPC metadata automatically.
Spark Connect Scala
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .remote("sc://<platform-host>:8085/;token=<jwt>;x-session-id=<session-id>")
  .getOrCreate()
```

Requires the `spark-connect-client-jvm` jar.
Go
The Apache `spark-connect-go` client takes the same URI shape. Pass the token and session ID in the remote URL exactly as above.
Rust
Community Spark Connect clients for Rust are available (e.g. `spark-connect-rs`). They follow the same URI convention.
Keeping a session warm
Sessions transition to idle when unused and are eventually closed to reclaim compute. The first query after idle has to warm compute back up (seconds, not minutes — but noticeable). For interactive use, the normal cadence of running queries keeps the session warm.
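For long-lived programmatic consumers that query only sporadically, a background ping can keep the session from idling. A hedged sketch (the helper and interval are ours; check your platform's actual idle timeout before relying on this):

```python
import threading

def keep_warm(run_query, interval_s: float, stop: threading.Event) -> threading.Thread:
    """Run a trivial query on a timer so the session never goes idle.

    run_query is any zero-arg callable, e.g.:
        lambda: spark.sql("SELECT 1").collect()
    Call stop.set() to end the loop (e.g. on shutdown).
    """
    def _loop():
        # Event.wait doubles as an interruptible sleep: it returns True,
        # ending the loop, as soon as stop.set() is called.
        while not stop.wait(interval_s):
            run_query()

    t = threading.Thread(target=_loop, daemon=True)
    t.start()
    return t
```

Prefer a long interval: the goal is to stay just inside the idle threshold, not to generate load on the cluster.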
To close a session manually, click Close on the session detail page. Closed sessions can't be reopened — create a new one.
Troubleshooting
- `UNAUTHENTICATED` / token rejected — the token expired. Generate a new one.
- `PERMISSION_DENIED` — check that the caller is the session owner or an org admin; only they can issue tokens.
- Queries hang on first run — the session is warming. Subsequent queries should be fast.
- `UNAVAILABLE` / connection refused — confirm you're using the platform host at port 8085. That's the only supported entry point; cluster-internal endpoints aren't directly reachable.
API reference
- Sessions — create, describe, close sessions; `IssueSessionToken` for the JWT used by Spark Connect clients.