
Sessions

A session is a live, warm connection to a cluster — the runtime that interactive queries execute inside. Sessions expose Spark Connect over gRPC, so any Spark Connect client can connect: PySpark, the Scala client, and community Go and Rust clients. This page covers when to use a session, how to connect, and how to keep one healthy.

Prerequisites

  • A cluster in active status. See Set up a cluster.
  • A catalog to bind the session to (recommended — sessions take a default catalog so queries don't have to qualify table paths). See Connect a catalog.
  • A Spark Connect client locally (PySpark 3.4+, the Scala client jar, or spark-connect-go / spark-connect-rs).

When to use a session

Sessions are the right fit when you want an interactive, warm connection to a cluster:

  • Ad-hoc exploration. Open a local Python script or notebook, connect via PySpark, and poke at your catalog tables. Fast iteration without committing anything to a job definition.
  • Debugging a failed job. A batch run landed in failed or got stuck in waiting_for_sail. Attach an interactive session and re-run the same logic to see what the data actually looks like — much faster than editing and redeploying the job.
  • Authoring pipelines. Write and test transforms interactively in PySpark, then copy the working code into a job definition once it's stable. The session is your scratch space; the job is the reproducible output.
  • Programmatic consumers. A Go or Python service (e.g. a dashboard backend, a data-quality check, an on-demand report generator) connects via spark-connect-go or PySpark and queries on request.

When not to use a session:

  • Scheduled analytics or recurring dashboards — use a job. Sessions are connection-driven and don't carry the reproducibility, versioning, or run history a schedule needs.
  • Enterprise BI tools — the mainstream options don't natively speak Spark Connect today. Materialize output into a warehouse with a scheduled job and point the BI tool at that instead.

The connection shape

Every session connects the same way:

  • Endpoint — sc://<platform-host>:8085 (a shared gRPC proxy; the specific session is identified by a metadata header, not the URL).
  • Auth — a short-lived signed JWT bearer token scoped to one session.
  • Session header — x-session-id: <session-id> passed as gRPC metadata on every request.

Tokens expire. Plan to regenerate, not to hardcode.
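One way to keep the token out of source is to assemble the connection string at runtime from injected values. A minimal sketch — the helper and the environment variable names are illustrative, not a platform convention:

```python
import os

def connect_uri(host: str, session_id: str, token: str) -> str:
    """Assemble the Spark Connect URI for one session.

    The token is short-lived, so it should arrive from the environment,
    a secrets store, or the token-issue API at connect time -- never be
    committed alongside the code.
    """
    return f"sc://{host}:8085/;token={token};x-session-id={session_id}"

# Env var names here are placeholders for your own deployment's wiring.
uri = connect_uri(
    os.environ.get("PLATFORM_HOST", "<platform-host>"),
    os.environ.get("SESSION_ID", "<session-id>"),
    os.environ.get("SESSION_TOKEN", "<jwt>"),  # regenerate when it expires
)
```

The resulting string is exactly what `SparkSession.builder.remote(...)` expects in the examples below.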

1. Create the session

  1. Open Sessions in the sidebar and click Create session.
  2. Pick the Cluster and the Catalog the session should default to.
  3. Click Create. The session moves pending → active.

Copy the Session ID from the detail page — you'll pass it as x-session-id.

2. Issue a token

From the session's detail page, click Generate token. The response includes the JWT and the gRPC endpoint. Tokens are shown once; if you lose one, generate a new token (the old one keeps working until it expires).

Only the session owner (or an org admin) can issue tokens for a session.

3. Connect from a Spark Connect client

PySpark

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote(
    "sc://<platform-host>:8085/;token=<jwt>;x-session-id=<session-id>"
).getOrCreate()

spark.sql("SHOW TABLES").show()
```

Requires PySpark 3.4+ (pip install pyspark). The token= and x-session-id= URI parameters are threaded through as gRPC metadata automatically.

Spark Connect Scala

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .remote("sc://<platform-host>:8085/;token=<jwt>;x-session-id=<session-id>")
  .getOrCreate()
```

Requires the spark-connect-client-jvm jar.

Go

The Apache spark-connect-go client takes the same URI shape. Pass the token and session ID in the remote URL exactly as above.

Rust

Community Spark Connect clients for Rust are available (e.g. spark-connect-rs). They follow the same URI convention.

Keeping a session warm

Sessions transition to idle when unused and are eventually closed to reclaim compute. The first query after idle has to warm compute back up (seconds, not minutes — but noticeable). For interactive use, the normal cadence of running queries keeps the session warm.
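For a long-lived programmatic consumer that queries only sporadically, a background ping can keep the session out of idle. A hedged sketch — the default interval is an assumption, not the platform's documented idle timeout, so pick something comfortably shorter than whatever timeout your deployment uses:

```python
import threading

def keep_warm(spark, interval_s: float = 300.0) -> threading.Event:
    """Run a trivial query on a timer so the session never sits idle.

    `spark` is anything with a .sql(query) method (e.g. a Spark Connect
    SparkSession). Returns an Event; call .set() on it to stop pinging.
    """
    stop = threading.Event()

    def _loop():
        # Event.wait returns False on timeout, True once stop is set.
        while not stop.wait(interval_s):
            spark.sql("SELECT 1").collect()  # cheap no-op touch

    threading.Thread(target=_loop, daemon=True).start()
    return stop
```

Call `keep_warm(spark)` once after connecting and `.set()` the returned event on shutdown; weigh the held compute against the warm-up cost before using this, since it defeats the platform's idle reclamation.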

To close a session manually, click Close on the session detail page. Closed sessions can't be reopened — create a new one.

Troubleshooting

  • UNAUTHENTICATED / token rejected — the token expired. Generate a new one.
  • PERMISSION_DENIED — check that the caller is the session owner or an org admin; only they can issue tokens.
  • Queries hang on first run — the session is warming. Subsequent queries should be fast.
  • UNAVAILABLE / connection refused — confirm you're using the platform host at port 8085. That's the only supported entry point; cluster-internal endpoints aren't directly reachable.
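For unattended callers, the first failure above can be handled in code: treat UNAUTHENTICATED as "refresh the token and retry once". A sketch with placeholder callables — `run_query` and `reconnect` stand in for your own client and token-issue wiring, and matching on the error message string is a simplification (prefer your client's typed gRPC error if it exposes one):

```python
def run_with_refresh(run_query, reconnect):
    """Retry a query once after a token-expiry failure.

    run_query: zero-arg callable that executes the query on the current
    connection. reconnect: zero-arg callable that issues a fresh token
    and rebuilds the connection (same session ID). Both are placeholders
    for your own wiring.
    """
    try:
        return run_query()
    except Exception as exc:
        if "UNAUTHENTICATED" not in str(exc):
            raise            # not a token problem; surface it
        reconnect()          # new token, same session
        return run_query()   # one retry, then fail loudly
```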

API reference

  • Sessions — create, describe, close sessions; IssueSessionToken for the JWT used by Spark Connect clients.