Compute profiles
A compute profile is a saved answer to a question every workload asks: what machine should this run on, and how should it be set up? You pick the engine size, the cluster, and any libraries once, give it a name, and then point jobs, sessions, and notebooks at it — instead of configuring each one by hand.
In the UI, profiles live under Clusters → Compute Profiles. In the API they are called workload configs.
Why profiles exist
Jobs, sessions, and notebooks all need the same thing: a cluster to run on and a description of how big the Sail engine should be. Rather than repeat that on every workload, a profile captures it once so you can:
- Standardize sizing. "small", "etl-large", "ml-gpu" become shared, named choices instead of per-workload guesses.
- Change compute without touching the workload. Edit the profile; the next run of every job that references it picks up the new sizing.
- Scope by team. A profile belongs to one team, so its cost and access follow the same lines as the rest of your resources.
Execution mode
Execution mode is the most important choice in a profile, because it changes the shape of the compute:
| Mode | Shape | Use for |
|---|---|---|
| `standalone` (default) | A single Sail pod. Query execution is still parallelized across threads inside that pod. | Most interactive work and small-to-medium jobs. Lower overhead, faster to start. |
| `cluster` | A driver pod plus separate worker pods that communicate over RPC. | Large datasets that need distributed execution across multiple nodes. |
Worker settings only apply in cluster mode. In standalone mode there are no workers, so only the driver sizing matters.
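The rule that worker settings only count in cluster mode can be sketched as below. This is an illustration only: `NodeSizing`, `ProfileSketch`, and `effective_sizing` are hypothetical names, not part of the LakeSail API, and the field layout is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeSizing:
    """Hypothetical sizing record: instance type plus EBS volume size."""
    instance_type: str
    ebs_volume_gb: int


@dataclass
class ProfileSketch:
    """Hypothetical, simplified view of a compute profile."""
    execution_mode: str = "standalone"  # "standalone" is the documented default
    driver: Optional[NodeSizing] = None
    worker: Optional[NodeSizing] = None


def effective_sizing(profile: ProfileSketch) -> dict:
    """Return only the sizing that applies for the profile's execution mode."""
    if profile.execution_mode not in ("standalone", "cluster"):
        raise ValueError(f"unknown execution mode: {profile.execution_mode!r}")
    sizing = {"driver": profile.driver}
    # Worker sizing is ignored unless the profile runs in cluster mode.
    if profile.execution_mode == "cluster":
        sizing["worker"] = profile.worker
    return sizing
```

In this sketch, a standalone profile that happens to carry worker sizing still resolves to driver sizing alone.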
Sizing compute
On a LakeSail-managed cluster you configure:
- Driver: the EC2 instance type (e.g. `c8g.2xlarge`) and EBS volume for the coordinating pod.
- Worker (cluster mode only): instance type, EBS volume, max nodes (the autoscaler grows up to this), and capacity type (on-demand or spot).
The cluster's autoscaler adds and removes worker nodes on demand, so size for the workload's actual working set rather than its peak. Setting max nodes higher than you need only raises the ceiling the autoscaler can grow to; it doesn't reserve capacity.
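To illustrate why max nodes acts as a ceiling and not a reservation, here is a deliberately simplified sketch. The function `target_worker_nodes` is hypothetical, and the real autoscaler's decision logic is not described in this page and is certainly more involved.

```python
def target_worker_nodes(nodes_needed: int, max_nodes: int) -> int:
    """Clamp demand to the profile's max nodes (hypothetical model).

    nodes_needed: how many worker nodes current demand calls for.
    max_nodes:    the ceiling configured on the profile.
    """
    if max_nodes < 0:
        raise ValueError("max_nodes must be non-negative")
    # Demand drives the count; max_nodes only caps it.
    return max(0, min(nodes_needed, max_nodes))
```

Under this model, raising the ceiling from 10 to 50 changes nothing while demand stays at 3 nodes; the ceiling only matters once demand exceeds it.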
External clusters
If the profile targets a cluster that LakeSail doesn't manage, execution mode and instance types aren't available. The workload runs on whatever resources the external cluster already has, and the profile carries only the engine-level settings (libraries, env, retries).
Everything a profile holds
The Execution mode and Sizing compute sections above cover what most people set. The full field list:
- Name: how it shows up in the picker when you attach a profile to a workload.
- Team: the owning team. Immutable after creation.
- Cluster: the cluster the profile targets. Immutable after creation; a profile is tied to one cluster.
- Execution mode: `standalone` or `cluster`. See Execution mode.
- Max retries: how many times to retry an unsuccessful run. `0` (default) means no retries; `-1` means retry indefinitely.
- Compute: driver and worker sizing. See Sizing compute.
- Libraries: PyPI packages, a `requirements` list, or a wheel URI to install before execution.
- Environment variables: extra env passed to the Sail engine. Keys are the Sail configuration reference keys.
- Endpoint limits (sessions): how many clients can attach to one session at the same time over the Spark Connect endpoint. Defaults to `1` (one client per session); raise it to share a session across several clients.
- Image URI (advanced): override the Sail container image. Leave empty to use the platform default.
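The max retries values in the list above (`0` for no retries, `-1` for retry indefinitely) can be read as the following loop. This is a sketch of the semantics only: `run_with_retries` and `run_once` are hypothetical names, and how the platform actually schedules or spaces retries is not covered here.

```python
from typing import Callable, Tuple


def run_with_retries(run_once: Callable[[], bool], max_retries: int = 0) -> Tuple[bool, int]:
    """Run until success or until retries are exhausted.

    run_once:    hypothetical callable returning True on a successful run.
    max_retries: 0 means no retries, -1 means retry indefinitely.
    Returns (succeeded, total_attempts).
    """
    if max_retries < -1:
        raise ValueError("max_retries must be -1, 0, or a positive integer")
    attempts = 0
    while True:
        attempts += 1
        if run_once():
            return True, attempts
        # The first attempt is not a retry, so N retries allow N + 1 attempts.
        if max_retries != -1 and attempts > max_retries:
            return False, attempts
```

Note that with `-1` the loop in this sketch only ends on success, which is why an unbounded setting deserves care on runs that can fail deterministically.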
Creating a profile
You can create a profile in two ways:
- From Clusters → Compute Profiles → Add Compute Profile.
- Inline with Create profile while creating a job, session, or notebook, without leaving the form.
Either way, the profile is saved to its team and can be reused and shared.
How workloads use a profile
- Jobs require a compute profile — every job version references one, and the run inherits its cluster, sizing, and engine settings. Changing the profile changes the next run without a job redeploy. See Defining jobs.
- Sessions take a profile when created. The endpoint-limit field caps concurrent Spark Connect connections. See Sessions.
- Notebooks reference a profile for the Sail pod behind the notebook. It's editable only while the notebook is stopped. See Notebooks.
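For sessions, the effect of the endpoint limit can be pictured as a cap on attached clients. The class below is a hypothetical model, not the platform's implementation; in particular, how a client over the limit is actually rejected is an assumption.

```python
class SessionEndpointSketch:
    """Hypothetical model of a session's Spark Connect endpoint limit."""

    def __init__(self, endpoint_limit: int = 1):
        # The documented default is one client per session.
        if endpoint_limit < 1:
            raise ValueError("endpoint_limit must be at least 1")
        self.endpoint_limit = endpoint_limit
        self.clients = set()

    def attach(self, client_id: str) -> bool:
        """Return True if the client is attached, False if the cap is reached."""
        if client_id in self.clients:
            return True  # already attached; does not consume another slot
        if len(self.clients) >= self.endpoint_limit:
            return False
        self.clients.add(client_id)
        return True

    def detach(self, client_id: str) -> None:
        """Free the client's slot if it was attached."""
        self.clients.discard(client_id)
```

With the default limit a second client is turned away until the first detaches; raising the limit lets several clients share one session.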
Editing and deleting
- Editable at any time: name, compute, libraries, env, endpoint limits, retries, image.
- Immutable: team, cluster, and workload type. To move a profile to a different cluster or team, create a new one.
- A profile that a job, session, or notebook still references can't be deleted; detach or delete the dependent workloads first.
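The editing and deletion rules above amount to a small set of checks, sketched here. The field keys, `apply_edit`, and `can_delete` are hypothetical names chosen for illustration; they are not the API's actual field names or endpoints.

```python
# Hypothetical field keys mirroring the lists above.
EDITABLE = {"name", "compute", "libraries", "env",
            "endpoint_limits", "max_retries", "image_uri"}
IMMUTABLE = {"team", "cluster", "workload_type"}


def apply_edit(profile: dict, changes: dict) -> dict:
    """Return an updated copy of the profile, rejecting immutable changes."""
    unknown = sorted(set(changes) - EDITABLE - IMMUTABLE)
    if unknown:
        raise KeyError(f"unknown fields: {unknown}")
    blocked = sorted(k for k in changes
                     if k in IMMUTABLE and changes[k] != profile.get(k))
    if blocked:
        raise ValueError(f"immutable fields cannot change: {blocked}")
    return {**profile, **changes}


def can_delete(profile_name: str, referenced_profiles: list) -> bool:
    """A profile is deletable only when no workload references it."""
    return profile_name not in referenced_profiles
```

Moving a profile to another team or cluster would fail `apply_edit` in this sketch, matching the guidance to create a new profile instead.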