Compute profiles

A compute profile is a saved answer to a question every workload asks: what machine should this run on, and how should it be set up? You pick the engine size, the cluster, and any libraries once, give it a name, and then point jobs, sessions, and notebooks at it — instead of configuring each one by hand.

In the UI, profiles live under Clusters → Compute Profiles. In the API they are called workload configs.

Why profiles exist

Jobs, sessions, and notebooks all need the same thing: a cluster to run on and a description of how big the Sail engine should be. Rather than repeat that on every workload, a profile captures it once so you can:

  • Standardize sizing. "small", "etl-large", "ml-gpu" become shared, named choices instead of per-workload guesses.
  • Change compute without touching the workload. Edit the profile; the next run of every job that references it picks up the new sizing.
  • Scope by team. A profile belongs to one team, so its cost and access follow the same lines as the rest of your resources.

Execution mode

Execution mode is the most important choice in a profile, because it changes the shape of the compute:

| Mode | Shape | Use for |
| --- | --- | --- |
| standalone (default) | A single Sail pod. Query execution is still parallelized across threads inside that pod. | Most interactive work and small-to-medium jobs. Lower overhead, faster to start. |
| cluster | A driver pod plus separate worker pods that communicate over RPC. | Large datasets that need distributed execution across multiple nodes. |

Worker settings only apply in cluster mode. In standalone mode there are no workers, so only the driver sizing matters.
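
The rule above can be sketched in a few lines of Python. This is an illustration only, not the platform's implementation: `ExecutionMode`, `Sizing`, and `effective_sizing` are hypothetical names, and the instance types are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExecutionMode(Enum):
    STANDALONE = "standalone"  # a single Sail pod (the default)
    CLUSTER = "cluster"        # a driver pod plus worker pods


@dataclass(frozen=True)
class Sizing:
    """Hypothetical sizing record; only the instance type is modeled here."""
    instance_type: str


def effective_sizing(mode: ExecutionMode, driver: Sizing, worker: Optional[Sizing]) -> dict:
    """Return the sizing that actually applies for the given execution mode."""
    sizing = {"driver": driver}
    # Worker settings only matter in cluster mode; standalone has no workers.
    if mode is ExecutionMode.CLUSTER and worker is not None:
        sizing["worker"] = worker
    return sizing
```

Calling `effective_sizing(ExecutionMode.STANDALONE, driver, worker)` returns only the driver entry, which mirrors how worker settings are ignored in standalone mode.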

Sizing compute

On a LakeSail-managed cluster you configure:

  • Driver — the EC2 instance type (e.g. c8g.2xlarge) and EBS volume for the coordinating pod.
  • Worker (cluster mode only) — instance type, EBS volume, max nodes (the autoscaler grows up to this), and capacity type (on-demand or spot).

The cluster's autoscaler adds and removes worker nodes on demand, so size for the workload's actual working set rather than its peak. Over-provisioning the max only raises the ceiling; it doesn't reserve capacity.
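
To make the "ceiling, not reservation" point concrete, here is a deliberately simplified model. It assumes the autoscaler runs exactly as many workers as the workload demands, capped at max nodes; the real autoscaler's behavior is more involved, and `running_workers` is a hypothetical helper.

```python
def running_workers(demanded_nodes: int, max_nodes: int) -> int:
    """Toy model: run what is demanded, but never more than the ceiling."""
    if demanded_nodes < 0 or max_nodes < 0:
        raise ValueError("node counts must be non-negative")
    return min(demanded_nodes, max_nodes)


# A workload that needs 3 workers gets 3, whether the ceiling is 10 or 50.
low_ceiling = running_workers(3, 10)
high_ceiling = running_workers(3, 50)

# The ceiling only matters once demand exceeds it.
capped = running_workers(12, 10)
```

In this model, raising max nodes from 10 to 50 changes nothing until demand exceeds 10.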

External clusters

If the profile targets a cluster that LakeSail doesn't manage, execution mode and instance types aren't available. The workload runs on whatever resources the external cluster already has, and the profile carries only the engine-level settings (libraries, env, retries).
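
As a rough sketch of that split, the snippet below groups the settings named in this section by where they apply. The field names and the `configurable_settings` helper are illustrative assumptions, not API identifiers.

```python
# Engine-level settings, available on any cluster (per the paragraph above).
ENGINE_LEVEL = {"libraries", "environment_variables", "max_retries"}

# Settings that only exist on a LakeSail-managed cluster.
MANAGED_ONLY = {"execution_mode", "driver", "worker"}


def configurable_settings(cluster_is_managed: bool) -> set:
    """Return the setting names a profile can carry for this kind of cluster."""
    if cluster_is_managed:
        return ENGINE_LEVEL | MANAGED_ONLY
    # External cluster: the workload uses whatever resources already exist.
    return set(ENGINE_LEVEL)
```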

Everything a profile holds

The two sections above cover what most people set. The full field list:

  • Name — how it shows up in the picker when you attach a profile to a workload.
  • Team — the owning team. Immutable after creation.
  • Cluster — the cluster the profile targets. Immutable after creation; a profile is tied to one cluster.
  • Execution mode — standalone or cluster. See Execution mode.
  • Max retries — how many times to retry an unsuccessful run. 0 (default) means no retries; -1 means retry indefinitely.
  • Compute — driver and worker sizing. See Sizing compute.
  • Libraries — PyPI packages, a requirements list, or a wheel URI to install before execution.
  • Environment variables — extra env passed to the Sail engine. Keys are the Sail configuration reference keys.
  • Endpoint limits (sessions) — how many clients can attach to one session at the same time over the Spark Connect endpoint. Defaults to 1 (one client per session); raise it to share a session across several clients.
  • Image URI (advanced) — override the Sail container image. Leave empty to use the platform default.
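
The Max retries values are easy to misread, so here is a minimal sketch of one plausible interpretation. It assumes the setting counts retries after the initial run, so a value of 2 allows up to three runs in total. `should_retry` is a hypothetical helper, not part of the platform.

```python
def should_retry(failed_attempts: int, max_retries: int) -> bool:
    """Decide whether to run again after a failure.

    failed_attempts counts runs that have already failed, including the first.
    """
    if max_retries == -1:
        # -1 means retry indefinitely.
        return True
    # 0 means no retries; N means at most N retries after the initial run.
    return failed_attempts <= max_retries
```

With the default of 0, the first failure is final. With 2, the run is retried after the first and second failures and gives up after the third.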

Creating a profile

You can create a profile in two ways:

  • From Clusters → Compute Profiles → Add Compute Profile.
  • Inline with Create profile while creating a job, session, or notebook, without leaving the form.

Either way, the profile is saved to its team and can be reused and shared.

How workloads use a profile

  • Jobs require a compute profile — every job version references one, and the run inherits its cluster, sizing, and engine settings. Changing the profile changes the next run without a job redeploy. See Defining jobs.
  • Sessions take a profile when created. The endpoint-limit field caps concurrent Spark Connect connections. See Sessions.
  • Notebooks reference a profile for the Sail pod behind the notebook. It's editable only while the notebook is stopped. See Notebooks.
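
The reason a profile edit reaches the next run without a redeploy is that the workload holds a reference to the profile rather than a copy of it. The sketch below models that with plain dictionaries; the job name, the profile contents, and `resolve_run_settings` are all hypothetical.

```python
# Profiles are stored once, keyed by name.
profiles = {"etl-large": {"driver": "c8g.2xlarge", "max_retries": 0}}

# The job stores only the profile's name, not its settings.
job = {"name": "nightly-load", "profile": "etl-large"}


def resolve_run_settings(job: dict, profiles: dict) -> dict:
    """Look the profile up at run time, so edits apply to the next run."""
    return dict(profiles[job["profile"]])


first_run = resolve_run_settings(job, profiles)

# Someone edits the profile; the job definition is untouched.
profiles["etl-large"]["driver"] = "c8g.4xlarge"

second_run = resolve_run_settings(job, profiles)
```

Here `first_run` keeps the sizing it started with, while `second_run` picks up the edited driver type even though `job` never changed.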

Editing and deleting

  • Editable at any time: name, compute, libraries, env, endpoint limits, retries, image.
  • Immutable: team, cluster, and workload type. To move a profile to a different cluster or team, create a new one.
  • A profile in use by a job, session, or notebook can't be deleted out from under it; detach or delete the dependent workloads first.
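
These rules amount to two guards, sketched below under assumed field names. `apply_edit` and `can_delete` are hypothetical helpers that mirror the list above; they are not the platform's API.

```python
# Fields the list above marks as immutable (names are illustrative).
IMMUTABLE_FIELDS = {"team", "cluster", "workload_type"}


def apply_edit(profile: dict, changes: dict) -> dict:
    """Return an updated copy of the profile, rejecting edits to immutable fields."""
    blocked = IMMUTABLE_FIELDS.intersection(changes)
    if blocked:
        raise ValueError(f"immutable fields cannot be edited: {sorted(blocked)}")
    return {**profile, **changes}


def can_delete(profile_name: str, workloads: list) -> bool:
    """A profile can be deleted only when no workload still references it."""
    return all(w.get("profile") != profile_name for w in workloads)
```

For example, renaming a profile succeeds, changing its cluster raises an error, and `can_delete` returns False while any job, session, or notebook still points at the profile.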

API reference

  • Sessions: CreateWorkloadConfig, ListOrgWorkloadConfigs, DescribeWorkloadConfig — workload configs are the API name for compute profiles.
  • Jobs: job versions reference a workload config for their compute.
  • Clusters: the cluster a profile targets.