Set up a cluster

A cluster is a Kubernetes cluster LakeSail provisions inside your network. It's where jobs and sessions actually run. This page covers creating one, sizing it sensibly, reconfiguring it later, and destroying it when you're done.

For a full first-time walkthrough (cloud account → network → cluster), start with the Quickstart. This page is the standalone reference.

Prerequisites

  • A connected cloud account in active status.
  • A network in active status, in the region you want the cluster.

Create a cluster

  1. Open Clusters in the sidebar and click Create cluster.
  2. Fill in:
    • Network — the network the cluster lives in. The cluster inherits its cloud account and region from the network.
    • Cluster Name — a human-readable identifier. Cannot be changed after creation.
    • Management Node Size — the instance type for the cluster's control plane / system nodes. The default, m8g.large, is a good starting point for most clusters.
    • Min / Desired / Max Management Nodes — autoscaling bounds for management nodes. Default 1 / 2 / 3 gives basic HA with room to grow.
    • Disk Size (GB) — the EBS volume size per management node. The default of 100 GB is fine to start.
    • Allowed IPs List (optional) — see Change who can reach the cluster below. Leave empty for private-only access.
  3. Click Create Cluster.

The cluster moves through pending → provisioning → active. Provisioning takes several minutes — Karpenter, networking, the Sail control plane, and add-ons all bootstrap during this phase.
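
If you drive provisioning from automation rather than the console, the same fields map onto the CreateExternalCluster operation listed in the API reference below. The sketch here is a minimal illustration only: the gateway URL, auth header, and JSON field names are assumptions, not the documented schema.

    # Minimal illustration of creating a cluster programmatically.
    # Assumptions (not from this page): the gateway URL, the auth header,
    # and the JSON field names are placeholders, not the documented schema.
    import requests

    API = "https://api.lakesail.example"           # hypothetical gateway URL
    HEADERS = {"Authorization": "Bearer <token>"}  # hypothetical auth header

    payload = {
        "networkId": "net-123",             # the network the cluster lives in
        "name": "analytics-dev",            # cannot be changed after creation
        "managementNodeSize": "m8g.large",  # instance type for management nodes
        "minManagementNodes": 1,
        "desiredManagementNodes": 2,
        "maxManagementNodes": 3,
        "diskSizeGb": 100,
        "allowedIps": [],                   # empty list = private-only access
    }

    resp = requests.post(f"{API}/clusters", json=payload, headers=HEADERS)
    resp.raise_for_status()
    cluster = resp.json()
    print(cluster["id"], cluster["status"])  # expect "pending", then "provisioning"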

Compute vs. management nodes

The sizes above apply only to management (system) nodes. They run the control plane, autoscaler, and LakeSail's own services. Compute nodes for your jobs and sessions are picked per workload, so you don't need to over-provision the cluster's defaults to handle a rare big job.

Pick sensible defaults

For a first cluster (evaluation, small team) the defaults are deliberately conservative:

Setting | Default | When to change
Management Node Size | m8g.large | Larger only if you hit pod-capacity warnings
Min Management Nodes | 1 | Raise to 2+ for production HA
Desired Management Nodes | 2 | Two nodes give basic HA
Max Management Nodes | 3 | Raise if the autoscaler keeps hitting the ceiling
Disk Size | 100 GB | Raise if you see disk pressure on system pods

The validation modal warns you in real time if your settings won't fit:

  • "t-shirt-size supports only N pods per node…" — the instance is too small for LakeSail's system DaemonSets. Pick a larger one.
  • "t-shirt-size with N minimum node(s) provides ~Xm allocatable CPU…" — not enough CPU for system controllers. Either increase min nodes or pick a larger instance.
  • "Min nodes cannot exceed max nodes." / "Desired nodes cannot exceed max nodes." — fix the ordering.

See Troubleshooting for the full list.
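
Once a cluster is active, you can check the numbers these warnings reason about — allocatable CPU and the per-node pod limit — on the live nodes. This is a generic Kubernetes check using the official Python client and assumes you have kubeconfig access to the provisioned cluster; nothing in it is LakeSail-specific.

    # Read allocatable CPU and pod capacity per node on an active cluster.
    # Uses the standard Kubernetes Python client (pip install kubernetes) and
    # assumes kubeconfig access to the cluster.
    from kubernetes import client, config

    config.load_kube_config()  # or load_incluster_config() when running in-cluster
    v1 = client.CoreV1Api()

    for node in v1.list_node().items:
        alloc = node.status.allocatable
        print(f"{node.metadata.name}: cpu={alloc['cpu']}, pods={alloc['pods']}")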

Resize an existing cluster

Most properties can be changed in place. The cluster moves to updating and stays available for existing workloads where possible.

You can edit:

  • Management node size (instance type)
  • Min / Desired / Max management nodes
  • Disk size (GB)
  • Allowed IPs list

You cannot change (these require creating a new cluster):

  • The network the cluster lives in
  • The cluster name
  • The region (set by the network)

To resize:

  1. Open Clusters and click into the one you want to change.
  2. Click Edit.
  3. Adjust any of the editable fields. The same validation warnings as create apply.
  4. Click Update Cluster.

The cluster moves active → updating → active. Management-node churn is invisible to your jobs and sessions.
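
If you script the change instead, it corresponds to the UpdateExternalCluster operation in the API reference below. As with the create sketch earlier, the endpoint path and field names are illustrative assumptions:

    # Hypothetical sketch of resizing a cluster; the endpoint path and field
    # names are illustrative, not the documented UpdateExternalCluster schema.
    # Only editable properties are sent.
    import requests

    API = "https://api.lakesail.example"           # hypothetical gateway URL
    HEADERS = {"Authorization": "Bearer <token>"}  # hypothetical auth header

    changes = {
        "managementNodeSize": "m8g.xlarge",  # instance type can change in place
        "minManagementNodes": 2,             # raise the floor for production HA
        "maxManagementNodes": 5,
        "diskSizeGb": 200,
    }

    resp = requests.patch(f"{API}/clusters/cluster-123", json=changes, headers=HEADERS)
    resp.raise_for_status()
    print(resp.json()["status"])  # expect "updating", then back to "active"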

Change who can reach the cluster

The Allowed IPs list controls who can reach the cluster endpoint from outside the VPC.

  • Empty (default) — private-only. Only traffic from inside the VPC can reach the endpoint.
  • Comma-separated CIDRs — e.g. 203.0.113.0/24, 10.0.0.0/8 — the public endpoint is restricted to listed IPs. VPC-internal traffic always uses the private endpoint.

You can toggle this on existing clusters without recreating them — it's a property of the access policy, not the underlying compute.
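
Because the allowed-IPs list is an ordinary editable property, opening or closing public access is a one-field update. A short illustrative sketch, reusing the same hypothetical endpoint and field names as above:

    # Hypothetical sketch: restrict the public endpoint to specific CIDRs.
    # The "allowedIps" field name and endpoint path are illustrative assumptions.
    import requests

    API = "https://api.lakesail.example"
    HEADERS = {"Authorization": "Bearer <token>"}

    resp = requests.patch(
        f"{API}/clusters/cluster-123",
        json={"allowedIps": ["203.0.113.0/24", "10.0.0.0/8"]},  # [] = private-only
        headers=HEADERS,
    )
    resp.raise_for_status()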

When to create a new cluster instead

If you're making a change the editor won't accept — region, network, name — or experimenting with a significantly different configuration, create a new cluster alongside the old one. Move workloads over (re-create sessions, re-target jobs), then destroy the original.

Destroy a cluster

  1. Open the cluster detail page.
  2. Click Destroy (or Delete infrastructure).
  3. Confirm. The cluster moves destroying → destroyed, releasing the underlying AWS resources.

Destroying is irreversible

Destroying a cluster terminates any running jobs and sessions on it. Job run history, catalog registrations, and team assignments stay in the organization, but the compute itself is gone. Budget a few minutes for teardown to complete.
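
Scripted teardown corresponds to the DeleteExternalCluster operation in the API reference below; the sketch is again illustrative rather than the documented call.

    # Hypothetical sketch of destroying a cluster; the endpoint path is an
    # illustrative assumption (see DeleteExternalCluster in the API reference).
    import requests

    API = "https://api.lakesail.example"
    HEADERS = {"Authorization": "Bearer <token>"}

    resp = requests.delete(f"{API}/clusters/cluster-123", headers=HEADERS)
    resp.raise_for_status()
    # The cluster then moves destroying -> destroyed as AWS resources are released.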

Lifecycle reference

Status | Meaning
pending | Creation accepted, queued
provisioning | AWS resources being created
active | Up and accepting workloads
updating | A configuration change is rolling out
failed | Provisioning or update failed — see the progress panel for the failed stage
destroying | Teardown in progress
destroyed | All resources released; record retained for audit
deleted | Record removed from the organization
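
Automation that creates, updates, or destroys clusters usually waits for one of these statuses before moving on. A minimal polling sketch, reusing the hypothetical endpoint from the earlier examples:

    # Poll a cluster until it reaches a stable status. The endpoint and field
    # names are illustrative assumptions, consistent with the earlier sketches.
    import time
    import requests

    API = "https://api.lakesail.example"
    HEADERS = {"Authorization": "Bearer <token>"}

    def wait_for(cluster_id: str, target: str = "active", timeout: int = 1800) -> str:
        deadline = time.time() + timeout
        while time.time() < deadline:
            resp = requests.get(f"{API}/clusters/{cluster_id}", headers=HEADERS)
            resp.raise_for_status()
            status = resp.json()["status"]
            if status == target:
                return status
            if status == "failed":
                raise RuntimeError("provisioning or update failed; check the progress panel")
            time.sleep(30)  # provisioning takes several minutes
        raise TimeoutError(f"cluster did not reach {target!r} within {timeout}s")

    print(wait_for("cluster-123"))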

API reference

  • Clusters — CreateExternalCluster, UpdateExternalCluster, DeleteExternalCluster, and the full lifecycle.
  • Networks — required prerequisite.
  • Cloud accounts — required prerequisite.