Set up a cluster
A cluster is a Kubernetes cluster LakeSail provisions inside your network. It's where jobs and sessions actually run. This page covers creating one, sizing it sensibly, reconfiguring it later, and destroying it when you're done.
For a full first-time walkthrough (cloud account → network → cluster), start with the Quickstart. This page is the standalone reference.
Prerequisites
- A connected cloud account in active status.
- A network in active status, in the region where you want the cluster.
Create a cluster
- Open Clusters in the sidebar and click Create cluster.
- Fill in:
- Network — the network the cluster lives in. The cluster inherits its cloud account and region from the network.
- Cluster Name — a human-readable identifier. Cannot be changed after creation.
- Management Node Size — the instance type for the cluster's control plane / system nodes. The default m8g.large is right for most starts.
- Min / Desired / Max Management Nodes — autoscaling bounds for management nodes. The default 1 / 2 / 3 gives basic HA with room to grow.
- Disk Size (GB) — EBS volume per management node. The default 100 is fine.
- Allowed IPs List (optional) — see Change who can reach the cluster below. Leave empty for private-only access.
- Click Create Cluster.
The cluster moves through pending → provisioning → active. Provisioning takes several minutes — Karpenter, networking, the Sail control plane, and add-ons all bootstrap during this phase.
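If you script against the API listed at the end of this page rather than watch the console, a wait loop for that status progression might look like the sketch below. The `get_cluster_status` callable is a hypothetical stand-in for whatever call fetches the cluster's current status; only the status values themselves come from this page.

```python
import time

def wait_until_active(get_cluster_status, timeout_s=900, poll_s=15):
    """Poll a status callable until the cluster is 'active', failing fast on 'failed'.

    get_cluster_status is a hypothetical zero-argument callable returning one of
    the lifecycle statuses documented on this page.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_cluster_status()
        if status == "active":
            return
        if status == "failed":
            # The progress panel in the console shows which stage failed.
            raise RuntimeError("provisioning failed; check the progress panel")
        time.sleep(poll_s)
    raise TimeoutError("cluster did not become active within the timeout")
```

Since provisioning takes several minutes, a generous timeout and a polling interval of 15 seconds or more keeps the loop cheap without missing the transition.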
Compute vs. management nodes
The sizes above are for management nodes (system) only. They run the control plane, autoscaler, and LakeSail's own services. Compute nodes for your jobs and sessions are picked per-workload, so you don't need to over-provision the cluster's defaults to handle a rare big job.
Pick sensible defaults
For a first cluster (evaluation, small team) the defaults are deliberately conservative:
| Setting | Default | When to change |
|---|---|---|
| Management Node Size | m8g.large | Larger only if you hit pod-capacity warnings |
| Min Management Nodes | 1 | Raise to 2+ for production HA |
| Desired Management Nodes | 2 | Two nodes give basic HA |
| Max Management Nodes | 3 | Raise if the autoscaler keeps hitting the ceiling |
| Disk Size | 100 GB | Raise if you see disk pressure on system pods |
The validation modal warns you in real time if your settings won't fit:
- "t-shirt-size supports only N pods per node…" — the instance is too small for LakeSail's system DaemonSets. Pick a larger one.
- "t-shirt-size with N minimum node(s) provides ~Xm allocatable CPU…" — not enough CPU for system controllers. Either increase min nodes or pick a larger instance.
- "Min nodes cannot exceed max nodes." / "Desired nodes cannot exceed max nodes." — fix the ordering.
See Troubleshooting for the full list.
Resize an existing cluster
Most properties can be changed in place. The cluster moves to updating and stays available for existing workloads where possible.
You can edit:
- Management node size (instance type)
- Min / Desired / Max management nodes
- Disk size (GB)
- Allowed IPs list
You cannot change (these require creating a new cluster):
- The network the cluster lives in
- The cluster name
- The region (set by the network)
To resize:
- Open Clusters and click into the one you want to change.
- Click Edit.
- Adjust any of the editable fields. The same validation warnings as create apply.
- Click Update Cluster.
The cluster moves active → updating → active. Management-node churn is invisible to your jobs and sessions.
Change who can reach the cluster
The Allowed IPs list controls who can reach the cluster endpoint from outside the VPC.
- Empty (default) — private-only. Only traffic from inside the VPC can reach the endpoint.
- Comma-separated CIDRs — e.g. 203.0.113.0/24, 10.0.0.0/8 — the public endpoint is restricted to the listed IPs. VPC-internal traffic always uses the private endpoint.
You can toggle this on existing clusters without recreating them — it's a property of the access policy, not the underlying compute.
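To sanity-check an Allowed IPs value before pasting it into the field, you can parse the comma-separated CIDRs with Python's standard library. This is a local sketch; the field format is the one described above, and the helper name is our own.

```python
import ipaddress

def parse_allowed_ips(value: str) -> list[str]:
    """Split a comma-separated CIDR list and validate each entry.

    An empty string means private-only access, so there is nothing to validate.
    """
    if not value.strip():
        return []
    cidrs = []
    for part in value.split(","):
        # strict=True rejects entries with host bits set, e.g. 10.0.0.1/8,
        # which catches the most common copy-paste mistake.
        net = ipaddress.ip_network(part.strip(), strict=True)
        cidrs.append(str(net))
    return cidrs

parse_allowed_ips("203.0.113.0/24, 10.0.0.0/8")
# → ['203.0.113.0/24', '10.0.0.0/8']
```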
When to create a new cluster instead
If you're making a change the editor won't accept — region, network, name — or experimenting with a significantly different configuration, create a new cluster alongside the old one. Move workloads over (re-create sessions, re-target jobs), then destroy the original.
Destroy a cluster
- Open the cluster detail page.
- Click Destroy (or Delete infrastructure).
- Confirm. The cluster moves destroying → destroyed, releasing the underlying AWS resources.
Destroying is irreversible
Destroying a cluster terminates any running jobs and sessions on it. Job run history, catalog registrations, and team assignments stay in the organization, but the compute itself is gone. Budget a few minutes for teardown to complete.
Lifecycle reference
| Status | Meaning |
|---|---|
| pending | Creation accepted, queued |
| provisioning | AWS resources being created |
| active | Up and accepting workloads |
| updating | A configuration change is rolling out |
| failed | Provisioning or update failed — see the progress panel for the failed stage |
| destroying | Teardown in progress |
| destroyed | All resources released; record retained for audit |
| deleted | Record removed from the organization |
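The happy-path transitions described on this page (create, update, destroy) can be modeled as a small state map. This is a sketch inferred from the flows above, not an exhaustive machine; in particular, transitions out of failed are not documented here.

```python
# Transitions drawn from the flows on this page; "failed" can follow
# provisioning or updating, per the lifecycle table.
TRANSITIONS = {
    "pending": {"provisioning"},
    "provisioning": {"active", "failed"},
    "active": {"updating", "destroying"},
    "updating": {"active", "failed"},
    "destroying": {"destroyed"},
    "destroyed": {"deleted"},
}

def can_transition(current: str, target: str) -> bool:
    """Check whether a status change matches one of the documented flows."""
    return target in TRANSITIONS.get(current, set())
```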
API reference
- Clusters — CreateExternalCluster, UpdateExternalCluster, DeleteExternalCluster, and the full lifecycle.
- Networks — required prerequisite.
- Cloud accounts — required prerequisite.