Glossary

Short definitions of every LakeSail term used across the docs. Each entry links to the page where the concept is discussed in depth.

A

Account type. A property of a member: managed (the org controls the user's profile — can reset password, force MFA, deactivate) or external (the user self-manages — for consultants and contractors). See Members.

Active (status). Healthy, accepting work. Applies to cloud accounts, networks, clusters, catalogs, and sessions.

Agentic session. A session driven by an LLM agent rather than a human. Planned, not yet shipped. See Concepts.

Authorization policy. A fine-grained grant of specific permissions to a specific principal (member or team). The escape hatch when a role doesn't fit. See Roles & permissions.

B

Bring-your-own VPC. Connecting LakeSail to a VPC you already manage. Not currently supported in the UI; LakeSail provisions networks for you. See Quickstart.

C

Catalog. A pointer to your data — to a Glue catalog, an Iceberg REST endpoint, Unity, OneLake, etc. The catalog's job is to tell LakeSail how to discover and read tables; it never copies data. See Connect a catalog.

Catalog (provisioned). A catalog service that LakeSail provisions inside your cloud account, as opposed to one you configure to point at an existing service.

Channel (notification). A delivery destination for notifications: email, Slack, webhook, PagerDuty, or Rootly. See Notifications.

Cloud account. A trust relationship between LakeSail and an AWS account you own — concretely, an IAM role with a scoped trust policy that LakeSail can assume on demand. See Security & IAM.

Cluster. A Kubernetes cluster LakeSail provisions inside a network. Where jobs and sessions actually run. Has management nodes (system) and compute nodes (workload). See Set up a cluster.

Compute nodes. Cluster nodes that run your jobs and sessions. Picked per-workload, scaled by Karpenter on demand. Distinct from management nodes.

Concurrency policy. What happens when a scheduled tick fires while a previous run hasn't finished: skip, allow (with a max), or replace. See Scheduling.
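The three concurrency options can be sketched as a small decision function. This is only an illustration: the name on_tick, the enum, and the reading of replace as "cancel the in-flight runs and start a new one" are assumptions, not LakeSail's documented behavior.

```python
from enum import Enum


class ConcurrencyPolicy(Enum):
    SKIP = "skip"
    ALLOW = "allow"
    REPLACE = "replace"


def on_tick(policy, running, max_concurrent=1):
    """Decide what a scheduled tick does.

    `running` is the list of run IDs still in progress when the tick fires.
    Returns (start_new_run, runs_to_cancel).
    """
    if not running:
        # Nothing is in flight, so every policy starts the run.
        return True, []
    if policy is ConcurrencyPolicy.SKIP:
        return False, []
    if policy is ConcurrencyPolicy.ALLOW:
        # Start only while the number of in-flight runs is under the cap.
        return len(running) < max_concurrent, []
    if policy is ConcurrencyPolicy.REPLACE:
        # Assumption: "replace" cancels the in-flight runs first.
        return True, list(running)
    raise ValueError(f"unknown policy: {policy!r}")


print(on_tick(ConcurrencyPolicy.SKIP, ["run-1"]))
print(on_tick(ConcurrencyPolicy.ALLOW, ["run-1"], max_concurrent=2))
print(on_tick(ConcurrencyPolicy.REPLACE, ["run-1"]))
```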

Cron expression. A standard 5-field cron string (e.g. 0 2 * * *) defining when a scheduled job fires. See Scheduling.
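As a quick illustration of the 5-field layout, the sketch below splits an expression into named fields. The helper split_cron is hypothetical and does no validation beyond counting fields; it is not how LakeSail parses schedules.

```python
# Field order of a standard 5-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expr):
    """Split a 5-field cron string into a dict keyed by field name."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


# "0 2 * * *" reads as: minute 0, hour 2, every day, every month, any weekday.
print(split_cron("0 2 * * *"))
```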

D

Draft. An in-progress version of a job. Edits go into a draft so the live version keeps running unchanged; publishing the draft makes it the new live version. See Defining jobs.

Driver / Executor. Spark roles. The driver coordinates a job; executors run the actual work in parallel. Both are configured per-job. See Defining jobs.

E

External ID. A random secret embedded in the cloud account's IAM trust policy. LakeSail must present it to assume the role. Prevents confused-deputy attacks. See Security & IAM.
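For readers unfamiliar with the pattern, an IAM trust policy that requires an external ID generally has the shape sketched below, using the standard sts:ExternalId condition key. The principal ARN and the external ID value are placeholders, not values LakeSail uses; the policy LakeSail actually generates may contain additional conditions.

```python
import json


def build_trust_policy(principal_arn, external_id):
    """Return a trust policy allowing `principal_arn` to assume the role
    only when it presents `external_id`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
                # The caller must pass this exact value as ExternalId.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_trust_policy(
    "arn:aws:iam::111122223333:role/example-lakesail-principal",
    "example-external-id",
)
print(json.dumps(policy, indent=2))
```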

I

IdP (Identity Provider). An external service (Okta, Entra ID, Google Workspace) that authenticates users via SSO. See Single sign-on.

Idle (session). A session with no active client traffic. Idle sessions don't immediately free their compute; they eventually close to reclaim resources. See Sessions.

Invitation. A time-limited invite link that turns a user into a member of an organization. See Invite teammates.

J

Job. A reusable, versioned workload — SQL or Python — that runs on a cluster against a catalog. See Defining jobs.

Job run. A single execution of a job. Immutable; re-running creates a new run. See Runs & debugging.

M

Management nodes. Cluster nodes that run the control plane and LakeSail's own services. Sized once at cluster creation. Distinct from compute nodes.

Member. The org-scoped link between a user and an organization. Permissions, ownership, and audit attach to the member. See Members.

Memory catalog. An ephemeral, in-memory catalog. Useful for testing and one-off workloads.

MFA (Multi-Factor Authentication). Time-based one-time codes from an authenticator app. Can be required at the org level. See MFA.

Missed-schedule policy. What happens if scheduled ticks were missed (cluster down, platform incident): latest fires once on recovery; all backfills every missed tick. See Scheduling.
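The difference between the two options can be sketched as follows. The function ticks_to_fire is hypothetical, and it assumes that the single run fired by latest corresponds to the most recent missed tick, which the entry above does not state.

```python
from datetime import datetime


def ticks_to_fire(missed_ticks, policy):
    """Return the ticks that should run once the scheduler recovers."""
    if not missed_ticks:
        return []
    ordered = sorted(missed_ticks)
    if policy == "latest":
        # Assumption: one catch-up run, for the most recent missed tick.
        return [ordered[-1]]
    if policy == "all":
        # Backfill every missed tick, oldest first.
        return ordered
    raise ValueError(f"unknown policy: {policy!r}")


# Three nightly ticks missed during an outage.
missed = [datetime(2024, 1, day, 2, 0) for day in (1, 2, 3)]
print(len(ticks_to_fire(missed, "latest")))
print(len(ticks_to_fire(missed, "all")))
```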

N

Network. A VPC LakeSail provisions inside a cloud account. Stable boundary that hosts clusters. One cloud account → many networks. See Quickstart.

Notification rule. Connects an event type (e.g. job_run.status.failed) at a given scope (resource/team/organization/own) to one or more channels. See Notifications.

O

Operation (long-running). An async backend operation (provisioning, destroying) tracked by an operation ID. Surfaced in the UI as progress bars.

Organization. The tenant — every other object in LakeSail belongs to exactly one. See Concepts.

Organization role. A pre-defined permission bundle that applies across the entire org. The set of roles is fixed by the platform; you assign existing roles, you don't create new ones. See Roles & permissions.

Owning team. The single team that "owns" a resource (job, query, catalog). Determines who can edit and re-share. Distinct from teams a resource is shared with.

P

Pause. A job state. The definition is intact; scheduled runs don't fire. Manual runs still work. The first thing to reach for when a job misbehaves.

Permission. A relation like CanManageUsers that grants the ability to perform a specific action on a resource type. Roles and policies are bundles of permissions. See Roles & permissions.

Permissions boundary. A managed IAM policy that caps the effective permissions of LakeSail's role in your AWS account, even if a sub-policy is mis-scoped. See Security & IAM.

Q

Query. A saved SQL statement bound to a catalog. Reusable from sessions (interactive) and jobs (batch). See Queries.

R

Role. A bundle of permissions. LakeSail has organization roles and team roles; both come from pre-defined sets.

Run (job). See Job run.

S

Saved query. Same as Query.

Session. A live, warm gRPC connection to a cluster, exposing Spark Connect. Where queries and ad-hoc Python code run interactively. See Sessions.

Shared (resource). Describes a resource that is shared with one or more teams in addition to its owning team. Shared teams typically get read/run access without ownership rights.

Snapshot (job source). Inline SQL stored directly on the job, vs. a reference to a saved query. Pinning to a snapshot prevents drift when the underlying query changes. See Defining jobs.

Spark Connect. The gRPC protocol LakeSail sessions speak. PySpark 3.4+, the Scala client, and Go/Rust clients all support it. See Sessions.
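A minimal sketch of a client connection, assuming the session is reachable at a host and port and that the session token is passed through the token parameter of the standard Spark Connect connection string. The host below is a placeholder, and how LakeSail expects the token to be supplied should be checked on the Sessions page.

```python
def spark_connect_url(host, port=15002, token=None):
    """Build a Spark Connect connection string (sc://host:port/;param=value)."""
    url = f"sc://{host}:{port}/"
    if token:
        url += f";token={token}"
    return url


# Placeholder host and token, for illustration only.
url = spark_connect_url("session.example.invalid", 443, token="<session-token>")
print(url)

# With PySpark 3.4+ installed (pip install "pyspark[connect]"):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.remote(url).getOrCreate()
# spark.sql("SELECT 1").show()
```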

T

Team. A group of members with shared permissions and shared resource ownership. The unit of access control past the first few people in an org. See Teams.

Team role. A permission bundle scoped to one team. The same member can hold different team roles on different teams. See Roles & permissions.

Token (session). A short-lived JWT that authenticates a Spark Connect client to a session. Issued by the session owner or an admin. See Sessions.

Trust policy. The portion of an IAM role definition that says who can assume it. LakeSail's role is restricted to a specific principal plus the external ID. See Security & IAM.

U

User. A global identity in LakeSail (one email, one set of credentials). Distinct from member, which is the org-scoped link.

V

Version (job). A frozen snapshot of a job's definition. Every published edit creates a new version; runs pin to the version active at dispatch time. See Defining jobs.

W

Workload boundary. The S3-scoped managed policy that caps which buckets LakeSail-provisioned workloads can read and write: only buckets matching lakesail-<connection-id>-*. See Security & IAM.
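To check locally whether a bucket name would fall inside the workload boundary's naming pattern, something like the following could be used. The connection ID abc123 is made up, and the real enforcement happens in IAM, not in client code; this only mirrors the wildcard match.

```python
from fnmatch import fnmatchcase


def bucket_in_boundary(bucket, connection_id):
    """True if `bucket` matches lakesail-<connection-id>-*."""
    return fnmatchcase(bucket, f"lakesail-{connection_id}-*")


# "abc123" is a hypothetical connection ID.
print(bucket_in_boundary("lakesail-abc123-results", "abc123"))
print(bucket_in_boundary("my-other-bucket", "abc123"))
```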

Webhook signature. An HMAC-SHA256 over the raw request body, sent in the LakeSail-Signature: sha256=... header. Lets you verify a webhook actually came from LakeSail. See Notifications.
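A receiver could verify the signature along these lines. This sketch assumes the digest in the header is hex-encoded and that you hold a shared signing secret for the channel; neither detail is stated here, so confirm both on the Notifications page. The function name verify_signature is hypothetical.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, raw_body: bytes, header_value: str) -> bool:
    """Check a 'LakeSail-Signature: sha256=...' header against the raw body."""
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    received = header_value[len(prefix):]
    # Assumption: the signature is the hex digest of HMAC-SHA256(secret, body).
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

Compute the HMAC over the exact bytes received, before any JSON parsing or re-serialization, since even whitespace changes alter the digest.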