Defining jobs
A job is a reusable, versioned workload. This page covers the static definition: what a job is, the two execution shapes, and how drafts and versioning work.
For scheduling, see Scheduling. For runs and debugging, see Runs & debugging.
Prerequisites
Jobs run on a cluster against a catalog. Before you can save a job:
- A cluster must be in active status. See Set up a cluster.
- A catalog must be configured for the data the job reads. See Connect a catalog.
Create a job
- Open Jobs in the sidebar and click Create job.
- Give it a Name and assign it to a Team. Team membership determines who can edit and trigger the job.
- Pick the Job type (SQL or Python) and fill in the source (below).
- Configure compute: driver and executor sizing for the AWS node group the run will use.
- (Optional) Configure LakeSail server settings (env vars, feature flags). See the Sail configuration reference.
- Click Save.
Saving creates the first published version of the job.
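As a rough illustration of what the form collects, the sketch below models a job definition and the checks that would block a save. Every field name here is hypothetical and mirrors the form fields above rather than a documented schema; treating Name and Team as required is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical field names that mirror the form fields above;
# this is not the platform's documented schema.
@dataclass
class JobDefinition:
    name: str
    team: str
    job_type: str            # "sql" or "python"
    source: Dict[str, str]   # variant-specific source settings
    compute: Dict[str, str] = field(default_factory=dict)
    server_settings: Dict[str, str] = field(default_factory=dict)  # optional


def save_blockers(job: JobDefinition, cluster_status: str,
                  catalog_configured: bool) -> List[str]:
    """Return reasons the job cannot be saved; an empty list means it can."""
    blockers = []
    # Prerequisites: an active cluster and a configured catalog.
    if cluster_status != "active":
        blockers.append("cluster is not active")
    if not catalog_configured:
        blockers.append("no catalog configured")
    # Assumed-required form fields.
    if not job.name:
        blockers.append("name is required")
    if not job.team:
        blockers.append("team is required")
    if job.job_type not in ("sql", "python"):
        blockers.append("job type must be sql or python")
    return blockers


job = JobDefinition(
    name="daily_orders_rollup",
    team="analytics",
    job_type="sql",
    source={"variant": "snapshot", "sql": "SELECT 1"},
)
print(save_blockers(job, cluster_status="active", catalog_configured=True))
print(save_blockers(job, cluster_status="provisioning", catalog_configured=False))
```

The first call prints an empty list; the second lists the two unmet prerequisites.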
SQL jobs
Use SQL for analytics and ETL transformations. There are two source variants:
- Saved query: reference an existing query by ID. The query text is resolved at run time, so edits to the query flow through to the next run of the job.
- Snapshot: paste inline SQL text into the job. You can optionally track the origin (e.g. a Git path) for drift detection.
A file variant pointing at SQL in workspace storage is reserved in the API but not supported yet.
Pick saved query when the SQL is shared across jobs or queried interactively; pick snapshot when the SQL belongs to exactly one job and shouldn't drift.
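The difference between the two variants comes down to when the SQL text is bound. The following is a minimal sketch of that behavior, not the platform's implementation: the class names, the `saved_queries` dictionary, and the query ID are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class SavedQuerySource:
    query_id: str                 # resolved to SQL text at run time


@dataclass(frozen=True)
class SnapshotSource:
    sql: str                      # inline text, fixed when the job is saved
    origin: Optional[str] = None  # e.g. a Git path, kept only for drift detection


SqlSource = Union[SavedQuerySource, SnapshotSource]


def resolve_sql(source: SqlSource, saved_queries: Dict[str, str]) -> str:
    """Return the SQL text a run would execute for this source."""
    if isinstance(source, SavedQuerySource):
        return saved_queries[source.query_id]
    return source.sql


# Hypothetical store of saved queries, keyed by ID.
saved_queries = {"q_123": "SELECT count(*) FROM orders"}

by_reference = SavedQuerySource(query_id="q_123")
by_snapshot = SnapshotSource(sql=saved_queries["q_123"], origin="sql/orders.sql")

# Someone edits the saved query after both jobs were defined.
saved_queries["q_123"] = "SELECT count(*) FROM orders WHERE status = 'paid'"

print(resolve_sql(by_reference, saved_queries))  # picks up the edit
print(resolve_sql(by_snapshot, saved_queries))   # still the original text
```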
Python jobs
Use Python when you need the Python ecosystem, such as ML or third-party libraries. Two source variants:
- Python wheel: run a wheel by package name and entry point. Best for reusable code that's versioned in a package registry.
- Python file: run a single .py file from cloud storage or the workspace. Best for scripts and ad-hoc work.
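This page does not specify the entry-point format a wheel job expects. As a sketch only, assuming the common `module:function` convention from Python packaging, resolving an entry point to a callable might look like this. The `load_entry_point` helper is hypothetical, and the standard-library `json:dumps` stands in for a function from your own package.

```python
import importlib
from typing import Callable


def load_entry_point(spec: str) -> Callable:
    """Resolve a 'module:function' string to the callable it names."""
    module_name, _, attr = spec.partition(":")
    if not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Stand-in entry point from the standard library.
main = load_entry_point("json:dumps")
print(main({"status": "ok"}))
```

A single-file job needs no such lookup, since the file itself is the unit that runs.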
Compute
Jobs run on the cluster's compute nodes (not the management nodes), sized through a compute profile — a named, reusable bundle of engine and sizing settings. Every job references one; pick an existing profile or create one inline. The profile sets:
- Execution mode: standalone (single Sail pod) or cluster (distributed driver + workers).
- Driver: instance type and disk for the coordinating pod.
- Worker (cluster mode): instance type, disk, max nodes, and capacity type for the Spark workers.
- Libraries and environment: PyPI/wheel dependencies and Sail config to install before the run.
The cluster's autoscaler adds compute on demand, so over-provisioning provides no benefit. Size the profile for the job's actual working set.
Because the compute lives in the profile, editing the profile changes the next run of every job that references it, without redeploying the jobs.
Timeouts and resource ceilings are per-job; see Limits & quotas for what the platform enforces.
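To make the profile indirection concrete, here is a small model of profiles and the jobs that reference them by name. The class and field names, profile name, and capacity-type values are illustrative assumptions, and the rule that cluster mode requires a worker spec is inferred from the list above rather than stated by the platform.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NodeSpec:
    instance_type: str
    disk_gb: int


@dataclass
class WorkerSpec(NodeSpec):
    max_nodes: int = 1
    capacity_type: str = "on_demand"   # assumed values: "on_demand" or "spot"


@dataclass
class ComputeProfile:
    mode: str                          # "standalone" or "cluster"
    driver: NodeSpec
    worker: Optional[WorkerSpec] = None
    libraries: List[str] = field(default_factory=list)


def profile_problems(profile: ComputeProfile) -> List[str]:
    """Return a list of inconsistencies; empty means the profile looks valid."""
    problems = []
    if profile.mode not in ("standalone", "cluster"):
        problems.append("mode must be standalone or cluster")
    if profile.mode == "cluster" and profile.worker is None:
        problems.append("cluster mode needs a worker spec")
    return problems


# Jobs store only the profile name, so both jobs share one definition.
profiles: Dict[str, ComputeProfile] = {
    "etl-small": ComputeProfile(mode="standalone", driver=NodeSpec("m5.xlarge", 100)),
}
jobs: Dict[str, str] = {"daily_rollup": "etl-small", "hourly_sync": "etl-small"}


def compute_for_next_run(job_name: str) -> ComputeProfile:
    """Look the profile up at run time, not at job-save time."""
    return profiles[jobs[job_name]]


# Editing the profile once changes the next run of every job that uses it.
profiles["etl-small"] = ComputeProfile(
    mode="cluster",
    driver=NodeSpec("m5.xlarge", 100),
    worker=WorkerSpec("m5.2xlarge", 200, max_nodes=4, capacity_type="spot"),
)
print(compute_for_next_run("daily_rollup").mode)
print(compute_for_next_run("hourly_sync").mode)
```

Both lookups now report cluster mode even though neither job was edited.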
Drafts and versioning
Every change to a live job flows through a draft, so you can iterate without disturbing scheduled runs.
- Open the job and click Edit. Changes go into a draft. Drafts do not fire on schedule and are excluded from the main run history.
- Test the draft by running it manually. Draft runs are clearly labeled.
- When ready, publish the draft. It becomes the new live version, and the previous version moves to the version history.
The version history is searchable and every run links to the exact version it executed. Versions are frozen at dispatch time, so you can reproduce any past run.
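The draft and publish flow above can be modeled in a few lines. This is a simplified sketch with hypothetical class and method names, not the platform's data model. It shows that a draft never affects live runs and that each run carries a frozen copy of the definition it was dispatched with.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Run:
    version: Optional[int]       # index of the published version; None for draft runs
    definition: Dict[str, str]   # frozen copy taken at dispatch time
    is_draft: bool


class Job:
    def __init__(self, definition: Dict[str, str]) -> None:
        self.versions: List[Dict[str, str]] = [dict(definition)]  # first save publishes
        self.live = 0
        self.draft: Optional[Dict[str, str]] = None

    def edit(self, **changes: str) -> None:
        """Apply changes to the draft, creating it from the live version if needed."""
        base = self.draft if self.draft is not None else self.versions[self.live]
        self.draft = {**base, **changes}

    def publish(self) -> None:
        """Promote the draft to a new live version; older versions stay in history."""
        if self.draft is None:
            raise ValueError("no draft to publish")
        self.versions.append(self.draft)
        self.live = len(self.versions) - 1
        self.draft = None

    def dispatch(self, use_draft: bool = False) -> Run:
        """Start a run, copying the definition so later edits cannot alter it."""
        if use_draft:
            if self.draft is None:
                raise ValueError("no draft to run")
            return Run(version=None, definition=dict(self.draft), is_draft=True)
        return Run(self.live, dict(self.versions[self.live]), is_draft=False)


job = Job({"sql": "SELECT 1"})
job.edit(sql="SELECT 2")

scheduled = job.dispatch()               # still runs the live version
draft_run = job.dispatch(use_draft=True)  # manual test of the draft
job.publish()
after_publish = job.dispatch()

print(scheduled.version, scheduled.definition["sql"])
print(draft_run.is_draft, draft_run.definition["sql"])
print(after_publish.version, after_publish.definition["sql"])
```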
Roll back
To revert to an earlier version:
- Open the job's Versions tab.
- Find the version you want and click Activate.
The activated version becomes the live one on the next run (scheduled or manual).
Teams and ownership
Jobs have one owning team and can be shared with additional teams.
- Members of the owning team (and org admins) can edit and delete the job.
- Members of any shared team can trigger runs and view history.
Change ownership by editing the team field on the job. The previous team loses write access the moment the change is saved.
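The access rules above reduce to a small decision function. The sketch below is illustrative only: the action names are made up, and it assumes that owning-team members and org admins can also do everything a shared team can, which the rules above imply but do not state outright.

```python
from typing import Set

WRITE_ACTIONS = {"edit", "delete"}
READ_ACTIONS = {"trigger", "view_history"}


def is_allowed(action: str, user_teams: Set[str], is_org_admin: bool,
               owning_team: str, shared_teams: Set[str]) -> bool:
    """Decide whether a user may perform an action on a job."""
    is_owner = owning_team in user_teams
    if action in WRITE_ACTIONS:
        return is_owner or is_org_admin
    if action in READ_ACTIONS:
        # Assumption: owners and org admins inherit shared-team permissions.
        return is_owner or is_org_admin or bool(user_teams & shared_teams)
    return False


analyst = {"analytics"}
partner = {"finance"}

# Job owned by analytics and shared with finance.
print(is_allowed("edit", analyst, False, "analytics", {"finance"}))
print(is_allowed("edit", partner, False, "analytics", {"finance"}))
print(is_allowed("trigger", partner, False, "analytics", {"finance"}))

# After ownership moves to platform, analytics loses write access.
print(is_allowed("edit", analyst, False, "platform", {"finance"}))
```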