Defining jobs

A job is a reusable, versioned workload. This page covers the static definition — what a job is, the two execution shapes, and how drafts and versioning work.

For scheduling, see Scheduling. For runs and debugging, see Runs & debugging.

Prerequisites

Jobs run on a cluster against a catalog. Before you can save a job, make sure you have a running cluster to execute it on, a catalog attached for the job's data binding, and a team to own the job.

Create a job

  1. Open Jobs in the sidebar and click Create job.
  2. Give it a Name and assign it to a Team. Team membership determines who can edit and trigger the job.
  3. Pick the Job type — SQL or Python — and fill in the source (below).
  4. Configure compute — driver and executor sizing for the AWS node group the run will use.
  5. (Optional) Configure LakeSail server settings (env vars, feature flags). See the Sail configuration reference.
  6. Click Save.

Saving creates the first published version of the job.
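For orientation, the sketch below shows roughly what a job definition captures, written as a Python dictionary. The field names are illustrative assumptions, not the actual CreateJob schema; see the API reference at the end of this page for the real request shape.

    # Illustrative sketch only: these field names are assumptions, not the
    # actual CreateJob schema. See the API reference for the real shape.
    job_definition = {
        "name": "daily_event_rollup",
        "team": "analytics",      # owning team; controls who can edit and trigger the job
        "job_type": "sql",        # "sql" or "python"
        "source": {
            "kind": "snapshot",   # inline SQL text; see SQL jobs below
            "sql": "SELECT event_date, count(*) AS events FROM events GROUP BY event_date",
        },
    }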

SQL jobs

Use SQL for analytics and ETL transformations. There are two source variants:

  • Saved query — reference an existing query by ID. The query text is resolved at run time, so edits to the query flow through to the next run of the job.
  • Snapshot — paste inline SQL text into the job. You can optionally track the origin (e.g. a Git path) for drift detection.

(A file variant pointing at SQL in workspace storage is reserved in the API but not supported yet.)

Pick saved query when the SQL is shared across jobs or queried interactively; pick snapshot when the SQL belongs to exactly one job and shouldn't drift.
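To make the distinction concrete, here is a hedged sketch of the two source shapes in Python. The keys (query_id, sql, origin) are hypothetical names chosen for illustration, not the exact API fields.

    # Hypothetical field names, for illustration only.

    # Saved query: the job stores only a reference. The SQL is resolved at
    # run time, so edits to the saved query apply to the job's next run.
    saved_query_source = {
        "kind": "saved_query",
        "query_id": "q_12345",    # placeholder ID of an existing saved query
    }

    # Snapshot: the job stores the SQL text itself, so it cannot drift.
    snapshot_source = {
        "kind": "snapshot",
        "sql": """
            SELECT event_date, count(*) AS events
            FROM events
            GROUP BY event_date
        """,
        "origin": "etl/daily_event_rollup.sql",   # optional Git path for drift detection
    }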

Python jobs

Use Python for anything that needs the Python ecosystem (ML, custom DataFrame logic, third-party libraries). Two source variants:

  • Python wheel — run a wheel by package name and entry point. Best for reusable code that's versioned in a package registry.
  • Python file — run a single .py file from cloud storage or the workspace. Best for scripts and ad-hoc work.
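Below is a minimal sketch of a Python file job. It assumes the job runtime exposes a Spark-compatible SparkSession through pyspark; the table names are placeholders.

    # daily_rollup.py: a minimal Python file job (table names are placeholders).
    # Assumes the job runtime provides a Spark-compatible session via pyspark.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    def main() -> None:
        spark = SparkSession.builder.appName("daily_rollup").getOrCreate()
        events = spark.table("events")       # table from the job's catalog
        daily = events.groupBy("event_date").agg(F.count("*").alias("events"))
        daily.write.mode("overwrite").saveAsTable("daily_event_counts")

    if __name__ == "__main__":
        main()

For the wheel variant, the same main function could be packaged and referenced by package name and entry point (for example my_package.jobs:main, a hypothetical name) instead of pointing at a single file.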

Compute

Jobs run on the cluster's compute nodes (not the management nodes). Configure:

  • Driver — instance type and count for the Spark driver.
  • Executor — instance type and count for Spark executors.

The cluster's autoscaler adds compute on demand, so over-provisioning isn't helpful — size for the job's actual working set.
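As a rough illustration, a compute configuration boils down to an instance type and count for each role. The field names below are assumptions, not the exact settings schema.

    # Hypothetical shape, for illustration only; the actual settings may differ.
    compute = {
        "driver":   {"instance_type": "m5.xlarge",  "count": 1},   # Spark driver
        "executor": {"instance_type": "m5.2xlarge", "count": 4},   # Spark executors; the autoscaler adds nodes on demand
    }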

Drafts and versioning

Every change to a live job flows through a draft, so you can iterate without disturbing scheduled runs.

  1. Open the job and click Edit. Changes go into a draft, which does not fire on schedule and does not appear in the main run history.
  2. Test the draft by running it manually. Draft runs are clearly labeled.
  3. When ready, publish the draft. It becomes the new live version; the previous version moves to the version history.

The version history is searchable and every run links to the exact version it executed. You can always reproduce what ran last Tuesday because versions are frozen at dispatch time.

Roll back

To revert to an earlier version:

  1. Open the job's Versions tab.
  2. Find the version you want and click Activate.

The activated version becomes the live one on the next run (scheduled or manual).

Teams and ownership

Jobs have one owning team and can be shared with additional teams.

  • Members of the owning team (and org admins) can edit and delete the job.
  • Members of any shared team can trigger runs and view history.

Change ownership by editing the team field on the job. The previous team loses write access the moment the change is saved.

API reference

  • Jobs — CreateJob, UpdateJob, drafts, versions, team assignment.
  • Catalogs — required prerequisite for the job's data binding.