Building the Python Package
Running the Code Formatter and Linter
Run the following command to format the Python code. The command uses Ruff as the code formatter and linter.
hatch fmt
You may need to fix linter errors manually.
Building and Installing the Python Package
Run the following command to build and install the Python package for local development. The package is installed in the default Hatch environment.
hatch run maturin develop
The command installs the source code as an editable package in the Hatch environment, while the built .so native library is stored in the source directory. You can then use hatch shell to enter the Python environment and test the library. Any changes to the Python code are reflected in the environment immediately. However, if you change the Rust code, you need to run the develop command again.
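The rule that Rust changes require another develop run can be automated with a timestamp check. The following is a minimal sketch and not part of the project: the function names rust_changed and develop_if_needed, the STAMP file, and the HATCH override are all hypothetical, and the directory holding the Rust sources is passed as an argument because its location is not stated here.

```shell
# Sketch only: re-run `maturin develop` when Rust sources have changed.
# STAMP is a hypothetical marker file recording the last successful run.
STAMP="${STAMP:-.maturin-develop.stamp}"

rust_changed() {
  # Succeeds if there is no stamp yet, or if any *.rs or Cargo.toml file
  # under the directory "$1" is newer than the stamp.
  if [ ! -e "$STAMP" ]; then
    return 0
  fi
  [ -n "$(find "$1" \( -name '*.rs' -o -name 'Cargo.toml' \) -newer "$STAMP" -print | head -n 1)" ]
}

develop_if_needed() {
  # HATCH is a hypothetical override (for example HATCH=echo for a preview);
  # by default the real hatch command is used.
  if rust_changed "$1"; then
    "${HATCH:-hatch}" run maturin develop && touch "$STAMP"
  fi
}
```

Calling develop_if_needed with the Rust source directory before hatch run pytest would rebuild only when needed. Pointing it at a directory that contains build output makes the find call slower, so a narrower path is preferable.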
Run the following command if you want to build the Python package without installing it. The built package will be available in the target/wheels directory.
hatch run maturin build
You can install the built package in the default environment using the following command.
hatch run install-pysail
Running Tests
Use the following command to run the Python tests. This assumes that you have installed the editable package in the default Hatch environment.
hatch run pytest
You can pass additional pytest arguments as needed. For example, to run tests from the installed package, use the following command.
hatch run pytest --pyargs pysail
Testing with an External Spark Connect Server
By default, a Sail Spark Connect server is launched in the same process as the tests. To run the tests against a server launched externally, set the SPARK_REMOTE environment variable.
env SPARK_REMOTE="sc://localhost:50051" hatch run pytest
Testing with Different Spark Versions
The test matrix environments allow you to run tests against multiple Spark versions. Assuming the built package is available in the target/wheels directory, you can install it in all test environments using the following command.
hatch run test:install-pysail
Then you can run the tests against multiple Spark versions. The tests are discovered from the installed package.
hatch run test:pytest --pyargs pysail
Alternatively, you can run the tests against a specific Spark version. For example, you can use the following command to install the package in a specific test environment.
hatch run test.spark-3.5.5:install-pysail
Then you can run the tests against the specific Spark version.
hatch run test.spark-3.5.5:pytest --pyargs pysail
Testing with JVM Spark
You can also run the tests against a JVM-based Spark Connect server by specifying a local Spark remote URL. This is useful to ensure that the tests are written correctly to reflect the Spark behavior. Note that tests written for extended features of Sail will be skipped in this case.
env SPARK_REMOTE="local" \
PYSPARK_SUBMIT_ARGS="--packages org.apache.spark:spark-connect_2.12:3.5.5 pyspark-shell" \
hatch run test.spark-3.5.5:pytest --pyargs pysail
The PYSPARK_SUBMIT_ARGS environment variable is not needed in Spark 4.0.0 and later versions.
env SPARK_REMOTE="local" \
hatch run test.spark-4.0.0:pytest --pyargs pysail
INFO
- You can use any valid local Spark master URL, such as local, local[2], or local[*].
- If tests involving catalog operations fail, you may need to clean up the local Spark warehouse and metastore in the project directory.
rm -rf metastore_db spark-warehouse
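The two JVM Spark commands above differ only in the Spark version and in whether PYSPARK_SUBMIT_ARGS is set, so they can be wrapped in one helper. The following is a minimal sketch and not part of the project: the function name run_jvm_spark_tests and the HATCH and CLEAN variables are hypothetical. It assumes that test environments are named test.spark-<version> as in the examples above, and that every 3.x version needs the spark-connect_2.12 package at the matching version, which the text above confirms only for 3.5.5.

```shell
# Sketch only: wraps the JVM Spark test commands shown above.
run_jvm_spark_tests() {
  local version="$1"
  # Optional cleanup of the local Spark warehouse and metastore (CLEAN=1).
  if [ "${CLEAN:-0}" = "1" ]; then
    rm -rf metastore_db spark-warehouse
  fi
  case "$version" in
    3.*)
      # Assumption: Spark 3.x needs the Spark Connect package at the same version.
      env SPARK_REMOTE="local" \
        PYSPARK_SUBMIT_ARGS="--packages org.apache.spark:spark-connect_2.12:${version} pyspark-shell" \
        "${HATCH:-hatch}" run "test.spark-${version}:pytest" --pyargs pysail
      ;;
    *)
      # Spark 4.0.0 and later do not need PYSPARK_SUBMIT_ARGS.
      env SPARK_REMOTE="local" \
        "${HATCH:-hatch}" run "test.spark-${version}:pytest" --pyargs pysail
      ;;
  esac
}
```

Running HATCH=echo run_jvm_spark_tests 4.0.0 prints the arguments that would be passed to hatch instead of running the tests, which is a cheap way to check the environment name. Cleanup is opt-in through CLEAN=1 because removing the warehouse and metastore is only needed when catalog tests fail.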
