# Building the Python Package

## Running the Code Formatter and Linter
Run the following command to format the Python code. The command uses Ruff as the code formatter and linter.

```bash
hatch fmt
```

You may need to fix some linter errors manually.
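If you only want to report issues without modifying any files, for example before committing, Hatch's formatter supports a check-only mode:

```bash
hatch fmt --check
```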
## Building and Installing the Python Package
Run the following command to build and install the Python package for local development. The package is installed in the default Hatch environment.

```bash
hatch run maturin develop
```
The command installs the source code as an editable package in the Hatch environment, while the built `.so` native library is stored in the source directory. You can then use `hatch shell` to enter the Python environment and test the library. Any changes to the Python code are reflected in the environment immediately, but if you make changes to the Rust code, you need to run the `develop` command again.
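As a quick sanity check, you can verify that the editable install resolves to your working tree by printing the package's import location from inside the environment. The exact path shown depends on where you cloned the repository.

```bash
# Enter the default Hatch environment, then check where pysail is imported from.
hatch shell
python -c "import pysail; print(pysail.__file__)"
```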
Run the following command if you want to build the Python package without installing it. The built package will be available in the `target/wheels` directory.

```bash
hatch run maturin build
```
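Like Cargo, Maturin builds in debug mode by default. If you want a wheel with optimized native code, you can pass Maturin's standard `--release` flag and then inspect the output directory:

```bash
hatch run maturin build --release
ls target/wheels
```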
You can install the built package in the `default` environment using the following command. The `install-pysail` script is defined in the project's Hatch configuration.

```bash
hatch run install-pysail
```
## Running Tests
Use the following command to run the Python tests. This assumes that you have installed the editable package in the default Hatch environment.

```bash
hatch run pytest
```
You can pass additional `pytest` arguments as needed. For example, to run tests from the installed package, use the following command.

```bash
hatch run pytest --pyargs pysail
```
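Other standard pytest options work the same way. For example, the following uses a hypothetical keyword expression (replace `dataframe` with whatever matches the tests you care about) to run a subset of tests with verbose output:

```bash
hatch run pytest -k "dataframe" -v
```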
### Testing with an External Spark Connect Server
By default, a Sail Spark Connect server is launched in the same process as the tests. To run the tests against a server launched externally, set the `SPARK_REMOTE` environment variable.

```bash
env SPARK_REMOTE="sc://localhost:50051" hatch run pytest
```
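Before running the full suite, you may want to confirm that the external server is reachable. The sketch below assumes the server is listening on `localhost:50051`, as in the example above, and uses the standard PySpark Spark Connect client available in the test environment:

```bash
# Connect to the external server and run a trivial query.
hatch run python -c "
from pyspark.sql import SparkSession
spark = SparkSession.builder.remote('sc://localhost:50051').getOrCreate()
spark.sql('SELECT 1 AS ok').show()
"
```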
### Testing with Different Spark Versions
The `test` matrix environments allow you to run tests against multiple Spark versions. Assuming the package is available in the `target/wheels` directory, you can install it in all `test` environments using the following command.

```bash
hatch run test:install-pysail
```
Then you can run the tests against multiple Spark versions. The tests are discovered from the installed package.

```bash
hatch run test:pytest --pyargs pysail
```
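To see which Spark versions are currently defined in the matrix for your checkout, you can list the project's Hatch environments:

```bash
hatch env show
```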
Alternatively, you can run the tests against a specific Spark version. For example, you can use the following command to install the package in a specific `test` environment.

```bash
hatch run test.spark-3.5.5:install-pysail
```
Then you can run the tests against that specific Spark version.

```bash
hatch run test.spark-3.5.5:pytest --pyargs pysail
```
### Testing with JVM Spark
You can also run the tests against a JVM-based Spark Connect server by specifying a local Spark remote URL. This is useful for verifying that the tests accurately reflect Spark's behavior. Note that tests written for extended features of Sail will be skipped in this case.

```bash
env SPARK_REMOTE="local" \
  PYSPARK_SUBMIT_ARGS="--packages org.apache.spark:spark-connect_2.12:3.5.5 pyspark-shell" \
  hatch run test.spark-3.5.5:pytest --pyargs pysail
```
The `PYSPARK_SUBMIT_ARGS` environment variable is not needed in Spark 4.0.0 and later versions.

```bash
env SPARK_REMOTE="local" \
  hatch run test.spark-4.0.0:pytest --pyargs pysail
```
::: info
- You can use any valid local Spark master URL, such as `local`, `local[2]`, or `local[*]`.
- If tests involving catalog operations fail, you may need to clean up the local Spark warehouse and metastore in the project directory.

  ```bash
  rm -rf metastore_db spark-warehouse
  ```
:::