
The managed Apache Flink service on the LakeSail platform is deprecated.

LakeSail is building Sail, an open-source computation framework in Rust to seamlessly integrate stream-processing, batch-processing, and compute-intensive (AI) workloads. The LakeSail platform will offer the managed solution for Sail. Existing PySpark and Flink SQL workloads can be migrated with ease. Please stay tuned and contact us if you are interested!

Python Application Packaging

This is a follow-up to the Kafka Data Processing in Python tutorial. It demonstrates how to package a Python application with dependencies and submit it to LakeSail.

We use Poetry for Python dependency management and packaging.
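
If Poetry is not installed yet, the official installer from Poetry's documentation is one common way to set it up; the commands below are a minimal sketch, and the exact installation method may differ on your system.

bash
# Install Poetry using the official installer (see https://python-poetry.org/docs/).
curl -sSL https://install.python-poetry.org | python3 -

# Verify that Poetry is available.
poetry --version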

Preparing the Project

Please clone the project repository and start the Docker Compose services by following the Kafka Data Processing in Python tutorial.

Let's first take a look at the pyproject.toml file in the flink-samples/kafka-python project directory. The following sections are particularly important.

toml
[tool.poetry.dependencies]
python = "~3.10"
jinja2 = "^3.1.2"

[tool.poetry.group.dev.dependencies]
isort = "^5.12.0"
black = "^23.9.1"
flake8 = "^6.1.0"
apache-flink = "1.17.1"

You can see that jinja2 is defined as a regular dependency, while apache-flink is defined as a dev dependency. This matters when packaging dependencies as a ZIP file, since we want to exclude the PyFlink library, which is already provided by the Flink Docker image.
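
To double-check which packages belong to each group before packaging, you can ask Poetry directly. The sketch below assumes Poetry 1.2 or later, where dependency groups and the --only option are available.

bash
# List only the runtime (main) dependencies that will end up in the ZIP file.
poetry show --only main

# List only the dev group; apache-flink should appear here and nowhere else.
poetry show --only dev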

Building and Deploying Artifacts

  1. Run poetry build in the project directory. This will generate build artifacts in the dist directory.
  2. Run the following commands to create a ZIP file containing the dependencies of the application.
    bash
    poetry export -o dist/requirements.txt
    pip install -r dist/requirements.txt -t dist/deps
    cd dist/deps
    zip -r ../deps.zip *
    cd -

    INFO

    This ZIP file will not contain apache-flink, which is defined as a dev dependency in pyproject.toml. The PyFlink library will be provided by the Flink Docker image at runtime.

  3. In the MinIO console, upload the following files to the kafka-python bucket. Do not include the directory name (dist/) in the object key. (An optional command-line check and upload are sketched after this list.)
    • dist/kafka_python-0.1.0-py3-none-any.whl
    • dist/deps.zip
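
Before uploading, it is worth verifying that apache-flink did not end up in the exported requirements or the ZIP file. If you prefer the command line over the MinIO console, the MinIO client (mc) can upload the artifacts as well; the alias, endpoint, and credentials below are assumptions based on this tutorial's Docker Compose setup and may need adjusting.

bash
# Confirm that apache-flink is absent from the exported requirements.
grep -i apache-flink dist/requirements.txt || echo "apache-flink is not in requirements.txt"

# Inspect the contents of the dependency ZIP file.
unzip -l dist/deps.zip | head

# Optional: upload the artifacts with the MinIO client instead of the console.
# The endpoint and credentials are assumptions; adjust them to your setup.
mc alias set local http://localhost:19000 minioadmin miniopwd
mc cp dist/kafka_python-0.1.0-py3-none-any.whl dist/deps.zip local/kafka-python/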

Submitting the Application

Run the following command to submit the Flink application to the DEFAULT workspace in your local LakeSail server. The command is similar to the one in Kafka Data Processing in Python; the main difference is that python.files now references the packaged wheel and the dependency ZIP file.

bash
curl -i -X POST \
  "http://localhost:18080/api/flink/v1/workspaces/DEFAULT/applications" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer ${LAKESAIL_API_KEY}" \
  -d @- <<EOF
{
  "name": "kafka-python",
  "runtime": "FLINK_1_17",
  "jars": ["s3://kafka-python/flink-sql-connector-kafka-1.17.1.jar"],
  "plugins": [
    {
      "name": "flink-s3-hadoop",
      "jars": ["/opt/flink/opt/flink-s3-fs-hadoop-1.17.1.jar"]
    }
  ],
  "flinkConfiguration": {
    "fs.s3a.endpoint": "http://minio:19000",
    "fs.s3a.path.style.access": "true",
    "fs.s3a.access.key": "minioadmin",
    "fs.s3a.secret.key": "miniopwd",
    "python.files": "s3://kafka-python/kafka_python-0.1.0-py3-none-any.whl,s3://kafka-python/deps.zip",
    "taskmanager.numberOfTaskSlots": "2"
  },
  "logConfiguration": {},
  "job": {
    "entryClass": "org.apache.flink.client.python.PythonDriver",
    "args": ["-pym", "demo.basic"]
  },
  "jobManagerResource": {
    "cpu": 0.5,
    "memory": "1024m"
  },
  "taskManagerResource": {
    "cpu": 0.5,
    "memory": "1024m"
  }
}
EOF
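
For context, the python.files configuration and the job arguments together correspond to what a plain PyFlink submission would express with the Flink CLI. The sketch below only illustrates this mapping; it assumes a locally configured Flink 1.17 client with the same S3 access settings and is not needed for the LakeSail submission above.

bash
# Rough Flink CLI equivalent of the python.files and job settings above.
# Shown only to illustrate the mapping; assumes a configured Flink 1.17 client.
./bin/flink run \
  -pyfs s3://kafka-python/kafka_python-0.1.0-py3-none-any.whl,s3://kafka-python/deps.zip \
  -pym demo.basic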

You can now follow the same steps as in the Kafka Data Processing in Python tutorial to test the application.

Further Reading

For more details about Python dependency management, please refer to the Submitting PyFlink Jobs guide.