Spark DataFrame API
The Spark DataFrame API lets you work with structured data as named, typed columns. The PySpark example below connects to a Spark Connect server, reads a Parquet dataset, and prints the name and email of every user older than 30.
python
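from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Connect to a Spark Connect server. 15002 is the default Spark Connect port; adjust it if your server listens elsewhere.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
# Read the Parquet dataset; the schema is taken from the file metadata.
df = spark.read.parquet("/data/users.parquet")
# Keep users older than 30 and print their name and email.
df.filter(col("age") > 30).select("name", "email").show()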
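
To make the query's semantics concrete, here is a minimal sketch that models the same filter-then-select over plain Python dictionaries, so it runs without a Spark installation. The sample rows and the helper name `contacts_older_than` are hypothetical and not part of the example above. The sketch also mirrors one Spark SQL behavior: a row whose `age` is null does not satisfy `age > 30`, so the filter drops it.

```python
from typing import Optional, TypedDict


class User(TypedDict):
    name: str
    email: str
    age: Optional[int]


def contacts_older_than(rows: list[User], min_age: int) -> list[dict[str, str]]:
    """Plain-Python model of df.filter(col("age") > min_age).select("name", "email")."""
    return [
        {"name": r["name"], "email": r["email"]}
        for r in rows
        # A null (None) age never satisfies the comparison, as in Spark SQL.
        if r["age"] is not None and r["age"] > min_age
    ]


# Hypothetical sample rows standing in for /data/users.parquet.
users: list[User] = [
    {"name": "Ada", "email": "ada@example.com", "age": 36},
    {"name": "Bo", "email": "bo@example.com", "age": 30},
    {"name": "Cy", "email": "cy@example.com", "age": None},
]

result = contacts_older_than(users, 30)
print(result)
```

The row with age 30 is excluded because the comparison is strict. Using `>=` in both the PySpark query and this model would include it.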