MCP Server
You can launch an MCP (Model Context Protocol) server using the Sail CLI. The MCP server provides tools for querying datasets via Spark SQL. By integrating the MCP server with LLM (large language model) agents, you can perform data analysis via natural language conversations.
Configuring the MCP Server
You can use the MCP server in Claude. To integrate MCP servers running locally with Claude, you need to download and install Claude for Desktop.
Launch Claude for Desktop and log in to your Claude account. Go to Settings... in the menu and open the Developer tab in the dialog. Click Edit Config, which reveals the configuration file in your file system.
Open the configuration file in a text editor and add the following content.
{
  "mcpServers": {
    "filesystem": {
      "command": "/path/to/sail",
      "args": ["spark", "mcp-server", "--transport", "stdio"]
    }
  }
}
Replace /path/to/sail with the absolute path to the Sail CLI on your system. If you have installed Sail via pip in a Python virtual environment, the path is $VENV/bin/sail, where $VENV is the absolute path to the virtual environment. Please note that you must install the PySail library with MCP dependencies.
pip install "pysail[spark,mcp]"
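If you are unsure what the absolute path looks like, the following minimal Python sketch derives it from a virtual environment directory. The helper name `sail_path_in_venv` is hypothetical, and the Windows layout (`Scripts\sail.exe`) is an assumption based on how virtual environments usually place entry points; only the `$VENV/bin/sail` layout comes from this guide.

```python
import os
import sys
from pathlib import Path


def sail_path_in_venv(venv: str, windows: bool = False) -> Path:
    """Return the expected absolute path of the `sail` executable in a virtual environment."""
    root = Path(venv).expanduser().resolve()
    # POSIX layout from this guide: $VENV/bin/sail.
    # Windows layout is an assumption: $VENV\Scripts\sail.exe.
    return root / ("Scripts/sail.exe" if windows else "bin/sail")


if __name__ == "__main__":
    # sys.prefix points at the active virtual environment when one is activated.
    candidate = sail_path_in_venv(sys.prefix, windows=(os.name == "nt"))
    print(candidate, "(exists)" if candidate.exists() else "(not found)")
```

Run it with the Python interpreter of the virtual environment where you installed PySail, and copy the printed path into the "command" field.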
INFO
You can run sail spark mcp-server --help to see the available options for launching the MCP server.
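If the configuration file already lists other MCP servers, pasting the snippet over it would discard them. The sketch below, written under the assumption that the file is plain JSON with a top-level "mcpServers" object as in the example above, adds the Sail entry while keeping everything else. The function name `add_sail_server` is hypothetical; the default entry name "filesystem" and the arguments are copied from the example.

```python
import json
from pathlib import Path


def add_sail_server(config_path: str, sail_path: str, name: str = "filesystem") -> dict:
    """Add (or overwrite) one entry under "mcpServers" and keep all other settings."""
    path = Path(config_path)
    # Start from the existing configuration; treat a missing or empty file as {}.
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {
        "command": sail_path,
        "args": ["spark", "mcp-server", "--transport", "stdio"],
    }
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the path of the file that Edit Config reveals and the absolute path to the Sail CLI, for example `add_sail_server(config_file, "/path/to/sail")`. Consider keeping a backup copy of the file before modifying it.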
Restart Claude for Desktop. A Sail MCP server is now managed by Claude for Desktop behind the scenes. You will see a hammer icon in the bottom right corner of the input box. Click on the hammer icon, and you will see a list of installed tools from the Sail MCP server.
Using the MCP Server
You can now ask Claude to analyze data for you! For now, only Parquet datasets are supported. The dataset can reside in any of the sources listed in the Data Access guide.
Point Claude to your dataset URI, and Claude will register the dataset as a temporary view in a Spark session. You can ask questions about your dataset, and Claude will write Spark SQL queries and request query execution via the Sail MCP server. Once the query result is returned, Claude will interpret the results for you.
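To give a sense of what to expect when reviewing tool calls, the sketch below builds the two kinds of Spark SQL statements such a session might contain: one that registers a Parquet dataset as a temporary view, and one that queries it. This is an illustration only. The helper `register_view_sql`, the view name, the URI, and the column names are all hypothetical, and the statements that Claude and the Sail MCP tools actually issue may look different.

```python
def register_view_sql(view: str, uri: str) -> str:
    """Build Spark SQL that exposes the Parquet dataset at `uri` as a temporary view."""
    if not view.isidentifier():
        raise ValueError(f"not a simple identifier: {view!r}")
    if "'" in uri:
        # Keep the sketch simple: refuse URIs that would need quote escaping.
        raise ValueError("URI must not contain single quotes")
    return (
        f"CREATE OR REPLACE TEMPORARY VIEW {view} "
        f"USING parquet OPTIONS (path '{uri}')"
    )


# A read-only query over the view; the column name is a made-up example.
EXAMPLE_QUERY = """
SELECT passenger_count, COUNT(*) AS trips
FROM trips
GROUP BY passenger_count
ORDER BY trips DESC
"""

if __name__ == "__main__":
    print(register_view_sql("trips", "s3://bucket/trips/"))
    print(EXAMPLE_QUERY.strip())
```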
WARNING
Please note that there is currently no guardrail to prevent Claude from generating SQL statements that alter your data. We recommend that you approve each tool use explicitly and review the SQL statements generated by Claude. You should deny the execution of any query that is not a SELECT query.
For remote data sources, we also recommend that you use credentials with read-only permissions on the data.
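As an aid for manual review, the rule of thumb above can be expressed as a small heuristic. The sketch below is not part of Sail and is not a SQL parser; the function name `looks_select_only` is hypothetical. It is deliberately conservative: it accepts only a single statement that begins with SELECT, and flags everything else (including statements with leading comments or CTEs) for closer inspection.

```python
import re

# Matches a statement whose first keyword is SELECT (case-insensitive).
_SELECT_START = re.compile(r"^\s*SELECT\b", re.IGNORECASE)


def looks_select_only(sql: str) -> bool:
    """Heuristic: True only for a single statement that starts with SELECT."""
    # Drop surrounding whitespace and any trailing semicolons.
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        # Possibly more than one statement; be conservative and reject.
        return False
    return bool(_SELECT_START.match(statement))


if __name__ == "__main__":
    for sql in ("SELECT * FROM trips", "DROP TABLE trips", "SELECT 1; DELETE FROM trips"):
        print(f"{sql!r}: {'ok' if looks_select_only(sql) else 'review or deny'}")
```

A check like this can produce false rejections, which is the safer direction of error here. It does not replace read-only credentials.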
Cleaning Up
To remove the Sail MCP server from Claude for Desktop, remove the entry you added from the configuration file and restart Claude for Desktop.
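If the configuration file contains other MCP servers that you want to keep, only the Sail entry should be deleted. A minimal sketch, assuming the file is plain JSON with a top-level "mcpServers" object and that the entry is named "filesystem" as in the configuration example; the function name `remove_sail_server` is hypothetical.

```python
import json
from pathlib import Path


def remove_sail_server(config_path: str, name: str = "filesystem") -> bool:
    """Delete one entry under "mcpServers"; return True if an entry was removed."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    removed = config.get("mcpServers", {}).pop(name, None) is not None
    if removed:
        # Rewrite the file only when something actually changed.
        path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return removed
```

Pass the path of the file that Edit Config reveals, then restart Claude for Desktop so that the change takes effect.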