How do I run a Python transformation?
Create, run, and develop a Python transformation in Keboola — write the script, map input and output CSV files, run it, confirm the result, and debug it in a workspace or locally.
You want to process data with Python where SQL is awkward. A Python transformation reads your mapped input tables as CSV files, runs your script, and writes CSV outputs back to Storage. This page gets you from nothing to a successful run, then shows how to develop and debug. For limits, file paths, and packages, see the reference.
Time: ~10 minutes · You will need: a Keboola project and one table in Storage (or the sample CSV file).
Step 1 — Create the transformation
- Open Components → Transformations, click New Transformation, and choose Python Transformation.
- Name it and confirm.
Step 2 — Map input and output
- Upload the sample CSV file to Storage as a table.
- In Input Mapping, add it and set its Destination to `source` (the script reads `in/tables/source.csv`).
- In Output Mapping, map `result.csv` (produced by the script) to a new Storage table, for example `out.c-main.result`.
Step 3 — Write the script
Paste a script that reads in/tables/source.csv and writes out/tables/result.csv:

    import csv

    with open('in/tables/source.csv', mode='rt', encoding='utf-8') as in_file, \
            open('out/tables/result.csv', mode='wt', encoding='utf-8') as out_file:
        # Drop any NUL characters before the rows reach the CSV parser
        reader = csv.DictReader((line.replace('\0', '') for line in in_file), dialect='kbc')
        writer = csv.DictWriter(out_file, dialect='kbc', fieldnames=['col1', 'col2'])
        writer.writeheader()
        for row in reader:
            # col1: append 'ping' to first; col2: multiply second by 42
            writer.writerow({'col1': row['first'] + 'ping', 'col2': int(row['second']) * 42})

See the reference for list-based and explicit-format variants. You can split the script into blocks.
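To make the per-row logic concrete, the following sketch applies the same transformation to a small in-memory sample instead of files. The helper name `transform_row` and the two sample rows are hypothetical; only the column names `first`, `second`, `col1`, and `col2` come from the script above.

```python
import csv
import io


def transform_row(row):
    """Apply the same per-row logic as the script above (hypothetical helper)."""
    return {'col1': row['first'] + 'ping', 'col2': int(row['second']) * 42}


# Hypothetical two-row input standing in for in/tables/source.csv.
sample = io.StringIO('first,second\nfoo,1\nbar,2\n')
output = io.StringIO()

writer = csv.DictWriter(output, fieldnames=['col1', 'col2'], lineterminator='\n')
writer.writeheader()
for row in csv.DictReader(sample):
    writer.writerow(transform_row(row))

print(output.getvalue())
```

Keeping the row logic in a small function like this makes it easy to try in a notebook before wiring it to the mapped files.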
Step 4 — Run it and confirm the result
- Click Run.
- Wait for the job to finish with a success status.
- Open Storage, find your output table, and confirm `col1` has the `ping` suffix and `col2` is `second` × 42.
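If you have both CSV files at hand (for example in a workspace or a local run), the same confirmation can be scripted. This is a minimal sketch: the function name `check_result` is hypothetical, it assumes the result rows are in the same order as the source rows, and the in-memory samples stand in for the real files.

```python
import csv
import io


def check_result(source_file, result_file):
    """Compare transformed rows with their source rows and return any mismatches."""
    source_rows = list(csv.DictReader(source_file))
    result_rows = list(csv.DictReader(result_file))
    problems = []
    if len(source_rows) != len(result_rows):
        problems.append(f'row count: {len(source_rows)} in source, {len(result_rows)} in result')
    # Assumption: rows appear in the same order in both files.
    for number, (src, res) in enumerate(zip(source_rows, result_rows), start=1):
        if res['col1'] != src['first'] + 'ping':
            problems.append(f'row {number}: unexpected col1 {res["col1"]!r}')
        if int(res['col2']) != int(src['second']) * 42:
            problems.append(f'row {number}: unexpected col2 {res["col2"]!r}')
    return problems


# Hypothetical samples; with real files, pass open file handles instead.
sample_source = io.StringIO('first,second\na,1\nb,2\n')
sample_result = io.StringIO('col1,col2\naping,42\nbping,84\n')
problems = check_result(sample_source, sample_result)
print(problems or 'result matches')
```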
Develop and debug
The fastest way to iterate is a Python workspace (JupyterLab) with the same input mapping:
- Configure input (and optionally output) mapping, then Load Data and Connect to the workspace.
- Paste your script into the notebook; the `in/` and `out/` directory structure and input files are already prepared.
- Run it; optionally Unload Data to push results to Storage, or Create Transformation to scaffold a transformation with the same mapping.
To develop locally, install Python and recreate the directory structure (in/tables/, out/tables/) with your input files. A ready example is in data.zip; the same script then runs unchanged as a transformation. For an exact environment, use the Keboola Docker image.
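As a rough sketch of the local setup, the snippet below creates the directory structure and a sample input file. Two things here are assumptions rather than documented facts: the `kbc` dialect parameters (a plain comma-separated, double-quoted format; check the reference for the exact definition), and the helper name `prepare_local_run` with its sample rows. A temporary directory is used only so the example is safe to run anywhere.

```python
import csv
import tempfile
from pathlib import Path

# Assumption: the script's 'kbc' dialect has to be registered by hand when it
# runs outside Keboola, and these parameters approximate it; verify them
# against the reference.
csv.register_dialect('kbc', lineterminator='\n', delimiter=',', quotechar='"')


def prepare_local_run(root, rows):
    """Create in/tables and out/tables under root and write a sample source.csv."""
    root = Path(root)
    (root / 'in' / 'tables').mkdir(parents=True, exist_ok=True)
    (root / 'out' / 'tables').mkdir(parents=True, exist_ok=True)
    source = root / 'in' / 'tables' / 'source.csv'
    with open(source, mode='wt', encoding='utf-8', newline='') as f:
        writer = csv.DictWriter(f, dialect='kbc', fieldnames=['first', 'second'])
        writer.writeheader()
        writer.writerows(rows)
    return source


# Hypothetical sample data; replace with your own input files.
work_dir = Path(tempfile.mkdtemp())
source_path = prepare_local_run(
    work_dir,
    [{'first': 'a', 'second': '1'}, {'first': 'b', 'second': '2'}],
)
print(source_path.read_text(encoding='utf-8'))
```

Run the transformation script with `work_dir` as the current directory so that its relative `in/tables/` and `out/tables/` paths resolve.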
Make it faster (backend size)
For large data, raise the Backend size in the configuration (XSmall → Small → Medium → Large); see backend sizes. This affects time-credit consumption.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| `FileNotFoundError` on `in/tables/source.csv` | Input mapping destination doesn’t match the path in the script | Set the input Destination to `source` (or change the path in the script). |
| Output table empty / not created | Output mapping Source doesn’t match the file the script writes | Map `result.csv` (the file your script writes to `out/tables/`). |
| `IndentationError` / `TabError` | Mixed tabs and spaces | Use consistent indentation; Python is indentation-sensitive. |
| A defined `main()` never runs | Wrapped in `if __name__ == '__main__':` | Call `main()` directly instead. |
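Two of these fixes can be combined in one small diagnostic. The sketch below lists the CSV files that actually arrived in the input directory and calls `main()` at module level rather than behind a `__main__` guard. The helper name `describe_inputs` and the expected file name are illustrative assumptions.

```python
from pathlib import Path


def describe_inputs(tables_dir='in/tables'):
    """Return the CSV file names found in the input directory (empty if it is missing)."""
    path = Path(tables_dir)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.glob('*.csv'))


def main():
    expected = 'source.csv'  # must match the input mapping Destination plus .csv
    available = describe_inputs()
    if expected not in available:
        print(f'{expected} not found; CSV files in in/tables: {available}')
        return False
    print(f'{expected} is present')
    return True


# Call main() directly, as the table suggests, instead of guarding it with
# if __name__ == '__main__':
ran = main()
```

If the printed list shows a different file name, either change the input mapping Destination or update the path in the script.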
Related
- Python transformation reference: limits, file paths, packages, CSV.
- Custom Python application: for code that needs encrypted secrets.
- Workspaces · Input and output mapping.