Transformations

How do I run a Python transformation?

Create, run, and develop a Python transformation in Keboola — write the script, map input and output CSV files, run it, confirm the result, and debug it in a workspace or locally.

You want to process data with Python where SQL is awkward. A Python transformation reads your mapped input tables as CSV files, runs your script, and writes CSV outputs back to Storage. This page gets you from nothing to a successful run, then shows how to develop and debug. For limits, file paths, and packages, see the reference.

Time: ~10 minutes · You will need: a Keboola project and one table in Storage (or the sample CSV file).

Step 1 — Create the transformation

Open Components → Transformations, click New Transformation, and choose Python Transformation.
Name it and confirm.

Step 2 — Map input and output

Upload the sample CSV file to Storage as a table.
In Input Mapping, add it and set its Destination to source (the script reads in/tables/source.csv).
In Output Mapping, map result.csv (produced by the script) to a new Storage table, for example out.c-main.result.

Step 3 — Write the script

Paste a script that reads in/tables/source.csv and writes out/tables/result.csv:

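import csv

# The paths must match the input and output mapping from Step 2.
# The 'kbc' CSV dialect is assumed to be pre-registered by the transformation environment.
with open('in/tables/source.csv', mode='rt', encoding='utf-8') as in_file, open('out/tables/result.csv', mode='wt', encoding='utf-8') as out_file:
    # Strip NUL characters, which can break CSV parsing.
    reader = csv.DictReader((line.replace('\0', '') for line in in_file), dialect='kbc')
    writer = csv.DictWriter(out_file, dialect='kbc', fieldnames=['col1', 'col2'])
    writer.writeheader()
    for row in reader:
        # CSV values arrive as strings: append a suffix to 'first', convert 'second' to int before multiplying.
        writer.writerow({'col1': row['first'] + 'ping', 'col2': int(row['second']) * 42})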

See the reference for list-based and explicit-format variants. You can split the script into blocks.
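
As a rough sketch of what a list-based, explicit-format variant might look like (the reference remains the authoritative version), the example below addresses columns by position and passes the CSV format parameters directly instead of naming a dialect. The function name `transform` and the in-memory sample data are illustrative assumptions, and the format parameters are assumed to match what the platform expects.

```python
import csv
import io


def transform(in_file, out_file):
    """List-based variant: rows are plain lists, columns are addressed by index."""
    # Explicit format parameters instead of a named dialect (assumed values).
    reader = csv.reader(
        (line.replace('\0', '') for line in in_file),
        delimiter=',', quotechar='"',
    )
    writer = csv.writer(out_file, lineterminator='\n', delimiter=',', quotechar='"')

    header = next(reader)                    # first line holds the column names
    first_idx = header.index('first')
    second_idx = header.index('second')

    writer.writerow(['col1', 'col2'])        # output header
    for row in reader:
        writer.writerow([row[first_idx] + 'ping', int(row[second_idx]) * 42])


# In-memory demo so the sketch runs without any files on disk.
sample = io.StringIO('first,second\na,1\nb,2\n')
result = io.StringIO()
transform(sample, result)
print(result.getvalue())
```

In a transformation, you would pass the opened in/tables/source.csv and out/tables/result.csv file objects to `transform` instead of the in-memory buffers.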

Step 4 — Run it and confirm the result

Click Run.
Wait for the job to finish with a success status.
Open Storage, find your output table, and confirm col1 has the ping suffix and col2 is second × 42.
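
If you prefer to check the result programmatically, for example after exporting the output table, a minimal sketch could look like the following. The helper names `expected_row` and `result_matches` are hypothetical, the rows are assumed to be loaded as dictionaries of strings (as `csv.DictReader` would return them), and row order is ignored because it may differ between input and output.

```python
def expected_row(source_row):
    """Return the (col1, col2) pair the Step 3 script should produce for one input row."""
    # Values read back from CSV are strings, so col2 is compared as a string.
    return (source_row['first'] + 'ping', str(int(source_row['second']) * 42))


def result_matches(source_rows, result_rows):
    """True when the output rows equal the expected rows, regardless of order."""
    expected = sorted(expected_row(row) for row in source_rows)
    actual = sorted((row['col1'], row['col2']) for row in result_rows)
    return expected == actual


# Illustrative data only; replace with rows read from your own tables.
source_rows = [{'first': 'a', 'second': '1'}, {'first': 'b', 'second': '2'}]
result_rows = [{'col1': 'bping', 'col2': '84'}, {'col1': 'aping', 'col2': '42'}]
print(result_matches(source_rows, result_rows))
```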

Develop and debug

The fastest way to iterate is a Python workspace (JupyterLab) with the same input mapping:

Configure input (and optionally output) mapping, then Load Data and Connect to the workspace.
Paste your script into the notebook; the in/ and out/ directory structure and the input files are already prepared.
Run it; optionally Unload Data to push results to Storage, or Create Transformation to scaffold a transformation with the same mapping.

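To develop locally, install Python and recreate the directory structure (in/tables/, out/tables/) with your input files. A ready example is in data.zip; the same script then runs unchanged as a transformation. If your script relies on the kbc CSV dialect, register it yourself when running outside Keboola, because standard Python does not define it. For an exact environment, use the Keboola Docker image.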
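
The local setup can be sketched roughly as follows. The helper name `scaffold`, the temporary working directory, and the sample rows are illustrative assumptions. The settings used to register the `kbc` dialect are also an assumption about what the platform uses, so check them against the reference.

```python
import csv
import tempfile
from pathlib import Path


def scaffold(base_dir, sample_rows):
    """Create in/tables and out/tables under base_dir and write a sample source.csv."""
    base = Path(base_dir)
    in_tables = base / 'in' / 'tables'
    out_tables = base / 'out' / 'tables'
    in_tables.mkdir(parents=True, exist_ok=True)
    out_tables.mkdir(parents=True, exist_ok=True)

    source = in_tables / 'source.csv'
    with open(source, mode='wt', encoding='utf-8', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['first', 'second'], lineterminator='\n')
        writer.writeheader()
        writer.writerows(sample_rows)
    return source


# Standard Python has no 'kbc' dialect, so register one for local runs.
# Assumption: these parameters mirror the dialect available in Keboola.
if 'kbc' not in csv.list_dialects():
    csv.register_dialect('kbc', lineterminator='\n', delimiter=',', quotechar='"')

workdir = tempfile.mkdtemp()
source_path = scaffold(workdir, [{'first': 'a', 'second': '1'}, {'first': 'b', 'second': '2'}])
print(source_path.read_text(encoding='utf-8'))
```

Run your transformation script with `workdir` as the current directory so that its relative in/tables/ and out/tables/ paths resolve.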

Make it faster (backend size)

For large data, raise the Backend size in the configuration (XSmall → Small → Medium → Large); see backend sizes. This affects time-credit consumption.

Troubleshooting

Symptom	Likely cause	Fix
`FileNotFoundError` on `in/tables/source.csv`	Input mapping destination doesn’t match the path in the script	Set the input Destination to `source` (or change the path in the script).
Output table empty / not created	Output mapping Source doesn’t match the file the script writes	Map `result.csv` (the file your script writes to `out/tables/`).
`IndentationError` / `TabError`	Mixed tabs and spaces	Use consistent indentation; Python is indentation-sensitive.
A defined `main()` never runs	The call is wrapped in an `if __name__ == '__main__':` guard that does not evaluate to true when the transformation runs the script	Call `main()` directly instead.
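
To narrow down the first and last rows of the table, a small diagnostic like the one below may help. It is only a sketch: `missing_inputs` is a hypothetical helper, and the expected file name `source.csv` is taken from the example on this page. It reports which expected input files are absent and calls `main()` directly rather than behind a `__name__` guard.

```python
from pathlib import Path


def missing_inputs(expected, tables_dir='in/tables'):
    """Return the expected file names that are not present in tables_dir."""
    directory = Path(tables_dir)
    # A missing directory is treated the same as an empty one.
    present = {p.name for p in directory.glob('*.csv')} if directory.is_dir() else set()
    return sorted(set(expected) - present)


def main():
    missing = missing_inputs(['source.csv'])
    if missing:
        print('Missing input files:', missing)
    else:
        print('All expected input files are present.')
    return missing


# Called directly, without an `if __name__ == '__main__':` guard.
main()
```

If a file is reported missing, compare its name with the Destination set in the input mapping.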

Python transformation reference — limits, file paths, packages, CSV.
Custom Python application — for code that needs encrypted secrets.
Workspaces · Input and output mapping.