Python scripts for CSV cleanup

Build script-driven, repeatable transformations and keep the browser cleanup step as validation.

1) Treat script cleanups as pipelines

A clean pipeline should have an explicit order. The most stable order is parse, normalize schema, type-check, then write. Keep this explicit in code and documentation.
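The four stages can be sketched as small composable functions using the standard library. This is a minimal illustration, not a full implementation; the `amount` column and the sample data are hypothetical.

```python
import csv
import io

def parse(text):
    """Step 1: parse raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def normalize_schema(rows):
    """Step 2: strip and lower-case header names so later steps see a stable schema."""
    return [{k.strip().lower(): v for k, v in row.items()} for row in rows]

def type_check(rows):
    """Step 3: coerce known numeric columns, failing loudly on bad values."""
    for row in rows:
        row["amount"] = float(row["amount"])  # "amount" is a hypothetical column
    return rows

def write_csv(rows, path):
    """Step 4: write the cleaned rows back out."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

raw = "ID ,Amount\n1,3.50\n2,4.25\n"
cleaned = type_check(normalize_schema(parse(raw)))
```

Keeping each stage as its own named function makes the order visible in the code itself, so the documentation and the pipeline cannot silently drift apart.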

2) Use idempotent transforms

If running a script twice over the same input produces different output each time, debugging becomes expensive. Idempotent operations make reruns safe and reduce confusion between script and UI steps.
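A simple example of an idempotent transform is whitespace normalization: applying it a second time changes nothing, so a rerun after a partial failure is harmless. A minimal sketch:

```python
def normalize_whitespace(rows):
    """Collapse internal runs of whitespace and strip ends.

    Idempotent: applying it to already-clean rows is a no-op.
    """
    return [[" ".join(cell.split()) for cell in row] for row in rows]

rows = [["  Ada   Lovelace ", " 1815"]]
once = normalize_whitespace(rows)
twice = normalize_whitespace(once)
assert once == twice  # safe to rerun
```

By contrast, a transform like "append a suffix to every name" is not idempotent, and rerunning it corrupts the data; prefer operations that converge to a fixed point.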

3) Validate before and after

Script output should be checked with measurable rules: row counts, null density, duplicate rate, and schema drift. Use spot checks in a browser tool for contextual validation (header names, sample rows, edge cases).
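These rules are easy to compute directly over a list of row dicts. The sketch below is one possible shape for such a report; the thresholds you assert against and the sample data are assumptions.

```python
def validate(rows, expected_columns):
    """Compute measurable quality metrics for a list of row dicts."""
    n = len(rows)
    cells = [v for row in rows for v in row.values()]
    nulls = sum(1 for v in cells if v in ("", None))
    fingerprints = [tuple(sorted(row.items())) for row in rows]
    dups = n - len(set(fingerprints))
    return {
        "row_count": n,
        "null_density": nulls / len(cells) if cells else 0.0,
        "duplicate_rate": dups / n if n else 0.0,
        "schema_drift": bool(rows) and set(rows[0]) != set(expected_columns),
    }

rows = [
    {"id": "1", "name": "Ada"},
    {"id": "2", "name": ""},     # one null cell
    {"id": "1", "name": "Ada"},  # exact duplicate of the first row
]
report = validate(rows, ["id", "name"])
```

Running the same report before and after a transformation gives you a concrete diff to reason about, instead of eyeballing two files.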

For recurring imports, make removing null rows from the CSV an explicit, scripted step rather than an ad-hoc manual fix.
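One way to script that step is to drop rows that are entirely empty, and optionally rows missing a required field. The `required` parameter and the sample rows here are illustrative assumptions.

```python
def drop_null_rows(rows, required=()):
    """Remove rows that are entirely empty, or that lack any required field."""
    kept = []
    for row in rows:
        if all(v in ("", None) for v in row.values()):
            continue  # fully empty row: always dropped
        if any(row.get(col) in ("", None) for col in required):
            continue  # missing a required value
        kept.append(row)
    return kept

rows = [
    {"id": "1", "email": "a@example.com"},
    {"id": "", "email": ""},    # fully null
    {"id": "3", "email": ""},   # dropped only when "email" is required
]
cleaned = drop_null_rows(rows, required=("email",))
```

Because the rule lives in code, every recurring import applies exactly the same null-row policy, and the policy is reviewable in version control.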

4) Blend with local-first QA

Use Python for large transformations and bulk workloads, then pass a sample through a local cleanup UI before publication. The manual pass is quick and catches presentation issues (such as delimiters and quoting) that automated scripts may overlook.
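Pulling that sample can itself be scripted. A seeded sample is a small assumption worth making: it keeps the manual QA pass reproducible across reruns. A minimal sketch:

```python
import random

def qa_sample(rows, k=200, seed=0):
    """Return a reproducible random sample of rows for manual spot checks."""
    return random.Random(seed).sample(rows, min(k, len(rows)))

rows = [{"id": str(i)} for i in range(1000)]
sample = qa_sample(rows, k=25)
```

Write the sample out as its own CSV and open it in the browser tool; if the sample renders cleanly, the bulk output almost certainly will too.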