CSV date and number cleaning
Many broken analytics pipelines start with one small formatting mismatch. This guide focuses on the highest-value corrections for dates and numbers in CSV and spreadsheet-driven exports.
Unify date format policy
Pick one canonical date standard and apply it everywhere. The safest default for teams and automation is the ISO 8601 date format (YYYY-MM-DD), which is unambiguous and sorts correctly as plain text.
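- Convert all date columns to the canonical format before any numeric or textual joins.
- Reject ambiguous date forms such as 12/03/2026 (3 December or 12 March) until the source's day/month order is confirmed.
- Normalize timezone-aware timestamps to a single convention, such as UTC.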
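The rules above can be sketched in Python as follows. This is a minimal illustration, not a complete parser: the function names and the day_first setting are hypothetical, only ISO and slash-separated inputs are handled, and UTC is assumed as the single timestamp convention.

```python
import re
from datetime import date, datetime, timezone

# Matches slash-separated dates such as 25/03/2026 or 3/25/2026.
SLASH_DATE = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{4})")


def to_iso_date(raw, day_first=None):
    """Return raw as YYYY-MM-DD, or raise ValueError if it cannot be resolved.

    day_first is a hypothetical per-source setting: True for DD/MM/YYYY,
    False for MM/DD/YYYY, None when the order is unknown.
    """
    text = raw.strip()
    try:
        # Values that are already ISO dates pass through after validation.
        return datetime.strptime(text, "%Y-%m-%d").date().isoformat()
    except ValueError:
        pass

    match = SLASH_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized date format: {raw!r}")
    first, second, year = (int(part) for part in match.groups())

    if day_first is None:
        # Both fields could be a month, so the order cannot be inferred.
        if first <= 12 and second <= 12 and first != second:
            raise ValueError(f"ambiguous date, set day_first: {raw!r}")
        day_first = first > 12

    day, month = (first, second) if day_first else (second, first)
    return date(year, month, day).isoformat()  # raises on impossible dates


def to_utc_iso(raw):
    """Convert a timezone-aware ISO timestamp to UTC (the assumed convention)."""
    stamp = datetime.fromisoformat(raw)
    if stamp.tzinfo is None:
        raise ValueError(f"timestamp has no timezone: {raw!r}")
    return stamp.astimezone(timezone.utc).isoformat()
```

With this sketch, to_iso_date("25/03/2026") returns "2026-03-25" because 25 cannot be a month, while to_iso_date("12/03/2026") raises until the caller passes day_first for that source.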
Handle locale number formatting
Thousand separators, decimal separators, and currency symbols change by locale. Inconsistent number parsing is a common source of metric drift.
- Define the expected decimal separator for each source.
- Remove currency symbols before numeric conversion.
- Normalize negative values and sign placement, such as trailing minus signs or accounting-style parentheses.
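One way to apply these steps is sketched below in Python. The function name parse_number and its decimal_sep and thousands_sep parameters are hypothetical, and the list of currency symbols is an assumption to extend per source. Values that do not fit the declared separators are rejected instead of being guessed at.

```python
import re
from decimal import Decimal

# Currency symbols (an assumed, incomplete list) and whitespace to strip.
CURRENCY_OR_SPACE = re.compile(r"[$€£¥\s]")


def parse_number(raw, decimal_sep=".", thousands_sep=","):
    """Parse a locale-formatted number into a Decimal, or raise ValueError."""
    text = CURRENCY_OR_SPACE.sub("", raw)

    # Normalize the three sign conventions handled here:
    # (1,234.50), 1,234.50- and -1,234.50.
    negative = False
    if text.startswith("(") and text.endswith(")"):
        negative, text = True, text[1:-1]
    elif text.endswith("-"):
        negative, text = True, text[:-1]
    elif text.startswith("-"):
        negative, text = True, text[1:]

    # Accept either properly grouped digits or ungrouped digits, followed
    # by an optional fractional part using the declared decimal separator.
    t, d = re.escape(thousands_sep), re.escape(decimal_sep)
    pattern = rf"(?:\d{{1,3}}(?:{t}\d{{3}})+|\d+)(?:{d}\d+)?"
    if not re.fullmatch(pattern, text):
        raise ValueError(f"does not match declared separators: {raw!r}")

    if thousands_sep:
        text = text.replace(thousands_sep, "")
    value = Decimal(text.replace(decimal_sep, "."))
    return -value if negative else value
```

For a source declared as European style, parse_number("1.234,50 €", decimal_sep=",", thousands_sep=".") yields Decimal("1234.50"). The same string with the default separators raises ValueError, which surfaces a misdeclared locale instead of silently producing 1.23450.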
Excel date serials and large integer precision
Excel stores date values as serial numbers, and exports can expose those raw serials instead of formatted dates. Large numeric identifiers can also be rewritten in scientific notation by downstream tools, which silently drops digits.
- Convert date serial values using an explicit origin (the 1900 or 1904 date system) rather than relying on a tool's default.
- Store identifiers as text when precision matters.
- Format large integers at full width, with every digit written out, before export.
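The sketch below shows one way to handle these cases in Python. The function names, the sample data, and the order_id and shipped column names are hypothetical. The origin logic follows Excel's two date systems: the 1900 system, which counts a nonexistent 29 February 1900 as serial 60, and the 1904 system, where serial 0 is 1 January 1904.

```python
import csv
import io
from datetime import date, timedelta


def excel_serial_to_date(serial, date_system=1900):
    """Convert an Excel date serial to a date using an explicit origin."""
    days = int(serial)  # drop any time-of-day fraction
    if date_system == 1904:
        return date(1904, 1, 1) + timedelta(days=days)
    if date_system != 1900:
        raise ValueError(f"unknown date system: {date_system!r}")
    if days < 1:
        raise ValueError(f"serial out of range: {serial!r}")
    if days == 60:
        raise ValueError("serial 60 is Excel's nonexistent 1900-02-29")
    # Serials after 60 are shifted by one day because of the phantom leap day.
    origin = date(1899, 12, 31) if days < 60 else date(1899, 12, 30)
    return origin + timedelta(days=days)


def export_integer(value):
    """Render an integer with every digit, never in scientific notation."""
    return format(int(value), "d")


# Hypothetical sample export. csv.DictReader returns every field as text,
# so the identifier keeps all of its digits unless it is converted.
SAMPLE = "order_id,shipped\n9007199254740993,45000\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

order_id = rows[0]["order_id"]  # stays a str
shipped = excel_serial_to_date(rows[0]["shipped"])  # 2023-03-15
lossy_id = int(float(order_id))  # 9007199254740992: the last digit changed
```

The lossy_id line illustrates why identifiers are safer as text: a 64-bit float cannot represent 9007199254740993 exactly, so a round trip through a numeric column changes the value.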
Validation checks before publish
After transformation, validate for impossible dates, invalid numbers, and type mismatches. This catches hidden corruption before downstream systems fail.
- Check for date ranges outside business bounds.
- Detect mixed decimal separators in the same column.
- Block files that still contain unresolved numeric tokens, meaning values that failed numeric parsing.
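A pre-publish gate along these lines could look like the following Python sketch. The function names and the date and amount field names are hypothetical, and the mixed-separator check is a simple heuristic that compares the last "." or "," found in each value. It assumes dates have already been converted to ISO format and numbers to plain dot-decimal form.

```python
import re
from datetime import date

# After cleaning, amounts are expected to look like 10.50 or -3.
PLAIN_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def last_separator(value):
    """Return the last '.' or ',' in value, or None if there is neither."""
    found = [char for char in value if char in ".,"]
    return found[-1] if found else None


def find_problems(rows, earliest, latest):
    """Return (row_index, message) pairs; an empty list means publishable."""
    problems = []
    separators = set()
    for index, row in enumerate(rows):
        try:
            parsed = date.fromisoformat(row["date"])
        except ValueError:
            problems.append((index, "invalid date"))
        else:
            if not earliest <= parsed <= latest:
                problems.append((index, "date outside business bounds"))

        amount = row["amount"]
        separator = last_separator(amount)
        if separator is not None:
            separators.add(separator)
        if not PLAIN_NUMBER.fullmatch(amount):
            problems.append((index, "unresolved numeric token"))

    # Column-level check: both '.' and ',' appear as the final separator.
    if len(separators) > 1:
        problems.append((None, "mixed decimal separators"))
    return problems
```

A caller would block publication whenever find_problems returns a non-empty list, for example with bounds such as date(2020, 1, 1) and date(2030, 12, 31) chosen to match the business's valid date range.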