Stephen Gilmore

🐍 CSV Report Comparions with Python

Python March 15th, 2023 2 minute read.

Report Compare is a Python tool that will generate difference reports bewteen two CSV files. It's particularly useful for before and after testing anytime data changes. This project was created because the other Python comparison tools I found were not able to handle instances where the columns didn't perfectly match or there were records on the files with duplicate "unique" identifiers.


Quickstart

  1. Specify paths to before, after, and compare results files
  2. Specify the key/id field that uniquely identifies records between the two files
  3. Run and review the differences
from pathlib import Path

from compare_reports import ReportCompare

# Specify file paths
file_1 = Path.cwd() / "before_csv.csv"
file_2 = Path.cwd() / "after_csv.csv"
results_file = Path.cwd() / "COMPARE_RESULTS.xlsx"

# Create ReportCompare Object
compare = ReportCompare(
    file1=file_1,
    file1_name=file_1.name,
    file2=file_2,
    file2_name=file_2.name,
    key="eid",
)

# Execute Report Compare Steps
compare.compare_files()
compare.build_summary()

# Export Summary of Report Compare Results
compare.write_to_excel(results_file=results_file)

How does it work?

  1. Script ensures it can read both of the input files.
  2. Script "normalizes" the data
  3. Identification, removal, and reporting of column differences between the files.
  4. Identification, removal, and reporting of duplicate records on the files.
  5. Identification, removal, and reporting of missing records between the files.
  6. Script completes a field by field, record by record comparison
  7. Script creates an XLSX report of the results.
  8. Summary tab of the findings for each step
  9. One tab for each type of mismatch found.