Automating Data Differencing: Batch Processing

Visual Comparison
This panel displays a summary of all the queries executed so far, one row per comparison. It duplicates the match-quality information from the main window: the total match percentage plus the individual counts of added, missing, and changed rows. The match percentages are color-coded: green indicates a perfect (100%) match, yellow indicates 99% or better, and red indicates anything below 99%.
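The color rule amounts to two thresholds on the match percentage. Here is a minimal Python sketch, assuming the percentage is available as a number from 0 to 100 and that "99% or better" is inclusive. The name `match_color` is hypothetical and is not part of the tool.

```python
def match_color(percent: float) -> str:
    """Return the summary-panel color for a match percentage (0-100)."""
    if percent >= 100.0:
        return "green"   # perfect match
    if percent >= 99.0:
        return "yellow"  # 99% or better, but short of perfect
    return "red"         # anything below 99%


# Hypothetical percentages, for illustration only.
for pct in (100.0, 99.4, 87.5):
    print(f"{pct:5.1f}% -> {match_color(pct)}")
```

Checking for a perfect match first keeps the yellow band limited to results that are close to perfect but still contain some differences.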

Normally, batch processing requires matched queries for the left and right data sources. You may override this with the Include Orphans option on the setup form. This option includes every file on both sides that matches the file masks, whether or not it has a counterpart on the other side. Differencing details do not apply to these orphans, so they are grayed out.
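The following minimal sketch shows how matched pairs and orphans might be assembled. It assumes that left and right queries are paired by identical file name and that file masks are shell-style wildcards such as `*.sql`. The function `pair_queries` and its parameters are hypothetical, and the tool's actual pairing logic is not documented here.

```python
from fnmatch import fnmatch


def pair_queries(left_files, right_files, masks, include_orphans=False):
    """Pair query files by name (hypothetical rule).

    Returns (left, right) tuples; an orphan has None on its missing side.
    """
    def matches(name):
        # A file qualifies if it matches any of the wildcard masks.
        return any(fnmatch(name, mask) for mask in masks)

    left = {f for f in left_files if matches(f)}
    right = {f for f in right_files if matches(f)}

    # Matched queries: present on both sides.
    pairs = [(f, f) for f in sorted(left & right)]
    if include_orphans:
        pairs += [(f, None) for f in sorted(left - right)]   # left-only orphans
        pairs += [(None, f) for f in sorted(right - left)]   # right-only orphans
    return pairs
```

An entry with `None` on one side is an orphan. A summary grid would still list it, but it would gray out the differencing columns because there is nothing to compare against.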

This screen shot also illustrates another option. The Flag non-zero option changes the semantics of the result grid from flagging differences to flagging any returned results as invalid. That is, instead of running queries that return the “good” data and then comparing the left and right results, you may elect to run queries that return “bad” data. Examples include a query that identifies product IDs whose color field references a non-existent color, or one that identifies duplicate records in a table. With a set of queries designed to ferret out bad data, any returned result is invalid. When you enable this option, any non-zero result is flagged in orange on the count columns, as shown.
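The following small Python sketch illustrates the “bad data” style of query with an in-memory SQLite database. The `Color` and `Product` tables, their columns, and the `flag_color` helper are all hypothetical. The sketch only shows that under Flag non-zero semantics, a row count of zero is the passing outcome and any other count is flagged.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: Product.ColorID is supposed to reference Color.ColorID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Color (ColorID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, ColorID INTEGER);
    INSERT INTO Color VALUES (1, 'Red'), (2, 'Blue'), (3, 'Red');
    INSERT INTO Product VALUES (10, 1), (11, 2), (12, 7);
""")

# "Bad data" queries: every row they return is a problem.
CHECKS = {
    # Products whose ColorID has no matching Color row.
    "orphan_color": """
        SELECT p.ProductID
        FROM Product p LEFT JOIN Color c ON c.ColorID = p.ColorID
        WHERE c.ColorID IS NULL""",
    # Color names that appear more than once.
    "duplicate_color": """
        SELECT Name, COUNT(*) FROM Color
        GROUP BY Name HAVING COUNT(*) > 1""",
}


def flag_color(row_count: int) -> Optional[str]:
    """Under Flag non-zero semantics, any non-zero count is flagged orange."""
    return "orange" if row_count != 0 else None


# Run each check and report its row count and flag.
counts = {name: len(conn.execute(sql).fetchall()) for name, sql in CHECKS.items()}
for name, n in counts.items():
    print(name, n, flag_color(n))
```

In this sample data, product 12 references a missing color and the name 'Red' is duplicated, so both checks return rows and both would be flagged.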

Last edited Apr 28, 2010 at 11:30 PM by msorens, version 3

