Data Quality Best Practices - At the Point of Data Collection
There are many different kinds of batch data cleansing processes that can be performed against large databases of existing customer information. Standardizing inconsistent data, removing duplicate records, validating columns against up-to-date reference data, filling in missing data, and appending new data to existing data are all examples of customer data processing that can help improve the value of internal data assets.
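To make a few of these batch operations concrete, here is a minimal sketch of standardizing, filling in, and deduplicating a list of customer records. It assumes a simple hypothetical `Customer` record, uses a tiny hypothetical ZIP-to-state table as stand-in "reference data", and treats the lowercased email address as the duplicate key. Real cleansing jobs usually rely on much richer reference data and fuzzier matching rules.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Customer:
    # Hypothetical record layout used only for illustration.
    name: str
    email: str
    state: str
    zip_code: str

# Hypothetical reference data: maps a ZIP code to a state abbreviation.
ZIP_TO_STATE = {"10001": "NY", "94105": "CA"}

def standardize(c: Customer) -> Customer:
    """Collapse whitespace and normalize case in each field."""
    return replace(
        c,
        name=" ".join(c.name.split()).title(),
        email=c.email.strip().lower(),
        state=c.state.strip().upper(),
        zip_code=c.zip_code.strip(),
    )

def fill_missing(c: Customer) -> Customer:
    """Fill a blank state from the ZIP reference table when the ZIP is known."""
    if not c.state and c.zip_code in ZIP_TO_STATE:
        return replace(c, state=ZIP_TO_STATE[c.zip_code])
    return c

def batch_cleanse(records: list[Customer]) -> list[Customer]:
    """Standardize and fill each record, keeping only the first per email."""
    seen: set[str] = set()
    cleaned: list[Customer] = []
    for rec in records:
        rec = fill_missing(standardize(rec))
        if rec.email in seen:
            continue  # duplicate of a record already kept
        seen.add(rec.email)
        cleaned.append(rec)
    return cleaned
```

For example, `"  jane doe"` / `"Jane@Example.com "` with a blank state and ZIP `10001` becomes `Customer("Jane Doe", "jane@example.com", "NY", "10001")`, and a later record with the same email is dropped as a duplicate.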
When data assets undergo these kinds of processes, their value increases: business intelligence applications become more useful, operations become more efficient, and customer communication efforts become more effective. These are worthwhile endeavors indeed.
However, large, after-the-fact database cleanup jobs often take considerable effort, not to mention the cost and complexity of offline data processing. Batch jobs are also rarely a one-time effort. The same problems reappear soon after a mass cleansing and gradually build back up to troublesome levels, putting the organization's data stewards back at square one.
An alternative is to apply real-time data quality mechanisms at the point of data collection. This means validating data, filling in missing data, appending data, standardizing data, and checking it against existing records for duplicates in real time, before it ever reaches the database. This can eliminate or dramatically reduce the cost and effort of downstream batch cleanup, so the benefits of clean, complete, accurate data appear immediately across the organization. It also prevents these kinds of data quality issues from building up over time.
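The idea of checking data before it ever reaches the database can be sketched as a single capture function that standardizes, validates, and checks for duplicates, and only then inserts. This is a minimal illustration, not a production validator. The in-memory SQLite table stands in for the customer database, the `customers` schema and `capture_customer` name are hypothetical, and the email pattern is a deliberately simple format check rather than a full address validator.

```python
import re
import sqlite3

# Deliberately simple format check (assumption): something@domain.tld
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")

def capture_customer(conn: sqlite3.Connection, name: str, email: str) -> list[str]:
    """Standardize, validate, and dedupe a record before inserting it.

    Returns a list of problems; an empty list means the row was written.
    """
    # Standardize first so validation and duplicate checks see clean values.
    name = " ".join(name.split())
    email = email.strip().lower()

    errors: list[str] = []
    if not name:
        errors.append("name is required")
    if not EMAIL_RE.match(email):
        errors.append("email is not well formed")
    if not errors:
        # Compare against existing data for duplicates.
        dup = conn.execute(
            "SELECT 1 FROM customers WHERE email = ?", (email,)
        ).fetchone()
        if dup is not None:
            errors.append("customer already exists")
    if errors:
        return errors  # nothing reaches the database

    conn.execute("INSERT INTO customers (name, email) VALUES (?, ?)", (name, email))
    conn.commit()
    return []

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT NOT NULL, email TEXT PRIMARY KEY)")
    print(capture_customer(conn, "Jane Doe", "Jane@Example.com"))
    print(capture_customer(conn, "J. Doe", "jane@example.com "))
    print(capture_customer(conn, "", "not-an-email"))
```

Returning the list of problems, rather than raising, lets a form or call center screen show every issue at once so the customer or representative can correct them while the interaction is still in progress.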
Real-time data quality can be achieved by integrating calls to data quality functions into business processes, website data collection forms, customer-facing applications, call center applications where representatives speak with customers, and anywhere else data is collected in real time. Typically, these programmatic calls go to cloud-based APIs that draw on constantly refreshed reference data to keep accuracy as high as possible.
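As an illustration of calling a cloud-based data quality API from a web form handler, the sketch below posts an address to a verification service and uses the standardized result. Everything about the service is an assumption: `VERIFY_URL` is a placeholder, not a real provider, and the request and response shapes (`valid`, `standardized`) are hypothetical, so substitute your provider's documented API. The verifier is passed in as a parameter so the handler can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical endpoint; replace with your provider's documented URL.
VERIFY_URL = "https://dq.example.com/v1/verify-address"

def http_verify(address: dict, timeout: float = 2.0) -> dict:
    """POST an address to the (hypothetical) cloud verification API."""
    req = urllib.request.Request(
        VERIFY_URL,
        data=json.dumps(address).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

def handle_form_submission(
    form: dict, verify: Callable[[dict], dict] = http_verify
) -> dict:
    """Verify and correct an address while the customer is still on the form."""
    address = {k: form.get(k, "").strip() for k in ("street", "city", "zip")}
    try:
        result = verify(address)
    except OSError:
        # Network errors and timeouts are OSError subclasses. Accept the input
        # but flag it so a later batch pass can review it.
        return {"status": "accepted_unverified", "address": address}
    if not result.get("valid"):
        # Ask the customer to fix the entry before anything is stored.
        return {"status": "rejected", "address": address}
    # Prefer the provider's standardized address (assumed response field).
    return {"status": "accepted", "address": result.get("standardized", address)}
```

The short timeout and the "accepted_unverified" fallback reflect one possible design choice: a slow or unavailable verification service should not block data collection, and flagged records can still be picked up by a smaller follow-up cleanup.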
Here more than ever, an ounce of prevention is worth a pound of cure.