Data Cleansing Strategy: Manual vs. Automation

Posted by Kathy Jameson on Tue, Aug 14, 2012

Data cleansing is the process of detecting, diagnosing, and editing faulty data. It deals with data problems once they have occurred. Error prevention strategies can reduce many problems, but cannot eliminate them.

This does not mean we can forgo a strategy altogether, though. Without a data cleansing strategy, the data warehouse will suffer from the following:

  • lack of quality
  • loss of trust
  • decrease in business sponsorship and funding

Since data cleansing is tedious and time-consuming, a sound, methodical strategy is pivotal. A rule-based strategy for data cleansing begins with the understanding that there are really only two options: clean the source data or clean the warehouse data.

When it comes to the latter, the first thought among many organizations is to take a DIY approach involving manual data cleansing. Manual cleansing makes sense when erroneous data cannot be fixed programmatically, or when the data volumes to be cleansed are small enough to make automation a poor investment.

For the majority of companies, a better-suited strategy is automated data cleansing, which handles the cleaning of both warehouse data and source data. Unlike manual cleansing, an automated process can run on both the front end and the back end. Depending on your data, you will probably want to cleanse data as it is collected, as well as later at periodic intervals. An automated process can easily maintain a database through regularly scheduled cleanses. This is very useful since data quality naturally erodes over time.

Automation should be part of your data cleansing strategy if you have a large-scale database. Manual cleansing is expensive, especially given how quickly the same work can be done with an automated process in place. All or most of the data errors can be fixed programmatically by applying a cloud-based solution like StrikeIron’s that uses logical rules to cleanse data in real time.
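
To make the idea of rule-based cleansing concrete, here is a minimal sketch in Python. It is not StrikeIron's API or any vendor's implementation. The field names, the two rules, and the helper functions are all hypothetical, chosen only for illustration. Each rule either returns a corrected value or signals that it cannot fix the value, and records with unfixable fields are set aside for manual review.

```python
import re

# Hypothetical rules for illustration only. Each rule takes a raw string
# and returns a cleaned value, or None if it cannot be fixed automatically.

def clean_email(value):
    """Trim whitespace and lowercase; reject values that do not look like an email."""
    candidate = value.strip().lower()
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", candidate):
        return candidate
    return None

def clean_phone(value):
    """Normalize a 10-digit phone number (optional leading 1) to NNN-NNN-NNNN."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) == 10:
        return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    return None

# Assumed mapping of field name to rule; a real rule set would be far larger.
RULES = {"email": clean_email, "phone": clean_phone}

def cleanse_record(record):
    """Apply every rule to one record, e.g. at the moment the data is collected.

    Returns the cleaned record and the list of fields no rule could fix.
    """
    cleaned = dict(record)
    unresolved = []
    for field, rule in RULES.items():
        if field not in record:
            continue
        fixed = rule(str(record[field]))
        if fixed is None:
            unresolved.append(field)
        else:
            cleaned[field] = fixed
    return cleaned, unresolved

def cleanse_batch(records):
    """Cleanse many records, e.g. in a scheduled job against the warehouse.

    Records with unresolved fields go to a queue for manual cleansing.
    """
    clean, manual_queue = [], []
    for record in records:
        cleaned, unresolved = cleanse_record(record)
        if unresolved:
            manual_queue.append((cleaned, unresolved))
        else:
            clean.append(cleaned)
    return clean, manual_queue
```

The same cleanse_record function could be called on the front end as each record arrives, while cleanse_batch could run on a schedule against stored data. Whatever the rules cannot fix lands in the manual queue, which keeps the manual effort limited to the small share of errors that cannot be handled programmatically.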

