Before data enters your ETL process, it pays to make sure you work only with "good" data – that is, data that conforms to its domain rules, ranges, allowed values, and any other restrictions that apply. Records that don't conform should be logged and removed so that they don't pollute your transformation, and so that you have the means to report and fix them later on.
Having data checked and verified in the transformation also goes hand in hand with CloverETL's mission of rapid data integration. With the right tools out of the box, along with an easy setup and immediate actionable results, it not only takes less trial and error, but also less time to achieve the expected outcomes.
Let's take a look at how this all works in Clover.
What is Validator?
Validator, a part of the CloverETL Data Quality package, is a comprehensive filtering tool that lets you visually define data quality rules. What does this mean? Simply put, Validator is a component where you specify a set of checks that filter incoming data. Any record the filter doesn't let through is reported along with detailed information about why it was rejected. You can use this output as a basis for correcting problems; what's great is that even non-technical team members in your organization can work with it and understand the process. Imagine putting the output into a spreadsheet and sending it back to the accounting department to fix the problems – no "translation" needed!
Built-in validation rules can check various criteria like date format, numeric value, interval match, phone number validity and format, and more. If you have special requirements, you can implement custom validation rules in CTL (CloverETL Transformation Language), but more on that in another post.
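To make the idea concrete, here is a minimal sketch in plain Python of what "a set of checks that filter incoming data" amounts to. This is not Validator's API or CTL; the field names (invoice_date, amount, phone), the rule messages, and the helper functions are all hypothetical and chosen only for illustration.

```python
import re
from datetime import datetime

def is_date(fmt):
    """Build a check that passes when the value parses with the given date format."""
    def check(value):
        try:
            datetime.strptime(value, fmt)
            return True
        except (TypeError, ValueError):
            return False
    return check

def in_interval(lo, hi):
    """Build a check that passes when the value is a number within [lo, hi]."""
    def check(value):
        try:
            return lo <= float(value) <= hi
        except (TypeError, ValueError):
            return False
    return check

def matches(pattern):
    """Build a check that passes when the whole value matches the regex."""
    compiled = re.compile(pattern)
    return lambda value: isinstance(value, str) and compiled.fullmatch(value) is not None

# Hypothetical rule set: (field, message reported on failure, check function).
RULES = [
    ("invoice_date", "not a YYYY-MM-DD date", is_date("%Y-%m-%d")),
    ("amount", "outside 0 to 10000", in_interval(0, 10000)),
    ("phone", "not a 10-digit phone number", matches(r"\d{10}")),
]

def validate(records, rules=RULES):
    """Split records into valid ones and rejected ones annotated with reasons."""
    valid, rejected = [], []
    for record in records:
        reasons = [f"{field}: {message}"
                   for field, message, check in rules
                   if not check(record.get(field))]
        if reasons:
            rejected.append({**record, "reasons": "; ".join(reasons)})
        else:
            valid.append(record)
    return valid, rejected
```

Calling validate(records) returns two lists: the records that passed every rule, and the rejected records, each carrying a human-readable "reasons" field. That second list is the kind of output you could drop into a spreadsheet and hand back to the people who own the data.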
How Validator Fits into Data Quality
Validator nicely complements the Data Profiler in the Data Quality package. Generally, you start with Data Profiler to assess the overall condition of your data using statistics. Variations in formats, missing values, and excessive ranges indicate which fields will need special care when you set up Validator, so that your rules keep bad records from getting through. Validator then acts as the second stage, checking for specific problems in your data and reporting each one of them to you. Used in conjunction, these tools allow for efficient, comprehensive data quality checks.
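As a rough illustration of that first, profiling stage, the sketch below counts missing values and format variations for a single field. It is plain Python, not Data Profiler itself; the profile and shape functions are hypothetical, and field values are assumed to be strings.

```python
import re
from collections import Counter

def shape(value):
    """Reduce a string to its format pattern: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def profile(records, field):
    """Count missing values and distinct format patterns for one field."""
    values = [record.get(field) for record in records]
    present = [v for v in values if v not in (None, "")]
    return {
        "missing": len(values) - len(present),
        "formats": Counter(shape(v) for v in present),
    }
```

If a date field turns up with both a 9999-99-99 and a 99/99/9999 pattern, or with a noticeable number of missing values, that field is a good candidate for a validation rule.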
November 25, 2013