The labels we give data mean nothing if they do not translate across our enterprises. Imagine, for instance, that a company's Chief Marketing Officer defines a qualified lead differently than its sales leadership does. Such a mismatch could undermine the credibility of the metrics in a marketing performance report and sow discord between the sales and marketing organizations. While the importance of maintaining a business glossary to document the agreed-upon definitions of key business terms may seem self-evident, the truth is that many businesses operate without one. In fact, a recent study found that 42% of managers acknowledge making erroneous business decisions due to a misunderstanding of terminology and data usage. By establishing a business glossary, you not only foster alignment around how the various business units in an organization express their data, you also lay the cornerstone of an effective data governance program. The next step is to apply the definitions in your business glossary to the datasets across your enterprise through data profiling.
“What’s in your data?” While this may sound like a lofty, almost existential question, in truth it’s a practical inquiry that can be answered through data profiling. Data profiling tools enable businesses to assess the accuracy, integrity, quality, and consistency of the values within datasets. When it comes to developing a data governance program, these tools are critical for uncovering the values in your data that do not meet your prescribed rules and standards and, where possible, replacing them with corrected values through transformation scripts. Research conducted by Veritas Technologies found that 66% of “high” performing data governance programs were actively employing data profiling techniques. Conversely, only 23% of the “low” performing programs surveyed were employing such measures. While data profiling allows your business to root out rot and standardize values across datasets to match your business glossary, it does not tell you where your data came from. This is where data lineage comes into play.
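To make the profile-then-correct loop concrete, here is a minimal Python sketch. It assumes a hypothetical set of lead records with `status` and `email` fields; the allowed statuses, the synonym mapping, and the email pattern are illustrative stand-ins for rules you would derive from your own business glossary, not the behavior of any particular profiling tool. The profile counts missing and nonconforming values, and the transform replaces only the variants it can confidently map, leaving the rest for review.

```python
import re
from collections import Counter

# Hypothetical glossary-derived rules: the agreed-upon lead statuses and
# known variant spellings that map onto them.
ALLOWED_STATUSES = {"Marketing Qualified Lead", "Sales Qualified Lead", "Unqualified"}
STATUS_SYNONYMS = {
    "mql": "Marketing Qualified Lead",
    "sql": "Sales Qualified Lead",
    "unqualified": "Unqualified",
}
# Deliberately simple format check, for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile(records):
    """Count valid, missing, and rule-violating values per column."""
    report = {"status": Counter(), "email": Counter()}
    for row in records:
        status = row.get("status")
        if not status:
            report["status"]["missing"] += 1
        elif status not in ALLOWED_STATUSES:
            report["status"]["nonstandard"] += 1
        else:
            report["status"]["valid"] += 1

        email = row.get("email")
        if not email:
            report["email"]["missing"] += 1
        elif not EMAIL_PATTERN.match(email):
            report["email"]["malformed"] += 1
        else:
            report["email"]["valid"] += 1
    return report


def standardize(records):
    """Return copies of the records with recognizable status variants
    replaced by the glossary term; unrecognized values are left as-is."""
    cleaned = []
    for row in records:
        new_row = dict(row)
        status = new_row.get("status")
        if status and status not in ALLOWED_STATUSES:
            mapped = STATUS_SYNONYMS.get(status.strip().lower())
            if mapped:
                new_row["status"] = mapped
        cleaned.append(new_row)
    return cleaned


leads = [
    {"status": "MQL", "email": "ana@example.com"},
    {"status": "Sales Qualified Lead", "email": "bob@example"},
    {"status": "", "email": None},
    {"status": "hot", "email": "cy@example.org"},
]

before = profile(leads)
after = profile(standardize(leads))
print(dict(before["status"]))
print(dict(after["status"]))
```

In this sample, "MQL" is corrected to the glossary term, while "hot" has no agreed-upon mapping and remains flagged as nonstandard. Keeping the transform conservative like this avoids silently inventing definitions the glossary never approved.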
When we buy groceries, the first thing we typically check is the ingredients. Once we determine that the ingredients are high quality, we check the food’s origin to make sure it was ethically and reputably sourced. As with food, origin, or “provenance,” provides data with context.
If data quality issues surface during profiling, or indeed during analytics, it’s imperative to be able to identify the applications or processes that created the bad data. Once those sources are identified, they can be either remediated or eliminated to prevent further erosion of data quality. By adopting even basic metadata management practices, such as requiring a timestamp on each data entry, a business can begin to establish data provenance.
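The idea of tracing bad data back to its origin can be sketched in a few lines of Python. This is a minimal illustration, assuming that every entry is stamped at write time with a UTC timestamp and a `source` identifier; the source names (`web_form`, `legacy_crm_import`) and the crude `has_email` validity rule are hypothetical. Dedicated lineage tooling captures far richer metadata, but even this much lets you see which process is producing failing records and when the problem began.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Record:
    """A data entry plus minimal provenance metadata."""
    value: dict
    source: str  # hypothetical identifier of the producing application/process
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def ingest(value, source):
    # Each entry is stamped with its origin and a UTC timestamp when written.
    return Record(value=value, source=source)


def sources_of_bad_data(records, is_valid):
    """Count failing records per source so the worst offenders can be
    remediated or retired."""
    return Counter(r.source for r in records if not is_valid(r.value))


def first_bad_entry(records, is_valid):
    """Earliest timestamp at which each source produced an invalid record."""
    firsts = {}
    for r in records:
        if not is_valid(r.value):
            prev = firsts.get(r.source)
            if prev is None or r.created_at < prev:
                firsts[r.source] = r.created_at
    return firsts


def has_email(value):
    # Crude, hypothetical validity rule used only for this example.
    return "@" in value.get("email", "")


records = [
    ingest({"email": "ana@example.com"}, source="web_form"),
    ingest({"email": "not-an-email"}, source="legacy_crm_import"),
    ingest({"email": ""}, source="legacy_crm_import"),
    ingest({"email": "bo@example.org"}, source="web_form"),
]

print(sources_of_bad_data(records, has_email).most_common())
print(first_bad_entry(records, has_email))
```

Here both failing records trace back to the same import process, which points remediation at one pipeline rather than at the data as a whole.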
Like patents, infrastructure, and intellectual property, data stores are a major business asset. Because of this, in business valuations today, “data transparency is just as important as financial transparency,” and data provenance measures will be key to establishing that transparency.