Toward Unified Standards for Measuring Data Quality: Benefits of Practical Application (Part 6)

August 30, 2013. In the previous five articles we examined six criteria for assessing data quality. This week's article sums up the topic of creating unified standards for measuring data quality. Read on and share your thoughts on the Citia BTC social media pages!
Clearly there are many valuable aspects to the dimensions of data quality:
  • The categorization of data by quality properties allows prospective consumers to evaluate whether the data meets their needs in terms of its current properties (completeness, precision, etc.).
  • The categorization of data by quality properties provides a mechanism to prioritize data quality cleanup and process changes, and to implement data stewardship/governance.
  • Dimensions (and, more specifically, the underlying concepts with the associated metrics) provide a method of measuring quality over time.
  • The categorization of data by quality properties allows practitioners to predict business impact based on the known behavior of each dimension of quality (e.g., lack of completeness yields understated financials, and invalid values can lead to miscategorization or incorrect aggregation).
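To make the last two points concrete, here is a minimal Python sketch of how a completeness metric and a validity metric might be computed, and how missing values understate a total. The invoice records, field names, and the reference list of currencies are hypothetical and are not taken from the series; the two metric definitions are one simple choice among many.

```python
# Hypothetical sample records; None marks a missing value.
records = [
    {"invoice_id": 1, "amount": 120.0, "currency": "USD"},
    {"invoice_id": 2, "amount": None, "currency": "USD"},
    {"invoice_id": 3, "amount": 80.0, "currency": "XXX"},
    {"invoice_id": 4, "amount": 50.0, "currency": "EUR"},
]

# Assumed reference list of acceptable currency codes.
VALID_CURRENCIES = {"USD", "EUR"}


def completeness(rows, field):
    """Share of rows in which `field` is populated (not None)."""
    if not rows:
        return 0.0
    return sum(r.get(field) is not None for r in rows) / len(rows)


def validity(rows, field, allowed):
    """Share of populated values of `field` that appear in `allowed`."""
    populated = [r[field] for r in rows if r.get(field) is not None]
    if not populated:
        return 0.0
    return sum(v in allowed for v in populated) / len(populated)


def reported_total(rows, field):
    """Sum that silently skips missing values, as many reports do."""
    return sum(r[field] for r in rows if r.get(field) is not None)


amount_completeness = completeness(records, "amount")
currency_validity = validity(records, "currency", VALID_CURRENCIES)
total = reported_total(records, "amount")

print(f"amount completeness: {amount_completeness:.2f}")
print(f"currency validity:   {currency_validity:.2f}")
print(f"reported total:      {total:.2f}")
```

Running the same functions against each periodic snapshot of the data would give a simple way to track these metrics over time. Here the reported total omits whatever invoice 2 was worth, which is the understated-financials effect mentioned above.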
The purpose of having an industry-accepted set of dimensions with associated concepts is to allow organizations to communicate effectively, both internally and externally. In a more networked society, where there are more external demands on our data, such as governmental regulation, legal and security requirements, corporate partnerships and corporate valuation, agreed-upon standards are a must.

In a recent discussion on this topic, data quality author Danette McGilvray pointed out that, from an internal perspective, the quicker an organization can establish and start using these foundational dimensions, the sooner it will see the benefits. Why not get a jump-start using the industry standard and then add custom categories and concepts as needed?

Bringing it All Together

In this capstone article, I’ve compiled the proposed list of dimensions. Figure 1 lists the dimensions identified by the data quality authors, with their associated concepts, before standardization. Note that the red arrows crossing the vertical dashed lines indicate where authors cited concepts within other dimensions. Using this charting method, the optimal relationship would have dimensions with underlying concepts only within each individual column, with no red dashed arrows.

Figure 1

Figure 1 lists concepts, independent of author. Table 1 provides a side-by-side comparison of the dimensions between authors, as covered in articles one through five of this series. 

Table 1

Someone will likely disagree with the way these have been conformed, but as everyone who participates in data governance knows, there has to be some compromise in order to create a standard. I think the following is palatable to most of the authors cited and true to the underlying reasons for each concept.

It should be noted, though, that this work has not taken into account the direct impact of unstructured data quality (e.g., textual documents, video and audio). Over time we’d expect the number of concepts documented under these dimensions to grow and other dimensions to be introduced. The industry standard will likely be a living canon of the agreed-upon dimensions.

The consolidated list of dimensions of data quality and underlying concepts, based on the consolidation in articles one through five, is shown in Table 2.

Table 2

It should be noted that this is not a list of definitions of the dimensions, which would require an extensive review, negotiation and compromise effort among industry thought leaders. Rather, this is a conformed list of the underlying concepts for each dimension. (I am presenting this topic at the International Association for Information and Data Quality conference, IDQ 2013, in Little Rock, AR, this November. I hope to see you there and discuss this topic further.)

In conclusion, I stress that although many of the dimensions put forth by data quality authors are good mechanisms to ensure quality information management work products, they aren’t specific to the quality of data and its intended use.

This is where we should go back to the standard definition for data quality: “Fitness for Use,” which is a misnomer. It should be “Fitness for intended use.” After all, we wouldn't say that a Ferrari is of poor quality when used off-roading, would we? Rather it is of exceptional quality for its purpose (aesthetic beauty, acceleration, high-speed maneuvering on flat surfaces, etc.). In terms of creating standards, the presumption has to be that the data is for a given purpose/audience, and then within that scope we can define whether it meets our needs or not.
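One way to picture "fitness for intended use" is to hold the measured scores fixed and vary the requirements by purpose. The sketch below is only an illustration under assumed numbers: the scores, the two intended uses, and their thresholds are hypothetical and do not come from the series.

```python
# Measured scores for one dataset (hypothetical values between 0.0 and 1.0).
measured = {"completeness": 0.92, "validity": 0.97}

# Minimum score each intended use requires (hypothetical thresholds).
requirements = {
    "regulatory_report": {"completeness": 0.99, "validity": 0.95},
    "trend_analysis": {"completeness": 0.90, "validity": 0.95},
}


def unmet_dimensions(scores, required):
    """Return, sorted by name, the dimensions scoring below their minimum."""
    return sorted(
        dim for dim, minimum in required.items()
        if scores.get(dim, 0.0) < minimum
    )


def fit_for(scores, required):
    """True when every required dimension meets its minimum."""
    return not unmet_dimensions(scores, required)


for use, required in requirements.items():
    gaps = unmet_dimensions(measured, required)
    verdict = "fit" if not gaps else "not fit, gaps in: " + ", ".join(gaps)
    print(f"{use}: {verdict}")
```

The same dataset passes for trend analysis and fails for the regulatory report, just as the Ferrari is excellent on the road and poor off it.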