Gathering Data Requirements the Right Way
As a data engineer, you work hard to integrate high volumes of data. Your internal business customers use your transformed data to predict, influence and respond to end-user behavior. That is the essence of the digital transformation that your work supports and facilitates.
That’s no easy task though.
Think about the time and effort you’ve been investing to gather BI business requirements. Your efforts may have involved numerous tools, such as Excel, email, requirement documents, meetings, conference calls and whiteboard sessions, just to name a few. A lot of effort, a lot of tools, a lot of chaos. Why is that?
Gathering business requirements in general is difficult
Relax, it’s not you. Getting business requirements right for any project is an art unto itself. Studies have shown that over 95% of participants cited inadequate or poor requirements definition and management as a factor in their failed projects. It’s common to find requirements that are incomplete, repeatedly changing, reprioritized or in conflict with each other.
Business users and the analysts who support them do not mean to be difficult. It’s the environment in which they operate that is difficult. Business needs continuously change along with evolving market conditions and competitive pressures. Internal stakeholders come and go. The changing landscape isn’t conducive to solidifying requirement documents quickly and elegantly.
Language barriers in data requirements
And then there is the issue of language that often rears its head in data projects. There seems to be a stubborn disconnect between how data producers and stewards think and talk about data, and how business users and their analysts think and talk about data. Much of the cost of gathering BI business requirements lies in translating (and in confirming translations) back and forth between the two camps.
Why is this happening?
Language barriers exist because the two camps live in two different realms of data. Data engineers live in the physical realm of data. They know the systems that produce and store the data. They know how best to access data elements, blend and transform them and make them publicly available.
Business users, on the other hand, live in the metaphysical realm of data. They don’t care about or know the data systems. They want to use data to express complex business events that transcend the underlying data; events that are not perceptible simply by examining the system data.
In the physical realm, data is the output of the systems that produce and store it. In the metaphysical realm, data is the raw material that must be shaped using arbitrary business context and semantics.
Viewing matters this way, we can now understand why gathering data requirements is so arduous: much of the effort entails pulling inhabitants of one realm into the other in the hope that they can be made to see data “in the right way.” Such an ethnocentric approach to data management is rarely helpful.
So what’s the solution?
If the inhabitants of the two realms are each to be kept to their own domain, then how do we bridge them? The answer lies in a new type of collaboration where team members contribute what they understand and know best to progressively build a reportable model. The idea is to work in an environment that can convert data requirements into working query code that runs on top of the source data.
The process at a high level is as follows:
- Business users and their analysts list their target data elements (for instance, “I want to report on our product sales”).
- For each target data element, business users provide information that defines its business semantics (for instance, “a product sale happens when a registered customer purchases one or more items of a given SKU in a given fiscal quarter, at a given price with a given currency, using a given discount, which does not result in a refund”).
- Team members convert these semantic insights into a to-do list. Each task on the list calls out a piece of transformation logic that the team must define (for instance, “Multiply the number of items by the SKU price and then apply the discount amount”).
- Once the logic is defined, data engineers can load their source data into the system and map it to the transformation logic. At this point, ideally, the system can convert the logic into query code, run it, and publish the resulting data views.
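To make the steps above concrete, here is a minimal Python sketch of how business-defined logic plus an engineer's column mapping could be turned into query code. It is an illustration only, not a description of any particular product: the `TargetElement` class, the `to_sql` function, and every table and column name (`erp_order_lines`, `qty`, `unit_price`, `disc_amt`, `is_refunded`) are hypothetical. It also assumes that "apply the discount amount" means subtracting an absolute amount.

```python
from dataclasses import dataclass, field


@dataclass
class TargetElement:
    """A business-defined data element and the logic agreed for it (hypothetical)."""
    name: str        # business name of the element
    expression: str  # transformation logic over logical field names in {braces}
    filters: list[str] = field(default_factory=list)  # business conditions


def to_sql(element: TargetElement, source_table: str, mapping: dict[str, str]) -> str:
    """Render the element as a SQL query by substituting physical column names."""
    select_expr = element.expression.format(**mapping)
    sql = f"SELECT {select_expr} AS {element.name} FROM {source_table}"
    if element.filters:
        conditions = " AND ".join(f.format(**mapping) for f in element.filters)
        sql += f" WHERE {conditions}"
    return sql


# Business side: "multiply the number of items by the SKU price, then apply
# the discount amount", counting only sales that were not refunded.
sale_amount = TargetElement(
    name="product_sale_amount",
    expression="{quantity} * {sku_price} - {discount_amount}",
    filters=["{refunded} = 0"],
)

# Engineering side: map each logical field to a physical column (assumed names).
column_mapping = {
    "quantity": "qty",
    "sku_price": "unit_price",
    "discount_amount": "disc_amt",
    "refunded": "is_refunded",
}

query = to_sql(sale_amount, "erp_order_lines", column_mapping)
print(query)
# prints: SELECT qty * unit_price - disc_amt AS product_sale_amount FROM erp_order_lines WHERE is_refunded = 0
```

Note the split of responsibilities: the business terms live only in `TargetElement`, the physical names live only in `column_mapping`, and neither side has to learn the other's vocabulary.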
This new approach to data collaboration offers unique advantages. First, business users can proactively start a new data request on their own, rather than waiting for their IT counterparts to initiate requirement gathering. Business users get to keep their metaphysical abstractions and describe what they need in their own words, with no regard to the underlying data.
Second, IT can easily map the data elements they understand into the logical model (or create new tasks for business users to further clarify ambiguous requirements). As they carry out source-to-target mapping, IT enables the business to create a trusted semantic layer that business users can build on (for instance, by using a transformed element as a building block of a new, further transformed element).
Third, the business as a whole can respond faster to changing data needs. Users can easily adapt the transformation logic at any time, and the system will automatically produce the new query code. There is no need to maintain query code manually or worry about possible performance degradation when changes occur.
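As a rough illustration of the second and third points, the Python sketch below shows one element built on top of another, and the query being regenerated after the business changes its logic. All names here (`expand`, `build_query`, `gross_amount`, `net_amount`, and the table and column names) are hypothetical, and the sketch covers only regenerating query code, not performance.

```python
import string


def expand(name: str, logic: dict[str, str], columns: dict[str, str]) -> str:
    """Expand an element into an expression over physical columns."""
    if name in columns:  # a physical column mapped by IT
        return columns[name]
    template = logic[name]  # business-defined logic, may reference other elements
    refs = [f for _, f, _, _ in string.Formatter().parse(template) if f]
    expanded = {r: expand(r, logic, columns) for r in refs}
    return "(" + template.format(**expanded) + ")"


def build_query(name: str, logic: dict[str, str], columns: dict[str, str], table: str) -> str:
    """Generate the query for one element from the current logic and mapping."""
    return f"SELECT {expand(name, logic, columns)} AS {name} FROM {table}"


# Source-to-target mapping maintained by IT (assumed column names).
columns = {"quantity": "qty", "sku_price": "unit_price", "discount_amount": "disc_amt"}

# Semantic layer maintained by the business: net_amount builds on gross_amount.
logic = {
    "gross_amount": "{quantity} * {sku_price}",
    "net_amount": "{gross_amount} - {discount_amount}",
}

before = build_query("net_amount", logic, columns, "erp_order_lines")
print(before)
# prints: SELECT ((qty * unit_price) - disc_amt) AS net_amount FROM erp_order_lines

# The business changes its logic; the query is regenerated rather than hand-edited.
logic["net_amount"] = "ROUND({gross_amount} - {discount_amount}, 2)"
after = build_query("net_amount", logic, columns, "erp_order_lines")
print(after)
# prints: SELECT (ROUND((qty * unit_price) - disc_amt, 2)) AS net_amount FROM erp_order_lines
```

Because the query is always derived from `logic` and `columns`, a change on either side flows through on the next call to `build_query`, with no hand-maintained SQL to keep in sync.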
This new collaboration model essentially takes away the blame game. Instead of pinning blame on the other side for failing to understand us, we become active and equal stakeholders in the process. We get to contribute using the language we know, while helping our counterparts to map their knowledge to ours. We meet each other halfway, knowing that we all strive towards the same goal of better and more accurate reporting.