Handling Data Requests Collaboratively: A Deep Dive Review
Business users continuously require new data to make smarter decisions and take better actions. The traditional way of gathering BI business requirements no longer scales to meet the breadth and complexity of today’s business data. ETL and data engineers struggle to communicate with business stakeholders. Business teams can easily describe their desired data views, but they do not know (or care to know) the underlying source data that IT must transform. This break in communication makes requirement gathering a prolonged and error-prone process.
Luckily, there is a new way, a better way, of making new data happen. It involves rethinking the business-IT relationship – viewing it as a trusted collaboration rather than a customer-provider relationship. This post dives into this new approach, using a (seemingly) simple data request as an example.
First, the old way of handling data requests
Let’s assume that the business team at a transaction processor would like to report on Gross Margin — essentially, net sales revenue minus its cost (bank fees, etc.). Let’s also assume that all required source data has been extracted and loaded as-is into the internal staging area of a data warehouse cluster, in an ELT manner. How would the organization traditionally go about implementing such a report?
First, IT will develop a project glossary that lists the sales and cost tables, their attributes, and the metadata for each.
Next, IT will gather requirements about how the business calculates Gross Margin. It will produce, for example, a hierarchical view of how revenue and cost are captured. Each level in the hierarchy transforms the data in the level beneath it, such as this:
Depending on the complexity of the business, some of these metrics can be fairly complex, requiring IT to produce additional artifacts, such as flowcharts that explain the process of calculating the metrics (for instance, if X is active then Y equals 500, otherwise it equals Z).
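A rule like that can be captured directly in code once the requirement is understood. Here is a minimal sketch in Python, using the hypothetical names X, Y, and Z from the example above:

```python
# Minimal sketch of the flowchart rule "if X is active then Y equals 500,
# otherwise it equals Z". All names are illustrative placeholders.

def compute_y(x_is_active: bool, z: float) -> float:
    """Return Y: 500 when X is active, otherwise the fallback value Z."""
    return 500.0 if x_is_active else z
```

The point is that even a one-line business rule must first be elicited, documented, and confirmed before IT can write it down this precisely.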
Next, IT will need to identify all of the special cases that are involved in the project. For instance, it will need to understand how the business team handles situations when a data element is missing, or when a data element contains an unrecognized value.
Once all of the documentation is produced, IT will attempt to implement the transformation logic in code. This process will not be perfect right off the bat, so multiple iterations will be required. Each time, the business team will need to clarify some logic, which IT will implement.
During the implementation phase, the business team realizes no benefit from the project. Any transformed data element that is tested and ready to go typically remains unavailable to the team until the project is complete and the transformation code is deployed in production.
A better way to handle data requests
Instead of serializing new data creation (get requirements, implement the model, optimize the transformations), Lore IO customers progressively evolve their data fabric by bringing their business and IT teams together to collaborate on data requests. Here’s how.
- Business teams start the process by creating formal data requests. Beginning at the highest level of the hierarchy, they annotate their desired views and use declarative language to describe their data attributes.
- The teams then assign micro-tasks to each other to further define and declare the hierarchical data elements that ultimately make up the needed data view. They improve the transformation logic incrementally and validate their model instantly.
- Lore IO scans the data and generates a schema. It also generates an automated data catalog that IT uses to map the staged data in the data warehouse to the lowest nodes of the hierarchy. Whenever possible, Lore IO offers automated recommendations on how to map the data.
- When the business team runs their new queries, Lore IO converts the data definitions into code that runs on the actual data. This enables the teams to test their model virtually, without moving the data. Whenever problems are encountered, the teams can diagnose root causes with data and logic lineage.
- Finally, the service is turned on, with Lore IO monitoring and alerting for any data issues.
Returning to our previous example of calculating gross margin, here’s how this is accomplished faster and more efficiently in Lore IO.
First, the business team uses Lore IO to create a new data mart for the gross margin. They define a new table and add their desired columns. For each column, the team adds a description so that everyone is clear on the column data.
Next, the business team defines each column by using an expression, a column map, a column lookup, or another transformation. For instance, for the Gross Margin column, the team may define the expression ‘@transaction.revenue’ – ‘@transaction.cost’. The @transaction prefix denotes the table where the column should be found.
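As a plain-code analogue (not Lore IO's actual engine), that declarative expression corresponds to something like the following Python sketch, where the transaction record and its field names are assumptions:

```python
# Illustrative equivalent of the declarative expression
# '@transaction.revenue' - '@transaction.cost'.

def gross_margin(transaction: dict) -> float:
    """Gross margin: net sales revenue minus its cost (bank fees, etc.)."""
    return transaction["revenue"] - transaction["cost"]
```

For example, `gross_margin({"revenue": 100.0, "cost": 2.5})` returns 97.5.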
The columns need not be defined before they are referenced; one team member who defines the Gross Margin can assign a micro-task for another team member to define the underlying columns. The status of each column can be tracked and updated as its definition is created. And the transformation rule can be tested and previewed.
Modeling rules at lower levels of the hierarchy can be done in parallel. They can also be edited independently at any time by anyone who understands the transformation logic. Lore IO then converts all of the declarative modeling into the right SQL to run on the data.
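To make the idea of that conversion concrete, here is a hedged toy translator, not Lore IO's actual generator, that rewrites the ‘@table.column’ notation from the example above into a plain SQL expression:

```python
import re

# Toy translator from the '@table.column' notation used in this post to SQL.
# The source table is taken from the first reference; assuming a single
# source table is a simplification for this sketch, not how Lore IO works.

def to_sql(column_name: str, expression: str) -> str:
    """Rewrite '@table.column' references and wrap them in a SELECT."""
    refs = re.findall(r"@(\w+)\.(\w+)", expression)
    table = refs[0][0]  # single-table assumption
    sql_expr = re.sub(r"@(\w+)\.(\w+)", r"\1.\2", expression)
    return f"SELECT {sql_expr} AS {column_name} FROM {table}"
```

Running `to_sql("gross_margin", "@transaction.revenue - @transaction.cost")` yields `SELECT transaction.revenue - transaction.cost AS gross_margin FROM transaction`.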
To aid the teams in defining lower-level data attributes, Lore IO offers a library of transformation functions that speed up the definition process. For instance, before any transaction revenue can be used, the team must first check whether there was a partial or complete refund some days later. Using a time-series function, the team can associate any chargebacks with the original revenue.
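A minimal sketch of that time-series association in Python, assuming invented field names and a 90-day matching window:

```python
from datetime import date, timedelta

# Associate chargebacks with the original revenue event for the same
# transaction, within a lookback window. Field names and the window
# length are assumptions for this sketch, not Lore IO's actual function.

def net_revenue(transactions, chargebacks, window_days=90):
    """Return {transaction id: revenue minus matching chargebacks}."""
    net = {t["id"]: t["revenue"] for t in transactions}
    booked = {t["id"]: t["date"] for t in transactions}
    for cb in chargebacks:
        tid = cb["transaction_id"]
        if tid in booked and cb["date"] - booked[tid] <= timedelta(days=window_days):
            net[tid] -= cb["amount"]
    return net
```

A chargeback arriving after the window leaves the original revenue untouched, which is exactly the kind of edge case the prebuilt function spares the team from hand-coding.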
Another prebuilt function is a funnel, where the team can define the series of steps that constitute a successful or a failed transaction. Once the lowest nodes of the hierarchy are declared, IT can map the staged data to them. Lore IO will generate a data catalog and automatically offer mapping recommendations that IT can accept or override. The business team can then run their query and view the actual, transformed data. Gross Margin can be calculated over time, and any remaining errors can be diagnosed and addressed.
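As an illustration of what such a funnel definition might look like (the step names here are hypothetical, not Lore IO's):

```python
# A transaction "succeeds" only if its event stream contains the funnel's
# steps in order; other events may occur in between. Step names are
# invented examples for this sketch.

FUNNEL_STEPS = ["authorized", "captured", "settled"]

def transaction_succeeded(events, steps=FUNNEL_STEPS):
    """True if every funnel step appears in `events`, in order."""
    remaining = iter(events)
    return all(step in remaining for step in steps)
```

For instance, an event stream that skips the "captured" step would be classified as a failed transaction.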
Cross-team collaboration that progressively works on independent nodes in the transformation hierarchy accelerates both time to value on raw data and team productivity. This radically different approach is quickly gaining market awareness. We hope you’ll give it a look soon.