These days, many data aggregators face a difficult challenge: how can they ingest and organize all the data they routinely receive from partners and customers, and do so quickly and cost-effectively? Retail brands, travel and hospitality aggregators, and other similar data consumers receive data from many external stakeholders on a daily basis.
Each of their providers organizes its own data differently, according to a custom business logic. This makes it difficult for the data aggregators to standardize their incoming data for proper aggregation and analysis.
To address this challenge, data aggregators have relied heavily on hard-coded ETL operations and data pipelines to standardize the heterogeneous datasets they receive. Others have turned to spreadsheets and manual workflows to make sense of the data. Either approach applies standardization logic directly where the data lives: the former does so after the data is onboarded; the latter just before it’s analyzed.
The problem with hard-coding mapping logic
These approaches may have worked in the past, but with today’s explosion of business data they are no longer viable. Hard-coded mapping rules introduce several problems:
- Increased cost of doing business: Hard-coded data mapping creates friction just as the business attempts to advance its mission. A retail brand, for example, is in the business of selling more goods through more retailers. Hard-coded data standardization prolongs the onboarding process of new retailers, increasing time to value.
- Maintenance overhead: The more standardization code data aggregators have, the more expensive and time-consuming it becomes to maintain it.
- Increased risk of errors: As standardization code grows, it becomes more difficult to test the data and catch errors early before they reach business users.
- Data availability limitations: Often, data providers offer only the datasets that are easy and cheap for them to produce. Aggregators who require full business visibility find their analyses constrained by this limited data.
- Lack of reusability: Hard-coded mapping logic is not easily reused in subsequent pipelines or ETL processes. Rather, it must essentially be re-implemented, which is costly and error-prone.
Virtualizing data mapping
Luckily, data standardization doesn’t have to be painful. A modern strategy for handling source-to-target mapping from many sources is to virtualize the entire process. This way, software algorithms that execute data transformations and mappings are decoupled from the systems that house the data. Furthermore, these algorithms are not persisted in code; rather, their logic is maintained in human-readable SQL-like rules that business analysts maintain on their own using visual interfaces.
Virtual data standardization abstracts away the complex semantics of how the data is captured, transformed, and cobbled together. With it, data aggregators:
1. Onboard new partners and customers quickly
2. Enhance the rules that logically blend the new provider’s schema with existing schemas
3. Provide the business with faster and more accurate analytics
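The core idea can be sketched in a few lines: the mapping logic lives in a human-readable declaration that analysts can edit, and a generic interpreter applies it, so no transformation code is hard-coded per provider. The rule structure and names below are purely illustrative assumptions, not Lore IO’s actual API.

```python
# Hypothetical sketch of a "virtual" mapping rule: the logic is plain
# data rather than hard-coded per-provider code, and one generic
# interpreter applies any such rule. Names are illustrative only.

# Which source column feeds which target column, and how the provider's
# value set translates to the aggregator's standard set.
RULE = {
    "rename": {"amt": "settlement_amount", "typ": "transaction_type"},
    "translate": {"transaction_type": {"S": "sale", "R": "refund", "V": "void"}},
}

def apply_rule(record: dict, rule: dict) -> dict:
    """Standardize one provider record using a declarative rule."""
    out = {rule["rename"].get(k, k): v for k, v in record.items()}
    for col, mapping in rule["translate"].items():
        if col in out:
            out[col] = mapping.get(out[col], out[col])
    return out

print(apply_rule({"amt": 19.99, "typ": "R"}, RULE))
# {'settlement_amount': 19.99, 'transaction_type': 'refund'}
```

Because the rule is data, onboarding a new provider means writing a new rule, not new code, and the same interpreter serves every provider.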
Onboarding partner and customer data at scale
Lore IO enables companies to prepare data across disparate sources without the need for engineering to build ETL and data pipelines. Customers unlock the full value of their data by empowering business users to work with datasets that are hard to understand, reconcile, and blend.
Lore IO is conceptually a virtual data mart factory. Its proprietary algorithms enable customers to instantly capture and validate business logic in support of a wide range of use cases -- one of which is data transformation.
When a new data provider is onboarded, the Lore IO platform uses its proprietary Data Scanner to understand the source data, regardless of the format or the system it’s in. The platform builds a virtual data layer that is automatically enhanced with pointers to the new source data and includes all the transformation logic that the business requires.
These virtual data columns and their transformations allow the Lore IO platform to query the raw data at any time, eliminating data moves and copies, and ensuring that query results reflect the latest changes in the raw data. Lore IO monitors the raw data. When schema changes are detected, the platform makes the necessary adjustments in the data layer to point to the raw data elements correctly.
With the virtual data columns added, business users define virtual rules to standardize and blend the data. The rules are virtual since they’re not persisted in code; they are kept in human-readable form that business users maintain. It’s only at query time that Lore IO automatically generates the code it executes to create tables and views.
There are three types of rules that business users maintain in Lore IO for data transformation:
- Taxonomy rules: These rules map the columns and values of the data provider with those of the aggregator. For instance, a partner can describe their transactions as having two columns: a settlement amount and a type, where the type can be one of three options.
- Reshape rules: These rules specify how to pull data elements together from the partner’s side, and how to distribute them on the aggregator’s side. For example, a retailer might provide all transaction data in a single file, but the aggregator needs to split it into three tables, one for transactions, another for retailer data, and yet another for consumers.
- Semantic rules: These rules articulate the meanings of data elements and how the business uses them to describe its domain. For example, what constitutes a successful transaction? And how should its final settled amount be computed after accounting for refunds? Each data provider has its own semantics that makes sense in the context of its operations, but one that the data aggregator must reconcile with all other providers’ data definitions.
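The reshape example above — one flat retailer file distributed across three aggregator tables — can be illustrated with a small sketch. The spec format and table names here are assumptions made for illustration, not Lore IO’s rule syntax.

```python
# Illustrative sketch of a "reshape" rule: a retailer delivers one flat
# record, and a declarative spec distributes its columns across the
# aggregator's three target tables. Hypothetical names, not Lore IO's API.

RESHAPE = {
    "transactions": ["txn_id", "amount", "status"],
    "retailers":    ["retailer_id", "retailer_name"],
    "consumers":    ["consumer_id", "consumer_email"],
}

def reshape(record: dict, spec: dict) -> dict:
    """Split one flat source record into per-table rows."""
    return {table: {c: record[c] for c in cols if c in record}
            for table, cols in spec.items()}

row = {"txn_id": 1, "amount": 42.0, "status": "settled",
       "retailer_id": 7, "retailer_name": "Acme",
       "consumer_id": 99, "consumer_email": "a@example.com"}
tables = reshape(row, RESHAPE)
print(tables["retailers"])   # {'retailer_id': 7, 'retailer_name': 'Acme'}
```

Taxonomy and semantic rules compose with this in the same declarative spirit: column/value mappings and business definitions are layered on top of the reshaped tables.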
Lore IO customers define these rules declaratively using a visual tool. It has a rich set of transformation functions that make standardization easy. For instance, customers can map columns and translate values to a standard set.
Customers can pull data together from multiple files in formats such as XML, CSV, JSON, and EDI. Common problems, such as reordered columns, renamed columns, or changes to column values or types, can be handled automatically. Customers can also use a SQL console to describe more complex logic.
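One common way to tolerate reordered or renamed columns is to give each canonical column a set of known aliases and resolve incoming headers against it. This sketch illustrates the idea only; it is not Lore IO’s actual mechanism, and the alias table is invented.

```python
# Sketch of tolerating schema drift: each canonical column carries a set
# of known aliases, so a delivery that reorders or renames columns can
# still be mapped. Purely illustrative, not Lore IO's implementation.

ALIASES = {
    "settlement_amount": {"amt", "amount", "settled_amt"},
    "transaction_type":  {"typ", "type", "txn_type"},
}

def resolve_columns(header: list) -> dict:
    """Map each incoming column name to its canonical name, if known."""
    resolved = {}
    for col in header:
        for canonical, names in ALIASES.items():
            if col == canonical or col in names:
                resolved[col] = canonical
    return resolved

# Column order and names differ from the last delivery; mapping still works.
print(resolve_columns(["txn_type", "settled_amt"]))
# {'txn_type': 'transaction_type', 'settled_amt': 'settlement_amount'}
```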
In addition, customers can build data validations and reports to monitor and check that all the standardizations happened correctly.
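A minimal validation of this kind might check that every standardized value landed in the aggregator’s expected set and surface the offenders. The check below is a hypothetical illustration; the column and value names are assumptions.

```python
# Illustrative post-standardization validation: flag any row whose
# transaction_type is outside the aggregator's standard value set.
# Hypothetical names, not Lore IO's validation API.

STANDARD_TYPES = {"sale", "refund", "void"}

def validate(rows: list) -> list:
    """Return rows whose transaction_type is outside the standard set."""
    return [r for r in rows if r.get("transaction_type") not in STANDARD_TYPES]

bad = validate([
    {"txn_id": 1, "transaction_type": "sale"},
    {"txn_id": 2, "transaction_type": "R"},   # untranslated provider value
])
print(bad)  # [{'txn_id': 2, 'transaction_type': 'R'}]
```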
Once the standardization rules have been defined, Lore IO waits for new data from each source to arrive. As soon as a new file or record is added or changed, the Lore IO Data Scanner detects it, applies the relevant standardization rules -- by dynamically generating the corresponding SQL code and executing it -- and exports the data to a standard format. Lore IO also executes validations on the view to ensure data quality and reliability.
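Query-time code generation of this sort can be sketched as a function that compiles a stored rule into a SQL SELECT only when data arrives, so no transformation code is ever persisted. This is an assumption-laden illustration; the SQL Lore IO actually generates will differ.

```python
# Hedged sketch of query-time SQL generation: a declarative rename and
# value-translation rule is compiled into a SELECT statement on demand.
# Illustrative only; not the SQL Lore IO produces.

def build_standardize_sql(source: str, rename: dict, translate: dict) -> str:
    """Emit a SELECT that renames columns and translates value sets."""
    parts = []
    for src, dst in rename.items():
        if src in translate:
            cases = " ".join(f"WHEN {src} = '{a}' THEN '{b}'"
                             for a, b in translate[src].items())
            parts.append(f"CASE {cases} ELSE {src} END AS {dst}")
        else:
            parts.append(f"{src} AS {dst}")
    return f"SELECT {', '.join(parts)} FROM {source}"

sql = build_standardize_sql(
    "raw_partner_feed",
    {"amt": "settlement_amount", "typ": "transaction_type"},
    {"typ": {"S": "sale", "R": "refund"}},
)
print(sql)
```

Because the SQL is regenerated on each run, a rule change made in the visual interface takes effect on the very next delivery, with nothing to redeploy.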
Depending on the mapping rules used, the SQL that is generated automatically can be formidable. Lore IO will often dynamically create a complex multi-step transformation pipeline every time new data is seen. Customers have full visibility into the entire data lineage, enabling them to ensure data quality and to make quick changes when problems are encountered.
When new data is detected, if the Lore IO Data Scanner finds changes to the underlying schema -- such as a different column order, renamed columns, or altered value sets -- Lore IO can attempt to automatically generate rules to handle the exceptions. Alerts are generated whenever an exception is found, with details on whether the system handled it or manual intervention is needed.
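At its simplest, this kind of schema-change detection compares the new delivery’s header against the last known one and classifies the drift so an alert can report it. The sketch below is a hypothetical illustration, not Lore IO’s detector.

```python
# Sketch of schema-change detection on new deliveries: compare the
# incoming header against the last known one and classify the drift.
# Hypothetical code for illustration only.

def diff_schema(known: list, incoming: list) -> dict:
    """Classify schema drift between two column headers."""
    known_s, incoming_s = set(known), set(incoming)
    return {
        "added":     sorted(incoming_s - known_s),
        "removed":   sorted(known_s - incoming_s),
        "reordered": known_s == incoming_s and known != incoming,
    }

print(diff_schema(["id", "amt", "typ"], ["amt", "id", "status"]))
# {'added': ['status'], 'removed': ['typ'], 'reordered': False}
```

Pure reorderings could be absorbed silently, while added or removed columns would trigger the alerts described above.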
With the data views executing, Lore IO customers export valid, standardized datasets to their existing BI tools or other destinations of choice. All Lore IO activities are logged and queryable. In addition, customers can define specific alerts for their data.
To data standardization and beyond
Standardizing business data from multiple partners is a critical and common task, and it will only become more important and frequent as economic developments offer the opportunity to partner with more stakeholders, and as these data providers continue to shape their datasets according to their own business logic.
Given the impact that data standardization has on business agility and performance, brands that aggregate data from multiple sources should consider carefully the infrastructure and workflows they put in place, and their ability to onboard new partners.
For many years, brands have been hard-coding their standardization logic in code that resides in the systems that housed and moved data around. Such strong coupling meant that brands had to spend significant time creating, maintaining, and debugging standardization code that was spread around several locations, with limited ability to ensure its quality and reusability. With complex standardization logic, brands have struggled to onboard new partners quickly, causing them to miss onboarding milestones and new revenue opportunities.
Lore IO offers a unique approach to data transformation through virtualization. Its Data Management Platform decouples and abstracts away standardization code, enabling business users to define standardization rules using a visual interface that converts the logic to code at query time. With this type of virtualization, brands increase their business agility, and onboard new partners faster.