Master data are the key building blocks of business intelligence and reports. Gleaned out of transactional data, master data represent important entities of our organization, which our business users and executives routinely seek to analyze and understand.
We can think of the difference between master data and transactional data as the difference between nouns and verbs. Master data represent the people, things, or places (the nouns) that our organization cares about. Master data change infrequently; e.g. our favorite customer doesn’t move to a new address that often.
Transaction data, on the other hand, represent the verbs or the events — such as product bought, payment method used — that master data participate in. Unlike master data, transaction data are volatile, changing every time; e.g. the data and time of our customers’ purchases are different.
It’s all fun and games until new data sources emerge
As we noted, master data entities don’t change much over time, but they do have their share of inconsistencies that rear their ugly heads when we add new sources of information. Here are some examples:
- We can identify customers who purchase products on our ecommerce website, but not when they watch our hilarious infomercials on third-party websites. When we try to pull in data from these siloed sites, we’re unable to join them together and identify our most engaged users across channels.
- We use zip codes when shipping products to our US customers, but with our recent geographical expansion into the UK, we must use postcodes (and sometimes add a strange “Dependent Locality” field) when shipping overseas. Bringing in UK sales data may complicate our ability to create consistent data hierarchies and support reporting drilldowns across geographies.
- On our books, a product is a single configurable SKU, whereas our retail partners may sell a product as a bundle of multiple SKUs, again complicating standard reporting.
To help our teams overcome these challenges and carry out award-winning data reporting, we need to get our data modelers to design beautiful master tables with extra care and attention. They must consider the characteristics of our business, our analytical needs, and some solid data modeling principles too. We also need ETL engineers to build data pipelines and to code transformation jobs to standardize the different shapes and formats of our master data entities and their fields.
These two responsibilities—model the master tables and handle the transformations—are typically done in isolation, often by different team members. This siloed approach may work when the number of data sources starts small and stays small, but things can get hairy as we onboard new data sources. Often the model has to be rethought or transformation jobs redone.
Integrated data modeling and transformation
The way to ensure that master data stays reliable and trustworthy over time as new data sources are added is to accept the fact that data sources are idiosyncratic and that change is constant. We must, therefore, give up our set-and-forget-it mentality, and reengineer our data processes to support continuous modeling and transformation. The best approach is to fuse these two activities into a unified workflow that many team members — not just engineers — can execute it.
This is where Lore IO comes in.
Lore IO combines data modeling and data transformation in a way that benefits both. In fact, this capability is a key value driver for using Lore IO. By using the platform’s collaboration capabilities, business and technical team members work together to design master tables and columns, as well as declare how these tables and columns relate to transactional data. Let’s look at a few examples:
In Lore IO, global customers can model virtual columns in virtual master tables. They plug their transaction sources into Lore IO and then point the transformation rules to those sources. In the example below, we see a Zip/Post field that is based on column mapping. The rule tells Lore IO to pick a Zip Code field when transaction data come from US Orders, and the Postcode field when transaction data come from UK orders.
Similarly, customers can standardize reporting by mapping actual values. In the example below, we list all the possible values that we will accept into a Browser field:
While stakeholders model the tables and declare the required transformations, the Lore IO platform takes on the task of automating the entire data preparation process. In a click of a button, it converts the declaration to a data pipeline that runs on the source tables. Customers no longer need to handle code.
Another advantage of using Lore IO is that customers can build upon and expand their master data definitions as source data are added or changed. For instance, in the mapping columns example above, it’s easy to add another source and point to its own zip code field. Lore IO will handle the change automatically, as it builds the pipeline.
Lore IO customers search the data catalog, discovering relationships, metrics, cohorts, and funnels. Users can find their data faster by filtering on tables, concepts, owners, status, visibility, lineage, or data type. Customers can see all of their tables in a visual form via automatically created entity relationship diagrams that highlight all the relationships as they are created.
Finally, Lore IO offers full visibility into how the data is changed over time. In the image below we can see the progression of a field in the system: