The data wrangling of yore doesn’t cut it anymore. Gone are the days when you could set-and-forget ETL jobs and their data munging code. Back in the olden days, source data was small, batched, and predictable. Data pipelines were established and (almost) set in stone. ETL developers often built upon them but rarely re-engineered them, lest they bring down the house.
True, data validations did on occasion fail, and pesky business analysts were known to vacillate on which source fields to pull — forcing ETL developers to edit the munging code — but for the most part, data wrangling was its own isolated thing. It was a discrete pre-step before any analysis took place. It had clear boundaries and cranky owners. One knew up front what transformations one needed to effect, and one effected them, or had an ETL engineer do it. Things were simple, and coffee was simply coffee.
This is all ancient history now.
Today, data preparation (with wrangling at its core) is enmeshed in BI and data science, and it’s often challenging to tell one from the other. Data scientists often lament that data prep accounts for about 80% of their work. With the increase in data volume, variety, and velocity, and the desire to squeeze more dollars out of data, BI pros will likely spend more of their time readying their data, too.
Collaboration fuels the new data wrangling
If conventional data prep doesn’t cut it anymore, where must one turn? That’s easy: to a collaborative data transformation platform like Lore IO.
Platforms like Lore IO transform the waterfall data prep model into a virtuous cycle. Data sourcing, wrangling, and analyzing are broken into a set of micro-tasks that business and IT teams execute as a unified task force. Metadata creation tasks run in parallel, never blocking each other. As soon as a business requirement is fleshed out, even for a single field, team members declare its required metadata, test it on the source data, and refine it. Lore IO automates data pipeline and ETL code creation on the fly, so wrangling feedback is almost instantaneous. Don’t like what you see in a report? Go ahead and reshape the field then and there.
Collaborative data transformation platforms democratize data wrangling across the org, allocating tasks based on each stakeholder’s familiarity with the source systems and the business logic. One team member sets up rules for handling missing values, while another strips metadata from a measurement field. One defines aggregation rules; another maps source columns into the target schema. Long live collaboration!
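To make those micro-tasks concrete, here is a minimal pure-Python sketch of the four wrangling steps just mentioned — missing-value handling, stripping a unit suffix from a measurement field, an aggregation rule, and a source-to-target column mapping. The field names, rules, and code are hypothetical illustrations, not Lore IO’s actual API.

```python
from statistics import mean

rows = [
    {"region": "east", "temp": "21.5C", "revenue": 100},
    {"region": "east", "temp": "19.0C", "revenue": None},
    {"region": "west", "temp": "23.1C", "revenue": 250},
]

# 1. Missing-value rule: default null revenue to 0.
for r in rows:
    if r["revenue"] is None:
        r["revenue"] = 0

# 2. Strip the unit suffix from the measurement field.
for r in rows:
    r["temp"] = float(r["temp"].rstrip("C"))

# 3. Aggregation rule: average temperature per region.
by_region = {}
for r in rows:
    by_region.setdefault(r["region"], []).append(r["temp"])
avg_temp = {region: mean(temps) for region, temps in by_region.items()}

# 4. Map source column names onto the target schema's names.
mapping = {"region": "sales_region", "revenue": "net_revenue"}
target = [{mapping.get(k, k): v for k, v in r.items()} for r in rows]
```

Each step stands alone, which is the point: different stakeholders can own different rules without stepping on each other.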
This new age of data prep is made possible by the fact that platforms like Lore IO use declarative modeling to support no-code data wrangling. Coders are no longer required to specify procedures for handling messy source data. Now anyone in the org who understands the data can use a point-and-click interface to declare their wrangling intentions. One need not tell the system how to do it; one simply declares one’s desires, and the platform converts those declarations into actual code.
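The declarative idea can be sketched in a few lines: the user states *what* a field should be, and a small interpreter decides *how* to compute it. The rule vocabulary below (`concat`, `coalesce`) and all field names are invented for illustration and are not Lore IO’s actual rule language.

```python
# Declarations: what each target field should be, not how to build it.
RULES = {
    "full_name": {"concat": ["first", "last"], "sep": " "},
    "amount":    {"coalesce": ["amount", 0]},
}

def apply_rules(row, rules):
    """Interpret the declarations against one source row."""
    out = dict(row)
    for field, rule in rules.items():
        if "concat" in rule:
            out[field] = rule["sep"].join(str(row[c]) for c in rule["concat"])
        elif "coalesce" in rule:
            col, default = rule["coalesce"]
            out[field] = row.get(col) if row.get(col) is not None else default
    return out

row = {"first": "Ada", "last": "Lovelace", "amount": None}
print(apply_rules(row, RULES))  # amount coalesced to 0, full_name assembled
```

Swap the toy interpreter for a platform that compiles declarations into pipeline code, and you have the gist of no-code wrangling.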
Because transformations are defined as metadata, they can be applied anywhere — including on the target tables and schemas — and in complete isolation from the actual data. Data engineers can map the source data to the target tables at any time, regardless of how much of the wrangling has been defined. This removes unnecessary roadblocks and accelerates the process.
Accelerate data modeling
While stakeholders manufacture metadata, they should always have ready access to the actual data. This is key to ensuring that business logic is modeled faster and in accordance with the source systems. Platforms such as Lore IO make data wrangling easier with search and sampling capabilities. As stakeholders define transformation rules, they can query the system at the click of a button to sample or list actual values. This helps define the right transformations from the start.
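A one-click sampling helper might boil down to something like the sketch below: show a stakeholder the distinct values a column actually holds before they commit to a rule. The data and helper are purely illustrative, not Lore IO’s interface.

```python
import random

rows = [
    {"status": "shipped"}, {"status": "SHIPPED"},
    {"status": "pending"}, {"status": None}, {"status": "shipped"},
]

def sample_values(rows, column, n=5, seed=0):
    """Return up to n distinct non-null sample values for a column."""
    distinct = sorted({r[column] for r in rows if r[column] is not None})
    random.Random(seed).shuffle(distinct)
    return distinct[:n]

print(sample_values(rows, "status"))
# Seeing both "shipped" and "SHIPPED" suggests adding a case-normalization rule.
```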
Mapping source to target tables and columns can be difficult. Lore IO studies the mappings that stakeholders make and offers recommendations for future column and value mappings. This helps stakeholders who are not fluent in the source or target schemas get mappings done faster.
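A toy version of mapping recommendations can be built on plain name similarity. A real platform would learn from the mappings stakeholders have already made; the stdlib `difflib` matcher below is just a stand-in, and the column names are made up.

```python
from difflib import get_close_matches

target_columns = ["customer_id", "order_date", "net_revenue"]

def recommend_mapping(source_column, targets, cutoff=0.4):
    """Suggest the most similarly named target column, or None."""
    matches = get_close_matches(source_column, targets, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(recommend_mapping("cust_id", target_columns))  # -> customer_id
print(recommend_mapping("zzz", target_columns))      # -> None, no close match
```

Even this crude heuristic shows why recommendations help: a stakeholder who has never seen the target schema still gets a sensible starting point to confirm or override.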
All transformation rules and validations are stored in a universal data layer that stakeholders can always access. Team members can therefore reuse them for subsequent transformations instead of starting from scratch. With full data lineage and history, stakeholders can understand how the target tables and columns were constructed, gain confidence in the data, and complete their modeling faster.
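In miniature, lineage means each target column carries a record of the source columns and rule that produced it. The registry below is a hypothetical sketch of that idea, not how Lore IO stores its metadata.

```python
# Each target column remembers the rule and source columns it was built from.
lineage = {}

def define_column(target, sources, rule_desc, fn):
    """Register a derived column and record its lineage."""
    lineage[target] = {"sources": sources, "rule": rule_desc}
    return fn

full_name = define_column(
    "full_name", ["first", "last"], "concatenate with a space",
    lambda r: f'{r["first"]} {r["last"]}',
)

print(lineage["full_name"])
# A later stakeholder can see full_name came from first + last, and reuse it.
```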