In the days following the introduction of our COVID-19 Drugs and Vaccines Dashboard, we received questions regarding how we transformed the data to create the data visualization. In this blog post, I will briefly demonstrate how we used our tool to produce a 360° view with normalized data using a top-down approach. This transformation of data simplified the creation of the dashboard.
Since the beginning of the COVID-19 pandemic, many websites have been published to help people understand what the state of the pandemic is in different countries.
As a data transformation company, we decided we could also help make information easily accessible, and more specifically, the data related to research around medications and vaccines. Today, I am going to dive into how we used our technology to reshape clinical trials data with minimum coding, and made it easy to build a smart BI dashboard on top of it.
Designing the solution
We took a “top-down” approach to build this dashboard. The top-down approach begins with the big picture, then breaks down into smaller segments. So, for this case, we started from the result and went step-by-step to the raw data. This approach is closer to real life use cases, business users who want to consume some data, have an idea of what they want, and are not that focused on the current, available data and/or the current schema of the data.
The obvious first step was to design a draft of the dashboard that we wanted to build. We knew we wanted a world map, some information about the research (i.e. name, phase, etc.), the sponsor, and the timeline around it.
Once we had some draft requirements of the dashboard, the next step was to design the table(s) that would enable us to build this dashboard. We decided to go with a 360° view with the primary entity being “Clinical Trials.” A 360° table is a denormalized table where each line contains all the information about a specific entity. It has the advantage of making BI queries extremely easy and fast.
In Lore IO, we begin by modeling the final output. For that we created a “New Table” with all the columns that are required to build our dashboard. A new table is the template of a table that has a name and comprehensive columns defined, but has no data. It allows the ability to model the output that you want without worrying about the transformations needed to fill it.
The next step was to begin filling this table with some real data. We utilized the website clinicaltrials.gov, where official government data pertaining to current drug and vaccination trials is hosted. The law requires information regarding all federally and privately funded clinical trials be made public, as well as requiring the entity conducting the trial to provide the following information: the purpose of each experimental drug, basic information detailing the results of the trial, and any adverse events.
For this, we downloaded all the COVID-19 related data from clinicaltrials.gov in CSV format (https://clinicaltrials.gov/ct2/results?cond=COVID-19). We then easily created a new table based on the CSV file downloaded.
Once the data was imported, our next step was to “link” it to our previously created new table. Because the SearchResults table was already at the clinical trials level, we were able to define our Clinical Trials table as a map table on top of the SearchResults. A map table is constructed by unifying (in the SQL-sense) one or more of the input tables’ records.
Once the table got mapped, we had to decide how to construct the different columns. Most of the columns were standard map columns that did not require any transformations. For instance, the SearchResults columns drugs, completion_date, and phases are exactly the information that we want in the Clinical Trials columns Drugs, End Date, and Phase. The Map Columns function is used to take the values of one or more columns from one or more tables to another. Map Columns is similar to a COALESCE statement in SQL.
We did this in two steps, the first one was to use our map columns recommendation. Out of the nine columns that we had in our final table, six got recommended. Map columns recommendation is a feature that compares your current column and column metadata (name, description, type, values, …) to find potential candidates in the underlying mapped table(s).
The second step was to do the manual mapping of missing columns.
After doing this, we were missing only the “Country” column. This column presents two challenges: the first is to extract the country from the “Locations” column and the second is to normalize the values. Lore IO is a platform that extends SQL and enables us to write custom SQL code to create a specific function.
Once this was done, we wanted to ensure that all the countries were normalized, meaning if some clinical trials use ‘USA’ and others ‘United States,’ we wanted both of the values to end up as ‘United States.’ For this purpose, we used the Map values function. The Map values function is used when you want to update column data that may be incomplete, incorrect or improperly formatted. It will go through all of the data in the source column and map the source value(s) to the target value.
So, we then defined our “Country” column to be a Map values column that uses the “country not-normalized” column as an input. We used an online list with all the country names as targets and used the “Map values recommendation” to map most of the values of the clinical trial data to targeted values. The Map values recommendation is a feature that compares current target values with all values available in the input column. It will then compare and score them to provide the best recommendations.
Lore IO technology compiles into SQL, which means that after defining any transformation, it is possible to see the result without conducting any long running job or pipeline. So, we then went to our explore page and visualized the results of the transformations.
To briefly recap, I demonstrated how our tool produced a 360° table with normalized data using a top-down approach, which made it simple to build a dashboard on top of it. This particular use case is a bit simplified for the purpose of this blog, but we will present in more detail how to enrich data with other sources and how to conduct more complex transformations and aggregations in a future article.
Interact with the COVID-19 Drugs and Vaccines Dashboard.