How has energy production and consumption evolved over the years in the USA? Are we becoming more green? What external factors have been impacting our energy numbers? Those are just three out of many interesting questions to answer. Lucky for us, we have access to publicly available data sources from the U.S. Energy Information Administration (EIA) to help us glean insights.
In this blog post, I describe how to utilize those data sources to answer the aforementioned questions by leveraging the Lore IO platform to transform the data into a simplified version and easily create smart and dynamic dashboards from it: U.S. Energy Production and Consumption Dashboard.
Use Case
I sourced three original data sources and unified them into a big table that I imported into Tableau. From there, I created insightful dashboards to compare energy production and consumption across states and visualize how U.S. energy evolves over time.

Dictionary - msn description: Codes_and_Descriptions.xlsx, sheet MSN descriptions, 714 records.
Dictionary - states: Codes_and_Descriptions.xlsx, sheet State Code, 52 records.
Complete_SEDS_update:Complete_SEDS.csv, 1.8 million records.
Consolidate_SEDS_update


Dictionary - msn description


Dictionary - states


Reshape Data in Lore IO
Using the sources listed above, I followed these steps to prepare the data:

Step 1: Consolidate Sources
msn and data fields alone are not very useful. This is why I created data_sheet which consolidates the information from the sources; mapping Consolidate_SEDS_update (M) and creating relationships with Dictionary - msn description (1) and Dictionary - states (2):

Both relationships have checked Linked Lookup, so I could look for their columns and map them into Data_sheet table, when the Join Conditions are reached:
- msn from input table Complete_SEDS_update must match with msn in Dictionary - msn description
- statecode from input table Complete_SEDS_update must match with statecode in Dictionary - states
After generating the input table and relationships, I created the following columns in Data_sheet:


Step 2: Isolate Data
The Data_sheet table contains information that is shown in 10 columns with more than 1.8 million records (rows). Every record in Data_sheet has data in a specific unit (e.g. it is ‘Thousand’ when we refer to ‘Population’ and ‘Billion BTU’ when we reference ‘Total Energy’ data). This characteristic can easily trigger some confusion and mistakes. This is why I split the information into four smaller tables that I use in Step 3, where I create a new table Data Reshaped, in which data related to Population, GDP, Production and Consumption are shown in columns.
Population Table: Only records related to population data.

State and Year columns map state and year from the input table Data_sheet, because they are already represented as columns. On the other hand, Population is not a column of Data_sheet, but I added it in the Population Table using the Any function:


Real GDP Table: Only records related to Gross Domestic Product.

Similar to Population Table, State and Year columns map state and year from Data_sheet, while Real GDP selects Any value from column data in Data_sheet, after reaching some conditions:


Total Production: Only records that contain total production information.


Total Consumption: Only records that contain total consumption information.


Step 3: Reshape Data
The Data Reshaped table is a subset of Data_sheet. It removes records with Population and GDP data, since I can get them as columns after Lookup Relationships.

Columns mapped from Data_sheet

Columns mapped from Population Table and Real GDP Table

Columns mapped from Total Consumption and Total Production

Other columns defined by using conditions or calculations




Step 4: Aggregate Data
I aggregated the information of Reshape_data, so I could have only one row for triplet State, Energy Name, and Year. Also, in this table I only need rows which have Production or Consumption data.

Beside the three key columns, I created 19 columns that contain information related to Consumption, Production, Real GDP, and Population.



Import Lore IO table into Tableau
Once I created Aggregated Data, I explored its columns and downloaded a file with all the records.

I exported the downloaded file into Tableau and I created four interactive dashboards U.S. Energy Production over the years (Billion BTU) is one of them:

Summary
Lore IO allowed me to mix and reshape the data from three different files (sources) into the table Aggregated Data, which contains only the necessary information needed to create interesting, dynamic dashboards in Tableau.
After uploading the sources into Lore IO, I consolidated the data into a table by creating relationships among tables. As a result, I created the data_sheet table, which contains SEDS rows, plus the meaning of the MSN code and the names of the states. Then, I created smaller tables that encapsulate data based on the unit and type of information: Population, GDP, Production, and Consumption. I used them to reshape the data_sheet in Data Reshaped, in which some of the data from data_sheet rows was put into columns.
Aggregated Data is the optimized version of Data Reshaped; it aggregates the information in triplets State, Energy Name and Year, and contains only data related to Consumption and Production.
Visit the U.S. Energy Production and Consumption Dashboard.
Notes
MSN selected to get Production information.

MSN selected to get Consumption information



kWh to BTU conversion
1 kWh = 3412.14163312794 BTU