Sunday, January 22, 2017

Diving deep into data analytics with Power BI: Data Preparation with Query Editor and More Earthquake Analytics

In this article, we will make use of the same dataset that we used for Diving deep into data analytics with Power BI: Map Visualization In Action, i.e., Magnitude 6+ Earthquakes : Resources | Tableau Public. In this tutorial, I will make use of Query Editor for data preparation. For loading the data, follow the steps explained in Diving deep into data analytics with Power BI: Map Visualization In Action, however, once you reach the Navigator window/dialog as shown in Figure 2 of referenced blog, click Edit button. Before moving forward, save the source file.
As a first step remove the not-required columns by right clicking on each column and selecting Remove from the pop-up menu. I left only following mentioned columns:

time: Date and time of earth quakes
latitude: Latitude of earth quake location
longitude: Longitude of earth quake location
depth: Depth of the hypocentre of earth quake
mag: Magnitude of the earth quake
place: Place of the earth quake with reference from nearest city and country

For extracting the location information from the place column follow the following steps. Open the Query Editor for your report. Select the place column and then click on Split Column button available in Transform tab of the toolbar. The Split Column has two options, either to split by "By Delimiter" or "By Number of Characters". Select the "By Delimiter option". Please refer to Figure 1 for precise visuals.



Figure 1: Split Column Feature
Select the custom delimiter option and add " of " as the delimiter. Also select the "At the right-most delimiter option". The Split Column dialog is shown in Figure 2. Click Ok to continue. This will add a new column with title place.2.


Figure 2: Using Split Column Feature
Now you can observe that new column still has null values, as few of the place data fields have no " of " delimiter. In this scenario, one possible solution will be to make use of original data value of place field. We can get this done using Conditional Column feature. This feature can be found in Add Column toolbar. Click on the Conditional Column feature and set the rules for new column as shown in Figure 3. Click OK to continue.


Figure 3: Using Conditional Column Feature
Now you can observe that you have data for every record in your newly created citycountry column. As a next step we will start working with time field. We are interested in separating the year and month information as a separate column form the time field. Start with creating the duplicate of the time column. Rename each column to year and month respectively. Move both column to the start to get the better visualization of both columns. The Query Editor should look like as shown in Figure 4.


Figure 4: Adding Duplicate Column

For year column, right click on the column and select the Transform -> Year -> Year as shown in Figure 5. Repeat the same procedure for month column and instead of Year, select Month. Save the changes. Query Editor only apply changes to data when we save it.



Figure 5: Transforming Date to Date Components
We are now ready to start creating our analytical reports. First question that I would like to answer is "Is it true that the frequency of Earth quakes has increased over last century?". To answer this question, I will make use of Line chart ploting number of earthquakes measured over the years. To make our visualization with more accurate labels, we created a new measure with following expression:

[No of Earthquakes] = count(query[Year])


The report in the designer view should look like as shown in Figure 6.



Figure 6: Line Graph for Year-wise Earthquakes Frequency
Once the Line Graph visualization is complete, it can be published on the web. The details about how to publish the Power BI report on the web refer to Diving deep into data analytics with Power BI: Tabular Visualization using MS Excel data. The Line Graph visualization for year-wise earthquakes frequency is available at Year-wise Earthquakes Frequency. It is also embedded on this blog below for inline view.



It can easily be observed that it is true that frequency of earthquakes has increased alarmingly during last one century. How about analyzing the earthquakes frequency at distinct places each year, i.e., ignoring repeated earth quakes at same place. For this purpose, all you have to do is to change the expression of measure as identified below:

No of Earthquakes = distinctcount(query[citycountry])


The Line Graph visualization for year-wise frequency of earthquakes at distinct locations will look like as embedded on this blog below for inline view.




It can easily be observed that the number of locations for earthquakes also increased alarmingly during last one century. What about identifying association of earthquakes with certain months. The question will be "Are there few months who are more vulnerable for earthquakes?". For this question, all we need to do it to change the year field with the month field and choose the appropriate visualization. Line Graph visualization will not make sense of month as figures of month not will not be representing the relationship among months, i.e., count of earthquakes in March as nothing to do with count of earthquakes in February. For month-wise earthquake analysis Stacked bar chart, Stacked column chart, Clustered bar chart, and Clustered column chart can be used. For changing the visualization type, all you have to do is to select the visualization in designer and then select the new visualization type from the list. The Clustered column chart visualization for Month-wise Frequency of Earthquakes is embedded on this blog below for inline view.



It can be observed that there is no significant difference among occurrence of earthquakes across months.

To sum-up, this tutorial introduced the reader with use of Query Editor for data pre-processing/preparation as well as few more visualization types, such as Line Chart and Clustered column chart. 
The next article of this series can be accessed at Diving deep into data analytics with Power BI: Pie chart Visualization and use of Unpivot featureThe previous article of this series can be accessed at Diving deep into data analytics with Power BI: Map Visualization In Action. Similarly, the first article of this series on Power BI analytics can be accessed at Diving deep into data analytics with Power BI: Tabular Visualization using MS Excel data.