Sunday, January 22, 2017

Diving deep into data analytics with Power BI: Map Visualization In Action

In this article, we will be using a very interesting dataset about earthquakes. The dataset contains the information about magnitude 6+ earthquakes from 1900 - 2013 taken via United States Geological Survey (USGS). We downloaded this dataset from Magnitude 6+ Earthquakes : Resources | Tableau Public. The dataset description says "All recorded earthquakes with a magnitude of 6 or greater from 1900 - 2013 via USGS (United States Geological Survey)". It has following attributes/columns:

time
Date
Time
latitude
longitude
depth
mag
magType
nst
gap
dmin
rms
net
id
updated
place
type

However, we will confine our focus on selected columns related to time, space/location, and magnitude of the earth quake. For this purpose, we will only be using following columns"

time: Date and time of earth quakes
latitude: Latitude of earth quake location
longitude: Longitude of earth quake location
depth: Depth of the hypocentre of earth quake
mag: Magnitude of the earth quake
place: Place of the earth quake with reference from nearest city and country

This data will need significant pre-processing for more productive results, but lets start with a quick visualization of the data in its existing form and let see how it works and how is the visualization. Just start your Power BI Desktop. Click on the Get Data and Select Excel as shown in Figure 1. Select the MS Excel file and then select the data sheet with data as shown in Figure 2. Click on the load button and after successful loading of data, don't forget to save your file Power BI file.


Figure 1: Get Data from MS Excel
Figure 2: Selecting the Data Sheet with Data
Figure 3: Instant Map Visualization in Power BI

Please start again and repeat the same steps until what is show in Figure 2. Instead of directly loading the data, click Edit. This will open the Query Editor window for you. Remove all unnecessary columns. Apply the changes and close the Query Editor. Don't forget to save the report. Now this time, after placing latitude and longitude columns into Latitude and Longitude place holders, do also add mag column into Size place holder meaning the higher is the magnitude of the earthquake the bigger will be the circle. Few of you might argue, why haven't we done the same earlier. Yes, we could have, but the intentional purpose was to convey that never over-rely on tool capabilities. The updated Map visualization in design mode should look like as shown in Figure 4.

Figure 4: Map Visualization

Wait, just look at the updated Map and ask yourself, is it correct? This is often the case that BI developers miss out the important validation part of their visualizations just because they seem to look correct. Now what is the problem, just refer to articles Tips and Tricks for Power BI Map visualizations and HOW TO OVERCOME MAP RELATED ISSUES IN POWER BI, POWER VIEW AND POWER MAP to understand the problem and one of the possible solution. Just observe the latitude, longitude, and mag fields. They have sigma sign in front of them, which means that these fields have been aggregated. And you can observe in the article that we must add city information in addition to latitude and longitude, or otherwise our latitude and longitude data will be aggregated at the state level. Oh, yes, this is a real life of a BI developer. So what's next? Its data pre-processing. We have the information of nearest city available in place column that can be used to increase the accuracy of our map visualization.


For extracting the location information from the place column follow the following steps. Open the Query Editor for your report. Select the place column and then click on Split Column button available in Transform tab of the toolbar. The Split Column has two options, either to split by "By Delimiter" or "By Number of Characters". Select the "By Delimiter option". Please refer to Figure 5 for precise visuals.

Figure 5: Split Column Feature
Figure 6: Using Split Column Feature
Figure 7: Using Conditional Column Feature
Figure 8: Map visualization with Location, Average Latitude, and Average Longitude

We will start with Map visualization available in Power BI. For simplicity, just make use of latitude and longitude columns, placing them into Latitude and Longitude place holders for Map visualization. Please do also format the Map visualization and add appropriate title. The visualization in designer mode should look like as shown in Figure 3. Don't forget to save you report. Beware that Map visualization make use of Bing map and thus brings-in its all related benefits and issues.



Once the Map visualization is complete, it can be published on the web. The details about how to publish the Power BI report on the web refer to Diving deep into data analytics with Power BI: Tabular Visualization using MS Excel data. The Map visualization of this tutorial is available at Earthquakes Analysis. It is also embedded on this blog below for inline view.


Just look at our first map visualization and ask yourself. Is it correct? Is it possible that Earthquake never happened in Japan? Now get back to your data and check the fifth row using the default data order. It says "15km NNE of Kunisaki-shi, Japan". This is what we should be cautious with. Tools are their to support us in data analytics, but we should be well prepared to use them correctly.



Once the Map visualization is complete, it can be published on the web. The details about how to publish the Power BI report on the web refer to Diving deep into data analytics with Power BI: Tabular Visualization using MS Excel data. The second updated Map visualization of this tutorial is available at Earthquakes Analysis Updated. It is also embedded on this blog below for inline view.





Select the custom delimiter option and add " of " as the delimiter. Also select the "At the right-most delimiter option". The Split Column dialog is shown in Figure 6. Click Ok to continue. This will add a new column with title place.2.


Now you can observe that new column still has null values, as few of the place data fields have no " of " delimiter. In this scenario, one possible solution will be to make use of original data value of place field. We can get this done using Conditional Column feature. This feature can be found in Add Column toolbar. Click on the Conditional Column feature and set the rules for new column as shown in Figure 7. Click OK to continue.


Now you can observe that you have data for every record in your newly created citycountry column. Apply and close the query editor. For your report, add newly created citycountry column in Location place holder for Map visualization and in parallel change the aggregate level of Latitude and Longitude fields to average. Your updated Map visualization in designer should look like as it is shown in Figure 8.


Once the location based averaged out Map visualization is finalized, it can be published on the web. The details about how to publish the Power BI report on the web refer to Diving deep into data analytics with Power BI: Tabular Visualization using MS Excel data. The third updated Map visualization of this tutorial is available at Earthquakes Analysis Location Based Average. It is also embedded on this blog below for inline view.


I still believe that this Map visualization still has lot of room for improvement, however, considering the extended length of this article, we will stop at this point. To sum-up, this tutorial introduced the reader with use of Map visualization correctly and related Power BI features of Query Editor that can be handy for solving many BI problems. The next article in this series can be found at Diving deep into data analytics with Power BI: Data Preparation with Query Editor and More Earthquake AnalyticsThe previous article of this series can be accessed at Diving deep into data analytics with Power BI: Aggregate based Visualizations using MS Excel data. Similarly, the first article of this series on Power BI analytics can be accessed at Diving deep into data analytics with Power BI: Tabular Visualization using MS Excel data.