Friday, March 15, 2013

OLAP Query Languages

For data warehousing, two widely used storage models are 1) Relational and 2) Multidimensional. Irrespective of what ever storage model you use, you must have a query processing language and components to execute these queries to bring you results. If you are working with relational databases for data warehousing. Use of SQL query language is a must. For this purpose, SQL 99 specification has added few features in SQL to facilitate OLAP operations on relational databases. For example, consider following query that calculates order and sales according to country, product, and shipment date using the AdventureWorksDW database FactInternetSales tables. I restricted to data output to keep the resultset concise and readable.

SELECT DST.SalesTerritoryCountry
,DT.CalendarYear
,DP.EnglishProductName
, SUM([OrderQuantity]) as TORDER
, SUM([SalesAmount]) as TSALES
FROM [AdventureWorksDW].[dbo].[FactInternetSales] FIS
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimSalesTerritory] DST
ON
FIS.[SalesTerritoryKey] = DST.[SalesTerritoryKey]
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimTime] DT
ON
FIS.[ShipDateKey] = DT.[TimeKey]
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimProduct] DP
ON
FIS.[ProductKey] = DP.[ProductKey]
WHERE
(DST.SalesTerritoryCountry = 'Australia' or DST.SalesTerritoryCountry = 'Germany') and
(DT.CalendarYear = '2003' or DT.CalendarYear = '2004') and
(DP.EnglishProductName = 'All-Purpose Bike Stand' or DP.EnglishProductName = 'Road-550-W Yellow, 44')
group by
DST.SalesTerritoryCountry
,DT.CalendarYear
,DP.EnglishProductName




Above query need three joins and two aggregations to return require results. Now if I need the subtotals for each year, for each product, and for each country as well as the grand total. I have to re-write this query as:

SELECT DST.SalesTerritoryCountry
,DT.CalendarYear
,DP.EnglishProductName
, SUM([OrderQuantity]) as TORDER
, SUM([SalesAmount]) as TSALES
FROM [AdventureWorksDW].[dbo].[FactInternetSales] FIS
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimSalesTerritory] DST
ON
FIS.[SalesTerritoryKey] = DST.[SalesTerritoryKey]
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimTime] DT
ON
FIS.[ShipDateKey] = DT.[TimeKey]
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimProduct] DP
ON
FIS.[ProductKey] = DP.[ProductKey]
WHERE
(DST.SalesTerritoryCountry = 'Australia' or DST.SalesTerritoryCountry = 'Germany') and
(DT.CalendarYear = '2003' or DT.CalendarYear = '2004') and
(DP.EnglishProductName = 'All-Purpose Bike Stand' or DP.EnglishProductName = 'Road-550-W Yellow, 44')
group by
DST.SalesTerritoryCountry
,DT.CalendarYear
,DP.EnglishProductName
UNION
SELECT DST.SalesTerritoryCountry
,'COUNTRYSUBTOTAL'
,'COUNTRYSUBTOTAL'
, SUM([OrderQuantity]) as TORDER
, SUM([SalesAmount]) as TSALES
FROM [AdventureWorksDW].[dbo].[FactInternetSales] FIS
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimSalesTerritory] DST
ON
FIS.[SalesTerritoryKey] = DST.[SalesTerritoryKey]
WHERE
(DST.SalesTerritoryCountry = 'Australia' or DST.SalesTerritoryCountry = 'Germany')
group by
DST.SalesTerritoryCountry
UNION
SELECT 'YEARSUBTOTAL'
,DT.CalendarYear
,'YEARSUBTOTAL'
, SUM([OrderQuantity]) as TORDER
, SUM([SalesAmount]) as TSALES
FROM [AdventureWorksDW].[dbo].[FactInternetSales] FIS
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimTime] DT
ON
FIS.[ShipDateKey] = DT.[TimeKey]
WHERE
(DT.CalendarYear = '2003' or DT.CalendarYear = '2004')
group by
DT.CalendarYear
UNION
SELECT 'PRODUCTSUBTOTAL'
,'PRODUCTSUBTOTAL'
,DP.EnglishProductName
, SUM([OrderQuantity]) as TORDER
, SUM([SalesAmount]) as TSALES
FROM [AdventureWorksDW].[dbo].[FactInternetSales] FIS
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimProduct] DP
ON
FIS.[ProductKey] = DP.[ProductKey]
WHERE
(DP.EnglishProductName = 'All-Purpose Bike Stand' or DP.EnglishProductName = 'Road-550-W Yellow, 44')
group by
DP.EnglishProductName
UNION
SELECT 'GRANDTOTAL'
,'GRANDTOTAL'
,'GRANDTOTAL'
, SUM([OrderQuantity]) as TORDER
, SUM([SalesAmount]) as TSALES
FROM [AdventureWorksDW].[dbo].[FactInternetSales] FIS


You can observe the complexity of the query. This resulted in addition of few features in SQL-99, which allows us to get the same results using much simpler queries. GROUPING SETS is one of those features. Following query make user of GROUPING SETS feature:

SELECT DST.SalesTerritoryCountry
,DT.CalendarYear
,DP.EnglishProductName
, SUM([OrderQuantity]) as TORDER
, SUM([SalesAmount]) as TSALES
FROM [AdventureWorksDW].[dbo].[FactInternetSales] FIS
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimSalesTerritory] DST
ON
FIS.[SalesTerritoryKey] = DST.[SalesTerritoryKey]
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimTime] DT
ON
FIS.[ShipDateKey] = DT.[TimeKey]
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimProduct] DP
ON
FIS.[ProductKey] = DP.[ProductKey]
WHERE
(DST.SalesTerritoryCountry = 'Australia' or DST.SalesTerritoryCountry = 'Germany') and
(DT.CalendarYear = '2003' or DT.CalendarYear = '2004') and
(DP.EnglishProductName = 'All-Purpose Bike Stand' or DP.EnglishProductName = 'Road-550-W Yellow, 44')
group by
GROUPING SETS(
DST.SalesTerritoryCountry
,DT.CalendarYear
,DP.EnglishProductName)


Above mentioned query will generate three groups, i.e., Country, Product and Year. If we also need the calculations on combinations of these three attributes. GROUP BY ROLLUP feature comes into play. The query mentioned below will return the same result as for our four union query above:

* Before executing ROLLUP or CUBE, please make sure that you have set the compatibility level of your database to 100 or use following query:
ALTER DATABASE AdventureWorksDW set compatibility_level = 100


SELECT DST.SalesTerritoryCountry
,DT.CalendarYear
,DP.EnglishProductName
, SUM([OrderQuantity]) as TORDER
, SUM([SalesAmount]) as TSALES
FROM [AdventureWorksDW].[dbo].[FactInternetSales] FIS
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimSalesTerritory] DST
ON
FIS.[SalesTerritoryKey] = DST.[SalesTerritoryKey]
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimTime] DT
ON
FIS.[ShipDateKey] = DT.[TimeKey]
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimProduct] DP
ON
FIS.[ProductKey] = DP.[ProductKey]
WHERE
(DST.SalesTerritoryCountry = 'Australia' or DST.SalesTerritoryCountry = 'Germany') and
(DT.CalendarYear = '2003' or DT.CalendarYear = '2004') and
(DP.EnglishProductName = 'All-Purpose Bike Stand' or DP.EnglishProductName = 'Road-550-W Yellow, 44')
group by
ROLLUP (
DST.SalesTerritoryCountry
,DT.CalendarYear
,DP.EnglishProductName)


Please do remember that order is significant to ROLLUP. GROUP BY ROLLUP (Year, Product, Country)  is equivalent to GROUPING SETS of (Year, Product, Country), (Year, Product), (Year), (All). This means that n-elements of ROLLUP translate into n+1 grouping sets.

If we move ahead and plan to calculate the all possible combinations of attributes, i.e., including (Year, Country), (Product, Country), (Product), and (Country), which were missing in GROUP BY ROLLUP output. We can make use of GROUP BY CUBE. All we need to do is to change the ROLLUP keyword with CUBE keyword.

SELECT DST.SalesTerritoryCountry
,DT.CalendarYear
,DP.EnglishProductName
, SUM([OrderQuantity]) as TORDER
, SUM([SalesAmount]) as TSALES
FROM [AdventureWorksDW].[dbo].[FactInternetSales] FIS
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimSalesTerritory] DST
ON
FIS.[SalesTerritoryKey] = DST.[SalesTerritoryKey]
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimTime] DT
ON
FIS.[ShipDateKey] = DT.[TimeKey]
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimProduct] DP
ON
FIS.[ProductKey] = DP.[ProductKey]
WHERE
(DST.SalesTerritoryCountry = 'Australia' or DST.SalesTerritoryCountry = 'Germany') and
(DT.CalendarYear = '2003' or DT.CalendarYear = '2004') and
(DP.EnglishProductName = 'All-Purpose Bike Stand' or DP.EnglishProductName = 'Road-550-W Yellow, 44')
group by
CUBE (
DST.SalesTerritoryCountry
,DT.CalendarYear
,DP.EnglishProductName)


Another feature, which is not part of SQL99 standard, but can be very handy is NTILE. It allows us to split our resultset into equal groups. Use of NTILE is demonstrated in query below:

SELECT DST.SalesTerritoryCountry
,DT.CalendarYear
,DP.EnglishProductName
, SUM([OrderQuantity]) as TORDER
, SUM([SalesAmount]) as TSALES
, NTILE(3) OVER (
ORDER BY DST.SalesTerritoryCountry
,DT.CalendarYear
,DP.EnglishProductName) as BUCKETNO
FROM [AdventureWorksDW].[dbo].[FactInternetSales] FIS
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimSalesTerritory] DST
ON
FIS.[SalesTerritoryKey] = DST.[SalesTerritoryKey]
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimTime] DT
ON
FIS.[ShipDateKey] = DT.[TimeKey]
LEFT OUTER JOIN [AdventureWorksDW].[dbo].[DimProduct] DP
ON
FIS.[ProductKey] = DP.[ProductKey]
WHERE
(DST.SalesTerritoryCountry = 'Australia' or DST.SalesTerritoryCountry = 'Germany') and
(DT.CalendarYear = '2003' or DT.CalendarYear = '2004') and
(DP.EnglishProductName = 'All-Purpose Bike Stand' or DP.EnglishProductName = 'Road-550-W Yellow, 44')
group by
CUBE (
DST.SalesTerritoryCountry
,DT.CalendarYear
,DP.EnglishProductName)



MultiDimensional eXpression (MDX)
On the other-hand, for mutidimensional databases. We need to make use of MultiDimensional eXpression (MDX). Developed by Microsoft and later adopted by all major vendors, MDX is a very useful tool to query multidimensional databases. It has three basic constructs, i.e., SELECT, FROM, and WHERE similar to SQL. SELECT is used to specify axis dimensions on columns and rows, FROM is used to specify CUBE/s we want to use for data retrieval, and WHERE is used to restrict the data area. Below is a simple MDX query. To execute MDX query, you should connect to SQL Server Analysis Services using SQL Server Management Studio. After login, select the appropriate database and right click. Select the New Query -> MDX options. This will open a new query windows for writing MDX queries.

SELECT
[Dim Product].[Product] ON ROWS
, [Dim Sales Territory].[Country] ON COLUMNS
FROM
[Adventure Works DW InternetSales Cube]


Above query returns the sales quantity for all products in all regions. If we want to view the sales quantity for each individual country and product, we have to specify the members keyword for each dimension as show below:

SELECT
[Dim Product].[Product].members  ON ROWS
, [Dim Sales Territory].[Country].members ON COLUMNS
FROM
[Adventure Works DW InternetSales Cube]


Both of above-mentioned DMX queries listed the sales quantity measure for our cube. If we want to view the sales amount, we have to mention is explicit. Otherwise default measure is selected for calculation. Below DMX query calculates [Sales Amount] measure.

SELECT
[Dim Product].[Product].members  ON ROWS
, [Dim Sales Territory].[Country].members ON COLUMNS
FROM
[Adventure Works DW InternetSales Cube]
WHERE ([Measures].[Sales Amount])



If you want to select specific member of dimension. You can specify it in SELECT clause or WHERE clause. In DMX query below, we select only the sales for Australia:


SELECT
[Dim Product].[Product].members  ON ROWS
, [Dim Sales Territory].[Country].&[Australia] ON COLUMNS
FROM
[Adventure Works DW InternetSales Cube]
WHERE ([Measures].[Sales Amount])


SELECT
[Dim Product].[Product].&[AWC Logo Cap]  ON ROWS
, [Dim Sales Territory].[Country].&[Australia] ON COLUMNS
FROM
[Adventure Works DW InternetSales Cube]
WHERE ([Measures].[Sales Amount])

In DMX query below, we select only the sales for Australia for AWC Logo Cap product:


DMX handles Measures as special dimensions. They hold numerical values only and do not contain any concept hierarchy. Therefore, Both dimension and measures can be used in both SELECT and WHERE clauses alternatively.

 SELECT
[Ship Date].[Year].&[2004]  ON ROWS
, [Measures].[Order Quantity] ON COLUMNS
FROM
[Adventure Works DW InternetSales Cube]
WHERE ([Dim Product].[Product].&[AWC Logo Cap]
, [Dim Sales Territory].[Country].&[Australia])


We can also make use of tuples to restrict our required data from cube. In query below, ([Dim Product].[Product].&[AWC Logo Cap], [Dim Sales Territory].[Country].&[Australia]) represent a tuple.

SELECT
[Ship Date].[Year].members  ON ROWS
, [Measures].[Order Quantity] ON COLUMNS
FROM
[Adventure Works DW InternetSales Cube]
WHERE ([Dim Product].[Product].&[AWC Logo Cap]
, [Dim Sales Territory].[Country].&[Australia])



IF you want to view all measures then members keyword can be used with measures to list all measures as show in query below:

SELECT
[Ship Date].[Year].members  ON ROWS
, [Measures]. members ON COLUMNS
FROM
[Adventure Works DW InternetSales Cube]
WHERE ([Dim Product].[Product].&[AWC Logo Cap]
, [Dim Sales Territory].[Country].&[Australia])



DMX also gives up provision to make use of multiple tuples as Set. A set is used to define axis on rows in query below:

SELECT
{([Dim Product].[Product].&[AWC Logo Cap], [Dim Sales Territory].[Country].&[Australia])
, ([Dim Product].[Product].&[Adjustable Race], [Dim Sales Territory].[Country].&[Australia])} ON ROWS
, [Measures].[Order Quantity] ON COLUMNS
FROM
[Adventure Works DW InternetSales Cube]



DMX is not only restricted to ROWS and COLUMNS for axis, instead we can also get PAGES, CHAPTERS, and SECTIONS for more than two axis as shown in query below: 

SELECT
[Ship Date].[Year].members ON ROWS
, [Due Date].[Year].members ON COLUMNS
, [Order Date].[Year].members ON PAGES
, [Dim Product].[Product].members ON CHAPTERS
, [Dim Sales Territory].[Country].members ON SECTIONS
FROM
[Adventure Works DW InternetSales Cube]

Furthermore, DMX also allows us to work without defining any axis. For example, the query below has not axis. This query is legal and it works. It returns the overall summary for our cube measures.

SELECT
FROM
[Adventure Works DW InternetSales Cube]

DMX also support sub-queries as shown in query below:

SELECT
([Dim Product].[Product].members, [Dim Sales Territory].[Country].members) ON ROWS
, [Measures].[Order Quantity] ON COLUMNS
FROM(
SELECT
{([Dim Product].[Product].&[AWC Logo Cap], [Dim Sales Territory].[Country].&[Australia])
, ([Dim Product].[Product].&[Adjustable Race], [Dim Sales Territory].[Country].&[Australia])} ON ROWS
, [Measures].[Order Quantity] ON COLUMNS
FROM
[Adventure Works DW InternetSales Cube]
)



DMX also allows us to define new measures or calculated measures using the existing measures. We can define these new measures using WITH MEMBER keywords as shown in figure below:

WITH
MEMBER [Measures].[Sales Diff] as
( [Measures].[Sales Amount] / [Measures].[Order Quantity])
SELECT
[Dim Product].[Product].members ON 0 --Column
, [Measures].[Sales Diff] ON 1 --Rows
FROM
[Adventure Works DW InternetSales Cube]