Thursday, February 21, 2013

Building Data Cubes: An Introduction to SQL Server Analysis Services

SQL Server Analysis Services is a good option for experts working on data warehousing and business intelligence technologies, specifically OLAP and Data Mining. SQL Server Analysis Services provide you features to design, create, and manage multi-dimensional data cubes using any of your favorite data sources. Data Mining feature is also available, and it is also very good, but my focus in this blog will be OLAP using SQL Server Analysis Services. I prefer RapidMiner for my data mining tasks :-). To start working with SQL Server Analysis Services, it must be installed in your computer along with client components. This tutorial uses SQL Server 2008. The most important component for you will be SQL Server Business Intelligence Development Studio and SQL Server Analysis Services. You can find SQL Server Business Intelligence Development Studio in your programs menu as show in figure below:


After starting the development studio, go to File -> New -> Project menu. You will find many different project types as show in figure below. For OLAP purpose, Analysis Services Project is of interest for you. Please select the appropriate folder and name for your project.



Once you are done with project selection, you will end up at a screen similar to the one show in the figure below:


On the right side of the studio, you can view a Solution Explorer. In Solution Explorer, there is a project name MyOLAP, which we have just created. The folders under the MyOLAP project are important. Each of them have significant meaning and purpose. For our OLAP task, we will start with Data Sources. Data Source folder contains the list of Data Sources that we add to our project and we intend to use for analysis. Here you can add any data source for which you have driver available in your computer for database connectivity. Just right click on the Data Sources folder and click New Data Source. This bring you the Data Source Wizard.


As you move next, you can view the list of existing Data Connections. You can use existing ones, or  can create a new one using the new button.


To create a new database connection, you have to specify the provider that you want to use for database connectivity. For example, in the figure below, I am using SQL Server Native Client to connect to local SQL Server on my computer. Provide the server name, user id, and password. If you have provided correct connection information, then you can view the list of existing databases  in the combo of "Select or enter a database name". You can select the appropriate database or write its name.


Once you are done with your database connection creation, your newly created connection will be in the list of Data Connections as shown in the figure below.


As you move next, wizard will ask you for Impersonation Information. This information is used to define the Windows credentials that Analysis Services will use to connect to the data source. More details can be found at Set Impersonation Options (SSAS - Multidimensional). I have used the same credentials as of current user. However, you can also use a specific account for your project.


At the end, wizard ask you to specify a data source name to finish.


As you finish the wizard, you can see the Data Source beneath Data Sources folder.


The second step will be to create Data Source Views. Data Source Views allow us to select database object of our interest from the Data Source. A Data Source may contain many tables or views. Using Data Source Views we select the objects that we intend to use in our project. Right click on Data Source Views and click New Data Source View. This will bring you the Data Source View Wizard.


As you move next, you can view the data source that you have created in previous step.


On the next screen, you can view the objects available at your selected data source. This view allows you to select the objects of your interest. For example, I have selected the FactInternetSales table from AdventureWorksDW database. This screen provides a very useful feature to auto-select the related tables. Simply click on Add Related Tables. Wizard will automatically add the related tables with FactInternetSales table. During definition of dimensions and cube in our project, only these tables will be visible/available to us.


On the next screen, provide a title for the view and finish the wizard.


This will bring a data source view beneath Data Source View folder. IDE also shows a diagram of tables with their inter-connectivity as shown in figure below.


Once you are done with creating the data source view, the next step will be to create dimensions for your cube. To create a dimension, right click on the Dimensions folder and click new Dimensions.


As you move next, wizard will ask you about the dimension creation method. As we are using the AdventureWorksDW database, we will be using the use an existing table option. For more information on other options, please refer to Select Creation Method (Dimension Wizard).


Now we have to select the main table for the dimension. For example, in the figure below, I have selected Product as dimension. A key column is mandatory as this key column will be used to connect/link dimension with the fact table.


On the next screen, you can select the attributes that you want to use in your dimension. For example, to display or to generate concept hierarchies. In the figure below, I have selected the English Product Name, which I will use to display this dimension during my OLAP analysis.



Once you are done, please specify a valid name for your dimension to finish.


All the dimensions that you create are available underneath the Dimensions folder in the solution explorer.


Please create all dimensions that are required for your Cube. For example, in this tutorial, we have created three dimensions, i..e, Product, Region, and Time. To create a data cube, right click on Cubes folder and click New Cube. This will bring you the cube creation wizard.


As you move next, this wizard will also ask you the cube creation method as was the case with dimensions creation. We will select the use existing tables option. For other creation methods, please refere to Select Creation Method (Cube Wizard).


As we move next, we can view the tables that we selected in our data source view. We will use the FactInternetSales as our fact table.


On next screen, wizard ask us to select the measure for our Cube. We have selected only two, but according to your requirements, you can select any of them.


On the next screen, wizard will ask you to select existing dimensions that you want to use in your cube. We have created three dimensions and we will be using all three of them.


On the next screen, wizard also allows us to create new dimensions, if required. We will skip this screen because we have already created dimensions of our interest.


On the next screen name the cube and finish the wizard.


After finishing the wizard, the IDE will display you the Data Source View of the cube in the middle of the screen as well as measures and dimensions on the left side of the window.


All you need to do is to process your cube. Just click the process icon, second from left on the tool bar or you can use Cube -> Process menu option. This will perform the necessary processing of cube using a wizard.


During the processing, you can observe the progress. As long as there is not red color, everything is fine.


Once cube is processed. You can move to browser tab to browse your cube data. You will only need to drag and drop the dimensions and measure in the white area in the middle.


As you drop each field, you will get preview of your data.Once all field are dropped then you can analyze the Internet Sales using multi-dimensional analysis as show in figure below.


Friday, February 15, 2013

Working with TPC-H benchmark

Benchmarks are important tools to generate "re-generate-able" results. It allows us to present reliable results that others can understand and believe. Researchers working on OLAP/read-intensive workload do make use of TPC-H benchmark to generate their results. Using TPC-H benchmark requires generation of schema and population of data so that one can execute TPC-H queries. Detail regarding TPC-H benchmark can be found at: TPC-H . Same link also provide a tool named DBGEN, which is used to generate dataset for the benchmarks. If you face problem building dbgen.exe from source code, you can use one from the DBGEN.exe link.  Below are few steps to generate TPC-H benchmarks schema and populate it with dataset.
  • Download the TPC-H dbgen and qgen source code from TPC-H webpage. It contains every thing you need to execute the TPCH benchmark.
  • dbgen.exe will be used to generate the data file.
  • To create schema, their exists a file with name dss.ddl. This file contains the script to generate the schema. You might need to alter it according to your DBMS syntax.
  • Once you create the schema, you can load the data from data files generated from dbgen utility. To generate the data files execute the command “dbgen –s 1”. This will generate the data files for scale factor 1, i.e., of 1 GB. What is scale factor? Scale factor identifies the size of the test database. Scale factors used for the test database must be chosen from the set of fixed scale factors defined as follows: 1, 10, 30, 100, 300, 1000, 3000, 10000, 30000, 100000.
  • A separate file for each table in TPCH schema will be generated, such as supplier.tbl, customer.tbl, etc.
  • You will use the bulk load mechanism available in your DBMS for loading data, for example, we can use following commands to load TPCH data into SQL Server (change file paths according to your own file locations):
//Execute dss.ddl script to generate schema
BULK INSERT CUSTOMER FROM 'D:\customer.tbl' WITH (FIELDTERMINATOR = '|')
BULK INSERT LINEITEM FROM 'D:\lineitem.tbl' WITH (FIELDTERMINATOR = '|')
BULK INSERT NATION FROM 'D:\nation.tbl' WITH (FIELDTERMINATOR = '|')
BULK INSERT ORDERS FROM 'D:\orders.tbl' WITH (FIELDTERMINATOR = '|')
BULK INSERT PART FROM 'D:\part.tbl' WITH (FIELDTERMINATOR = '|')
BULK INSERT PARTSUPP FROM 'D:\partsupp.tbl' WITH (FIELDTERMINATOR = '|')
BULK INSERT REGION FROM 'D:\region.tbl' WITH (FIELDTERMINATOR = '|')
BULK INSERT SUPPLIER FROM 'D:\supplier.tbl' WITH (FIELDTERMINATOR = '|')
//Execute dss.ri script to create constraints
    • Following commands can be used to load TPCH data into MySQL:
    \. /usr/local/var/dss.ddl
    LOAD DATA LOCAL INFILE 'lineitem.tbl' INTO TABLE LINEITEM FIELDS TERMINATED BY '|';
    LOAD DATA LOCAL INFILE 'nation.tbl' INTO TABLE NATION FIELDS TERMINATED BY '|';
    LOAD DATA LOCAL INFILE 'region.tbl' INTO TABLE REGION FIELDS TERMINATED BY '|';
    LOAD DATA LOCAL INFILE 'part.tbl' INTO TABLE PART FIELDS TERMINATED BY '|';
    LOAD DATA LOCAL INFILE 'supplier.tbl' INTO TABLE SUPPLIER FIELDS TERMINATED BY '|';
    LOAD DATA LOCAL INFILE 'partsupp.tbl' INTO TABLE PARTSUPP FIELDS TERMINATED BY '|';
    LOAD DATA LOCAL INFILE 'customer.tbl' INTO TABLE CUSTOMER FIELDS TERMINATED BY '|';
    LOAD DATA LOCAL INFILE 'orders.tbl' INTO TABLE ORDERS FIELDS TERMINATED BY '|';
    \. /usr/local/var/dss.ri
    • You can observe that I have also executed the dss.ri script file after loading the data. This file is also located in dbgen folder and it contains the script to generate constraints on tables, such as primary key and foreign key. You might need to alter this file according to syntax of your selected DBMS. For SQL Server you can use the file available at dss.ri.
    • Once you complete this process, you can execute your first TPCH query. Queries are located in dbgen\queries folder. Total 22 queries are there. Few of them are parameterized, i.e., you have to provide the parameter values for them. And as I mentioned earlier, you might need to alter the syntax of queries according to your DBMS syntax.
    On SQL Server, please use the following query 1:

    SELECT L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY) AS SUM_QTY,
    SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS SUM_DISC_PRICE,
    SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE, AVG(L_QUANTITY) AS AVG_QTY,
    AVG(L_EXTENDEDPRICE) AS AVG_PRICE, AVG(L_DISCOUNT) AS AVG_DISC, COUNT(*) AS COUNT_ORDER
    FROM LINEITEM
    WHERE L_SHIPDATE <= dateadd(dd, -90, cast('1998-12-01' as datetime))
    GROUP BY L_RETURNFLAG, L_LINESTATUS
    ORDER BY L_RETURNFLAG,L_LINESTATUS

    Syntax for query 1 on MySQL will be as follows:

    SELECT L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY) AS SUM_QTY,
    SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS
    SUM_DISC_PRICE, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE,
    AVG(L_QUANTITY) AS AVG_QTY, AVG(L_EXTENDEDPRICE) AS AVG_PRICE,
    AVG(L_DISCOUNT) AS AVG_DISC, COUNT(*) AS COUNT_ORDER FROM lineitem WHERE
    L_SHIPDATE <= date '1998-12-01' - interval '90' day GROUP BY L_RETURNFLAG,
    L_LINESTATUS ORDER BY L_RETURNFLAG, L_LINESTATUS;
    • In the above-mentioned query parameter :1 is set to 90. What should be the parameters values, and what will be their results is specified in TPCH specification document, available at TPCH site in pdf form.
    • You can also refer to following weblinks for more information:
      • http://blog.nhaslam.com/2011/07/02/fun-with-tpc-h-part-1/
      • http://www.sqlservercentral.com/blogs/sqlservernotesfromthefield/2012/01/24/generate-test-data-using-dbgen/
      • http://www.pilhokim.com/index.php?title=Project/EFIM/TPC-H
      • http://dsl.serc.iisc.ernet.in/projects/PICASSO/picasso_download/doc/Installation/tpch.htm
      • http://research.microsoft.com/en-us/um/people/gray/dbgen/
      • http://www.sqlserver-dba.com/2011/09/this-is-a-followup-on-my-earlier-post-of-sql-server-test-data-generation-testing-tools-i-had-some-requests-for-my-set-up-pr.html
    • If you face problem building dbgen.exe from source code, you can use one from the DBGEN.exe link.
    • All TPC-H queries in SQL Server format are given below. Source of these queries is: http://www.sqlserver-dba.com/2011/09/this-is-a-followup-on-my-earlier-post-of-sql-server-test-data-generation-testing-tools-i-had-some-requests-for-my-set-up-pr.html

    /* TPC_H Query 1 - */
    SELECT L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY) AS SUM_QTY,
     SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS SUM_DISC_PRICE,
     SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE, AVG(L_QUANTITY) AS AVG_QTY,
     AVG(L_EXTENDEDPRICE) AS AVG_PRICE, AVG(L_DISCOUNT) AS AVG_DISC, COUNT(*) AS COUNT_ORDER
    FROM LINEITEM
    WHERE L_SHIPDATE <= dateadd(dd, -90, cast('1998-12-01' as datetime))
    GROUP BY L_RETURNFLAG, L_LINESTATUS
    ORDER BY L_RETURNFLAG,L_LINESTATUS

    /* TPC_H Query 2 - Minimum Cost Supplier */
    SELECT TOP 100 S_ACCTBAL, S_NAME, N_NAME, P_PARTKEY, P_MFGR, S_ADDRESS, S_PHONE, S_COMMENT
    FROM PART, SUPPLIER, PARTSUPP, NATION, REGION
    WHERE P_PARTKEY = PS_PARTKEY AND S_SUPPKEY = PS_SUPPKEY AND P_SIZE = 15 AND
    P_TYPE LIKE '%%BRASS' AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND
    R_NAME = 'EUROPE' AND
    PS_SUPPLYCOST = (SELECT MIN(PS_SUPPLYCOST) FROM PARTSUPP, SUPPLIER, NATION, REGION
     WHERE P_PARTKEY = PS_PARTKEY AND S_SUPPKEY = PS_SUPPKEY
     AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = 'EUROPE')
    ORDER BY S_ACCTBAL DESC, N_NAME, S_NAME, P_PARTKEY

    /* TPC_H Query 3 - Shipping Priority */
    SELECT TOP 10 L_ORDERKEY, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE, O_ORDERDATE, O_SHIPPRIORITY
    FROM CUSTOMER, ORDERS, LINEITEM
    WHERE C_MKTSEGMENT = 'BUILDING' AND C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND
    O_ORDERDATE < '1995-03-15' AND L_SHIPDATE > '1995-03-15'
    GROUP BY L_ORDERKEY, O_ORDERDATE, O_SHIPPRIORITY
    ORDER BY REVENUE DESC, O_ORDERDATE 

    /* TPC_H Query 4 - Order Priority Checking */
    SELECT O_ORDERPRIORITY, COUNT(*) AS ORDER_COUNT FROM ORDERS
    WHERE O_ORDERDATE >= '1993-07-01' AND O_ORDERDATE < dateadd(mm,3, cast('1993-07-01' as datetime))
    AND EXISTS (SELECT * FROM LINEITEM WHERE L_ORDERKEY = O_ORDERKEY AND L_COMMITDATE < L_RECEIPTDATE)
    GROUP BY O_ORDERPRIORITY
    ORDER BY O_ORDERPRIORITY
     
    /* TPC_H Query 5 - Local Supplier Volume */
    SELECT N_NAME, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE
    FROM CUSTOMER, ORDERS, LINEITEM, SUPPLIER, NATION, REGION
    WHERE C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND L_SUPPKEY = S_SUPPKEY
    AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY
    AND R_NAME = 'ASIA' AND O_ORDERDATE >= '1994-01-01' 
    AND O_ORDERDATE < DATEADD(YY, 1, cast('1994-01-01' as datetime))
    GROUP BY N_NAME
    ORDER BY REVENUE DESC

    /* TPC_H Query 6 - Forecasting Revenue Change */
    SELECT SUM(L_EXTENDEDPRICE*L_DISCOUNT) AS REVENUE
    FROM LINEITEM
    WHERE L_SHIPDATE >= '1994-01-01' AND L_SHIPDATE < dateadd(yy, 1, cast('1994-01-01' as datetime))
    AND L_DISCOUNT BETWEEN .06 - 0.01 AND .06 + 0.01 AND L_QUANTITY < 24

    /* TPC_H Query 7 - Volume Shipping */
    SELECT SUPP_NATION, CUST_NATION, L_YEAR, SUM(VOLUME) AS REVENUE
    FROM ( SELECT N1.N_NAME AS SUPP_NATION, N2.N_NAME AS CUST_NATION, datepart(yy, L_SHIPDATE) AS L_YEAR,
     L_EXTENDEDPRICE*(1-L_DISCOUNT) AS VOLUME
     FROM SUPPLIER, LINEITEM, ORDERS, CUSTOMER, NATION N1, NATION N2
     WHERE S_SUPPKEY = L_SUPPKEY AND O_ORDERKEY = L_ORDERKEY AND C_CUSTKEY = O_CUSTKEY
     AND S_NATIONKEY = N1.N_NATIONKEY AND C_NATIONKEY = N2.N_NATIONKEY AND  ((N1.N_NAME = 'FRANCE' AND N2.N_NAME = 'GERMANY') OR
     (N1.N_NAME = 'GERMANY' AND N2.N_NAME = 'FRANCE')) AND
     L_SHIPDATE BETWEEN '1995-01-01' AND '1996-12-31' ) AS SHIPPING
    GROUP BY SUPP_NATION, CUST_NATION, L_YEAR
    ORDER BY SUPP_NATION, CUST_NATION, L_YEAR

    /* TPC_H Query 8 - National Market Share */
    SELECT O_YEAR, SUM(CASE WHEN NATION = 'BRAZIL' THEN VOLUME ELSE 0 END)/SUM(VOLUME) AS MKT_SHARE
    FROM (SELECT datepart(yy,O_ORDERDATE) AS O_YEAR, L_EXTENDEDPRICE*(1-L_DISCOUNT) AS VOLUME, N2.N_NAME AS NATION
     FROM PART, SUPPLIER, LINEITEM, ORDERS, CUSTOMER, NATION N1, NATION N2, REGION
     WHERE P_PARTKEY = L_PARTKEY AND S_SUPPKEY = L_SUPPKEY AND L_ORDERKEY = O_ORDERKEY
     AND O_CUSTKEY = C_CUSTKEY AND C_NATIONKEY = N1.N_NATIONKEY AND
     N1.N_REGIONKEY = R_REGIONKEY AND R_NAME = 'AMERICA' AND S_NATIONKEY = N2.N_NATIONKEY
     AND O_ORDERDATE BETWEEN '1995-01-01' AND '1996-12-31' AND P_TYPE= 'ECONOMY ANODIZED STEEL') AS ALL_NATIONS
    GROUP BY O_YEAR
    ORDER BY O_YEAR

    /* TPC_H Query 9 - Product Type Profit Measure */
    SELECT NATION, O_YEAR, SUM(AMOUNT) AS SUM_PROFIT
    FROM (SELECT N_NAME AS NATION, datepart(yy, O_ORDERDATE) AS O_YEAR,
     L_EXTENDEDPRICE*(1-L_DISCOUNT)-PS_SUPPLYCOST*L_QUANTITY AS AMOUNT
     FROM PART, SUPPLIER, LINEITEM, PARTSUPP, ORDERS, NATION
     WHERE S_SUPPKEY = L_SUPPKEY AND PS_SUPPKEY= L_SUPPKEY AND PS_PARTKEY = L_PARTKEY AND
     P_PARTKEY= L_PARTKEY AND O_ORDERKEY = L_ORDERKEY AND S_NATIONKEY = N_NATIONKEY AND
     P_NAME LIKE '%%green%%') AS PROFIT
    GROUP BY NATION, O_YEAR
    ORDER BY NATION, O_YEAR DESC

    /* TPC_H Query 10 - Returned Item Reporting */
    SELECT TOP 20 C_CUSTKEY, C_NAME, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE, C_ACCTBAL,
    N_NAME, C_ADDRESS, C_PHONE, C_COMMENT
    FROM CUSTOMER, ORDERS, LINEITEM, NATION
    WHERE C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND O_ORDERDATE>= '1993-10-01' AND
    O_ORDERDATE < dateadd(mm, 3, cast('1993-10-01' as datetime)) AND
    L_RETURNFLAG = 'R' AND C_NATIONKEY = N_NATIONKEY
    GROUP BY C_CUSTKEY, C_NAME, C_ACCTBAL, C_PHONE, N_NAME, C_ADDRESS, C_COMMENT
    ORDER BY REVENUE DESC

    /* TPC_H Query 11 - Important Stock Identification */
    SELECT PS_PARTKEY, SUM(PS_SUPPLYCOST*PS_AVAILQTY) AS VALUE
    FROM PARTSUPP, SUPPLIER, NATION
    WHERE PS_SUPPKEY = S_SUPPKEY AND S_NATIONKEY = N_NATIONKEY AND N_NAME = 'GERMANY'
    GROUP BY PS_PARTKEY
    HAVING SUM(PS_SUPPLYCOST*PS_AVAILQTY) > (SELECT SUM(PS_SUPPLYCOST*PS_AVAILQTY) * 0.0001000000
     FROM PARTSUPP, SUPPLIER, NATION
     WHERE PS_SUPPKEY = S_SUPPKEY AND S_NATIONKEY = N_NATIONKEY AND N_NAME = 'GERMANY')
    ORDER BY VALUE DESC

    /* TPC_H Query 12 - Shipping Modes and Order Priority */
    SELECT L_SHIPMODE,
    SUM(CASE WHEN O_ORDERPRIORITY = '1-URGENT' OR O_ORDERPRIORITY = '2-HIGH' THEN 1 ELSE 0 END) AS HIGH_LINE_COUNT,
    SUM(CASE WHEN O_ORDERPRIORITY <> '1-URGENT' AND O_ORDERPRIORITY <> '2-HIGH' THEN 1 ELSE 0 END ) AS LOW_LINE_COUNT
    FROM ORDERS, LINEITEM
    WHERE O_ORDERKEY = L_ORDERKEY AND L_SHIPMODE IN ('MAIL','SHIP')
    AND L_COMMITDATE < L_RECEIPTDATE AND L_SHIPDATE < L_COMMITDATE AND L_RECEIPTDATE >= '1994-01-01'
    AND L_RECEIPTDATE < dateadd(mm, 1, cast('1995-09-01' as datetime))
    GROUP BY L_SHIPMODE
    ORDER BY L_SHIPMODE

    /* TPC_H Query 13 - Customer Distribution */
    SELECT C_COUNT, COUNT(*) AS CUSTDIST
    FROM (SELECT C_CUSTKEY, COUNT(O_ORDERKEY)
     FROM CUSTOMER left outer join ORDERS on C_CUSTKEY = O_CUSTKEY
     AND O_COMMENT not like '%%special%%requests%%'
     GROUP BY C_CUSTKEY) AS C_ORDERS (C_CUSTKEY, C_COUNT)
    GROUP BY C_COUNT
    ORDER BY CUSTDIST DESC, C_COUNT DESC

    /* TPC_H Query 14 - Promotion Effect */
    SELECT 100.00* SUM(CASE WHEN P_TYPE LIKE 'PROMO%%' THEN L_EXTENDEDPRICE*(1-L_DISCOUNT)
    ELSE 0 END) / SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS PROMO_REVENUE
    FROM LINEITEM, PART
    WHERE L_PARTKEY = P_PARTKEY AND L_SHIPDATE >= '1995-09-01' AND L_SHIPDATE < dateadd(mm, 1, '1995-09-01')

    /* TPC_H Query 15 - Create View for Top Supplier Query 
    CREATE VIEW REVENUE0 (SUPPLIER_NO, TOTAL_REVENUE) AS
    SELECT L_SUPPKEY, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) FROM LINEITEM
    WHERE L_SHIPDATE >= '1996-01-01' AND L_SHIPDATE < dateadd(mm, 3, cast('1996-01-01' as datetime))
    GROUP BY L_SUPPKEY
    GO*/
    /* TPC_H Query 15 - Top Supplier */
    SELECT S_SUPPKEY, S_NAME, S_ADDRESS, S_PHONE, TOTAL_REVENUE
    FROM SUPPLIER, REVENUE0
    WHERE S_SUPPKEY = SUPPLIER_NO AND TOTAL_REVENUE = (SELECT MAX(TOTAL_REVENUE) FROM REVENUE0)
    ORDER BY S_SUPPKEY
    DROP VIEW REVENUE0

    /* TPC_H Query 16 - Parts/Supplier Relationship */
    SELECT P_BRAND, P_TYPE, P_SIZE, COUNT(DISTINCT PS_SUPPKEY) AS SUPPLIER_CNT
    FROM PARTSUPP, PART
    WHERE P_PARTKEY = PS_PARTKEY AND P_BRAND <> 'Brand#45' AND P_TYPE NOT LIKE 'MEDIUM POLISHED%%'
    AND P_SIZE IN (49, 14, 23, 45, 19, 3, 36, 9) AND PS_SUPPKEY NOT IN (SELECT S_SUPPKEY FROM SUPPLIER
     WHERE S_COMMENT LIKE '%%Customer%%Complaints%%')
    GROUP BY P_BRAND, P_TYPE, P_SIZE
    ORDER BY SUPPLIER_CNT DESC, P_BRAND, P_TYPE, P_SIZE

    /* TPC_H Query 17 - Small-Quantity-Order Revenue */
    SELECT SUM(L_EXTENDEDPRICE)/7.0 AS AVG_YEARLY FROM LINEITEM, PART
    WHERE P_PARTKEY = L_PARTKEY AND P_BRAND = 'Brand#23' AND P_CONTAINER = 'MED BOX'
    AND L_QUANTITY < (SELECT 0.2*AVG(L_QUANTITY) FROM LINEITEM WHERE L_PARTKEY = P_PARTKEY)

    /* TPC_H Query 18 - Large Volume Customer */
    SELECT TOP 100 C_NAME, C_CUSTKEY, O_ORDERKEY, O_ORDERDATE, O_TOTALPRICE, SUM(L_QUANTITY)
    FROM CUSTOMER, ORDERS, LINEITEM
    WHERE O_ORDERKEY IN (SELECT L_ORDERKEY FROM LINEITEM GROUP BY L_ORDERKEY HAVING
     SUM(L_QUANTITY) > 300) AND C_CUSTKEY = O_CUSTKEY AND O_ORDERKEY = L_ORDERKEY
    GROUP BY C_NAME, C_CUSTKEY, O_ORDERKEY, O_ORDERDATE, O_TOTALPRICE
    ORDER BY O_TOTALPRICE DESC, O_ORDERDATE

    /* TPC_H Query 19 - Discounted Revenue */
    SELECT SUM(L_EXTENDEDPRICE* (1 - L_DISCOUNT)) AS REVENUE
    FROM LINEITEM, PART
    WHERE (P_PARTKEY = L_PARTKEY AND P_BRAND = 'Brand#12' AND P_CONTAINER IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') AND L_QUANTITY >= 1 AND L_QUANTITY <= 1 + 10 AND P_SIZE BETWEEN 1 AND 5
    AND L_SHIPMODE IN ('AIR', 'AIR REG') AND L_SHIPINSTRUCT = 'DELIVER IN PERSON')
    OR (P_PARTKEY = L_PARTKEY AND P_BRAND ='Brand#23' AND P_CONTAINER IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') AND L_QUANTITY >=10 AND L_QUANTITY <=10 + 10 AND P_SIZE BETWEEN 1 AND 10 
    AND L_SHIPMODE IN ('AIR', 'AIR REG') AND L_SHIPINSTRUCT = 'DELIVER IN PERSON') 
    OR (P_PARTKEY = L_PARTKEY AND P_BRAND = 'Brand#34' AND P_CONTAINER IN ( 'LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') AND L_QUANTITY >=20 AND L_QUANTITY <= 20 + 10 AND P_SIZE BETWEEN 1 AND 15
    AND L_SHIPMODE IN ('AIR', 'AIR REG') AND L_SHIPINSTRUCT = 'DELIVER IN PERSON')

    /* TPC_H Query 20 - Potential Part Promotion */
    SELECT S_NAME, S_ADDRESS FROM SUPPLIER, NATION
    WHERE S_SUPPKEY IN (SELECT PS_SUPPKEY FROM PARTSUPP
     WHERE PS_PARTKEY in (SELECT P_PARTKEY FROM PART WHERE P_NAME like 'forest%%') AND
     PS_AVAILQTY > (SELECT 0.5*sum(L_QUANTITY) FROM LINEITEM WHERE L_PARTKEY = PS_PARTKEY AND
      L_SUPPKEY = PS_SUPPKEY AND L_SHIPDATE >= '1994-01-01' AND
      L_SHIPDATE < dateadd(yy,1,'1994-01-01'))) AND S_NATIONKEY = N_NATIONKEY AND N_NAME = 'CANADA'
    ORDER BY S_NAME

    /* TPC_H Query 21 - Suppliers Who Kept Orders Waiting */
    SELECT TOP 100 S_NAME, COUNT(*) AS NUMWAIT
    FROM SUPPLIER, LINEITEM L1, ORDERS, NATION WHERE S_SUPPKEY = L1.L_SUPPKEY AND
    O_ORDERKEY = L1.L_ORDERKEY AND O_ORDERSTATUS = 'F' AND L1.L_RECEIPTDATE> L1.L_COMMITDATE
    AND EXISTS (SELECT * FROM LINEITEM L2 WHERE L2.L_ORDERKEY = L1.L_ORDERKEY
     AND L2.L_SUPPKEY <> L1.L_SUPPKEY) AND
    NOT EXISTS (SELECT * FROM LINEITEM L3 WHERE L3.L_ORDERKEY = L1.L_ORDERKEY AND
     L3.L_SUPPKEY <> L1.L_SUPPKEY AND L3.L_RECEIPTDATE > L3.L_COMMITDATE) AND
    S_NATIONKEY = N_NATIONKEY AND N_NAME = 'SAUDI ARABIA'
    GROUP BY S_NAME
    ORDER BY NUMWAIT DESC, S_NAME

    /* TPC_H Query 22 - Global Sales Opportunity */
    SELECT CNTRYCODE, COUNT(*) AS NUMCUST, SUM(C_ACCTBAL) AS TOTACCTBAL
    FROM (SELECT SUBSTRING(C_PHONE,1,2) AS CNTRYCODE, C_ACCTBAL
     FROM CUSTOMER WHERE SUBSTRING(C_PHONE,1,2) IN ('13', '31', '23', '29', '30', '18', '17') AND
     C_ACCTBAL > (SELECT AVG(C_ACCTBAL) FROM CUSTOMER WHERE C_ACCTBAL > 0.00 AND
      SUBSTRING(C_PHONE,1,2) IN ('13', '31', '23', '29', '30', '18', '17')) AND
     NOT EXISTS ( SELECT * FROM ORDERS WHERE O_CUSTKEY = C_CUSTKEY)) AS CUSTSALE
    GROUP BY CNTRYCODE
    ORDER BY CNTRYCODE