Thursday, February 21, 2013

Building Data Cubes: An Introduction to SQL Server Analysis Services

SQL Server Analysis Services is a good option for experts working on data warehousing and business intelligence technologies, specifically OLAP and Data Mining. SQL Server Analysis Services provide you features to design, create, and manage multi-dimensional data cubes using any of your favorite data sources. Data Mining feature is also available, and it is also very good, but my focus in this blog will be OLAP using SQL Server Analysis Services. I prefer RapidMiner for my data mining tasks :-). To start working with SQL Server Analysis Services, it must be installed in your computer along with client components. This tutorial uses SQL Server 2008. The most important component for you will be SQL Server Business Intelligence Development Studio and SQL Server Analysis Services. You can find SQL Server Business Intelligence Development Studio in your programs menu as show in figure below:


After starting the development studio, go to File -> New -> Project menu. You will find many different project types as show in figure below. For OLAP purpose, Analysis Services Project is of interest for you. Please select the appropriate folder and name for your project.



Once you are done with project selection, you will end up at a screen similar to the one show in the figure below:


On the right side of the studio, you can view a Solution Explorer. In Solution Explorer, there is a project name MyOLAP, which we have just created. The folders under the MyOLAP project are important. Each of them have significant meaning and purpose. For our OLAP task, we will start with Data Sources. Data Source folder contains the list of Data Sources that we add to our project and we intend to use for analysis. Here you can add any data source for which you have driver available in your computer for database connectivity. Just right click on the Data Sources folder and click New Data Source. This bring you the Data Source Wizard.


As you move next, you can view the list of existing Data Connections. You can use existing ones, or  can create a new one using the new button.


To create a new database connection, you have to specify the provider that you want to use for database connectivity. For example, in the figure below, I am using SQL Server Native Client to connect to local SQL Server on my computer. Provide the server name, user id, and password. If you have provided correct connection information, then you can view the list of existing databases  in the combo of "Select or enter a database name". You can select the appropriate database or write its name.


Once you are done with your database connection creation, your newly created connection will be in the list of Data Connections as shown in the figure below.


As you move next, wizard will ask you for Impersonation Information. This information is used to define the Windows credentials that Analysis Services will use to connect to the data source. More details can be found at Set Impersonation Options (SSAS - Multidimensional). I have used the same credentials as of current user. However, you can also use a specific account for your project.


At the end, wizard ask you to specify a data source name to finish.


As you finish the wizard, you can see the Data Source beneath Data Sources folder.


The second step will be to create Data Source Views. Data Source Views allow us to select database object of our interest from the Data Source. A Data Source may contain many tables or views. Using Data Source Views we select the objects that we intend to use in our project. Right click on Data Source Views and click New Data Source View. This will bring you the Data Source View Wizard.


As you move next, you can view the data source that you have created in previous step.


On the next screen, you can view the objects available at your selected data source. This view allows you to select the objects of your interest. For example, I have selected the FactInternetSales table from AdventureWorksDW database. This screen provides a very useful feature to auto-select the related tables. Simply click on Add Related Tables. Wizard will automatically add the related tables with FactInternetSales table. During definition of dimensions and cube in our project, only these tables will be visible/available to us.


On the next screen, provide a title for the view and finish the wizard.


This will bring a data source view beneath Data Source View folder. IDE also shows a diagram of tables with their inter-connectivity as shown in figure below.


Once you are done with creating the data source view, the next step will be to create dimensions for your cube. To create a dimension, right click on the Dimensions folder and click new Dimensions.


As you move next, wizard will ask you about the dimension creation method. As we are using the AdventureWorksDW database, we will be using the use an existing table option. For more information on other options, please refer to Select Creation Method (Dimension Wizard).


Now we have to select the main table for the dimension. For example, in the figure below, I have selected Product as dimension. A key column is mandatory as this key column will be used to connect/link dimension with the fact table.


On the next screen, you can select the attributes that you want to use in your dimension. For example, to display or to generate concept hierarchies. In the figure below, I have selected the English Product Name, which I will use to display this dimension during my OLAP analysis.



Once you are done, please specify a valid name for your dimension to finish.


All the dimensions that you create are available underneath the Dimensions folder in the solution explorer.


Please create all dimensions that are required for your Cube. For example, in this tutorial, we have created three dimensions, i..e, Product, Region, and Time. To create a data cube, right click on Cubes folder and click New Cube. This will bring you the cube creation wizard.


As you move next, this wizard will also ask you the cube creation method as was the case with dimensions creation. We will select the use existing tables option. For other creation methods, please refere to Select Creation Method (Cube Wizard).


As we move next, we can view the tables that we selected in our data source view. We will use the FactInternetSales as our fact table.


On next screen, wizard ask us to select the measure for our Cube. We have selected only two, but according to your requirements, you can select any of them.


On the next screen, wizard will ask you to select existing dimensions that you want to use in your cube. We have created three dimensions and we will be using all three of them.


On the next screen, wizard also allows us to create new dimensions, if required. We will skip this screen because we have already created dimensions of our interest.


On the next screen name the cube and finish the wizard.


After finishing the wizard, the IDE will display you the Data Source View of the cube in the middle of the screen as well as measures and dimensions on the left side of the window.


All you need to do is to process your cube. Just click the process icon, second from left on the tool bar or you can use Cube -> Process menu option. This will perform the necessary processing of cube using a wizard.


During the processing, you can observe the progress. As long as there is not red color, everything is fine.


Once cube is processed. You can move to browser tab to browse your cube data. You will only need to drag and drop the dimensions and measure in the white area in the middle.


As you drop each field, you will get preview of your data.Once all field are dropped then you can analyze the Internet Sales using multi-dimensional analysis as show in figure below.