Saturday, January 26, 2013

ETL using MS SQL Server 2008: An Introduction

MS SQL Server 2008 provides rich features for ETL in its integration services tools. To use integration services of MS SQL Server 2008, they must be selected during installation process. Either you have integration services installed on not can be identified by viewing the windows services.


If you have SQL Server Integration Services running this means you have this feature installed on your PC. Now to work with SQL Server Integration Services projects, you must have installed SQL Server Business Intelligence Development Studio. You can install is through your installation setup by selecting this feature in the list of features to be installed. Open SQL Server Business Intelligence Development Studio, go to File -> New -> Project. Select Integration Services Project from the list.


Provide the path and suitable name for saving the project on your local directory. You will end-up with a screen containing the solution explorer on right with the project that you have created with and empty Package.dtsx in the SSIS Packages folder. In the center, Package.dtsx will be open in design mode. By default user is at the Control Flow tab.On the left, you can see an toolbox with Control Flow Items.


Now the first task that we need to do is to define the data sources and data views. Data sources are connection to data sources that we want to use during ETL. Data view as name suggest is a view on data sources. Using views we can selected the database objects that we want to use during ETL. One can easily create data sources and data source views using wizards. Just right click on the data source and data source views folders in the solution explorer and click new.

Once we are done with selecting the required data sources, we should move forward with our ETL activities. For example, we take a simple migration activity. This will require a control flow item of  Data Flow Task. Simple drag and drop the Data Flow Task on the Control Flow tab window.


Now to configure and use the Data Flow Task, simple double click on Data Flow Task item. This will take you to Data Flow tab, where you can perform the data flow configuration. For ETL, there are three importance activities that are usually performed, that is Extract, Transform, and Load. When you mode to Data Flow tab, you can see toolbox items classified into three groups, i.e., Data Flow Sources, Data Flow Transformations, and Data Flow Destinations. For our initial migration activity, we will simple migrate the data from source to destination. For this purpose, select a source from Data Flow Sources, e.g., ADO NET Source. Similarly, select a destination from Data Flow Destinations, e.g., ADO NET Destinations. Each component has two directional arrows links associated with them, one is green and another one is red. Please connect the green link from source to destination as show in Figure below:


Now double click on each of source and destination components to configure them. One can select the source object and and preview its result. Similarly, column information for the source can also be seen using the Columns link the left.


Once source is configured, we can move forward to configure the destination in the same way except a slight difference that at destination we can also create new object for migrated data. Similarly, we have to define he mapping from source columns to destination column using the Mapping link on the left. IDE automatically maps source column to destination according to column name similarity, however, it also give provision to alter them as needed.



Once we are done with source and destination configuration and mapping. We are ready to execute the package. To execute the process just right click on the Package.dtsx window and click Execute Task. If task execution is successful then all components will turn green. In case of any failure, the component will turn red, which means we have to do more work.


Now if for example, we also want to perform transformations during the data migration. We will need to use tools/components from Data Flow Transformations. For example, take very important activity of ETL, i.e., data conversion. For this purpose, we can use Data Conversion component. Simple drag and drop the component in the data flow tab window. Connect the output of Data Source to Data Conversion and the output of Data Conversion to Data Destination.


For any conversion definition, just double click on Data Conversion component and set the conversions using the Data Conversion Transformation Editor.


What ever conversion we define. Each conversion result in a new column. This can be observed using the mapping link of data destination component.


We can simple change the mapping to migrate the converted column to destination. Make sure the the destination column is of the same type as you have defined during Data Conversion definition.


Now execute the task again. This package will transform the data type during the migration.