Monday, September 10, 2012

Distributed data management: Replication using SQL Server


Distributed data management is a mandatory requirement for geographically distributed organizations with needs of locality of reference, high availability/fault-tolerance, scalability, and most important performance. Data distribution, allocation, and replication are core concepts to implement any distributed database. In this tutorial, we will be demonstrating to build a distributed database in a lab environment using SQL Server.

Before we start working on SQL Server, we should make sure that SQL Server, SQL Server Agent, and SQL Server Browser services are running. We should also ensure that SQL Server Agent service is set to start automatically, because these services will operate in background during the replication of data. Please check figure below:


First we will create a sample database and table for use with our tutorial. Create a database with any suitable name, e.g., DDB in our case. Now create a sample table using following script:

CREATE TABLE [dbo].[employee](
      [empid] [int] NOT NULL,
      [empname] [varchar](50) NOT NULL,
      [gender] [varchar](1) NOT NULL,
      [empaddress] [varchar](50) NULL,
 CONSTRAINT [PK_employee] PRIMARY KEY CLUSTERED
(
      [empid] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
And insert a sample row using folling statement:
INSERT INTO [DDB].[dbo].[employee]
           ([empid],[empname],[gender],[empaddress])
     VALUES
           (1,'Name1','M','Karachi')

Now we have a sample table ready to distribute. In this tutorial, we will be using the replication feature of SQL Server to create a distributed database. To create replication, right click on local publications folder in the object explorer as shown in figure below:


Before we continue, there are few concepts that need elaboration. What is a publication? Publication allows us to create a publisher. Similar in concept to publisher for newspaper and magazine. SQL Server node can act as publisher and it do publishes the snapshot of existing data and/or changes that are made the published articles (database object such as tables and views). We will create publications using the publication wizard. This wizard will show you a dialog as shown below:


Simple click next to proceed with the wizard. For the publication process, there should be a distributor responsible for distributing the publications. SQL Server requires designation of distributor, either it is the same database instance or it might be a different one. For this tutorial, we will keep it simple. The database instance where publication is to be published will also function as distributor for itself.


Next step will ask you to specify the snapshot folder. In SQL Server, each node that wants to get the replicated data need to subscribe to a publication. Now subscription can be of two types, i.e., push or pull. For push subscription, the publication node will always push the changes to its subscriber, which means that publication node maintains the information of its subscriber and agent publishing the changes will be executing at the publisher-end. For pull subscription, the subscriber will pull the changes from publisher, which means that the agent at the subscribe will be pulling the changes continuously or periodically. To support both push and pull subscription, the snapshot folder should be a network path with write access. The snapshot folder is used to store the snapshots of data. Snapshot is the copy of the databases at a particular instance in time. Figure below shows how wizard dialog will appear:


Next step will ask to select the publication database. We will select the database the we create for this tutorial, i.e., DDB. Next step will ask to select the publication type. I also call it replication type. The four types are their descriptions are available on this window. For simplicity. Snapshot replication send snapshots of published data, i.e., all published data will be replicated again and again on scheduled interval. Transactional replication initially sends the snapshot of published data and once the subscribers are initialized, it only sends/streams the changes. However, it is only a one way update. In third publication type, we have updatable subscription, i.e., changes made at subscriber will be applied to publisher. The fourth and last publication type, both publisher and subscriber can update the published data independently. Changes can be merged on both end periodically or on demand. At first step, we take the Transactional publication with updatable subscription as the publication type.


A point to consider here is that it is mandatory to have a primary key in tables to get them published using the transactional replication type. One can only select the tables with primary key for publication as shown in figure below.


Once we have selected the articles to publish, move next. The dialog will prompt about the article issues. Do not worry. It is a requirement rather then an issue. Updateable subscription require a unique identifier column. Simple move forward. Next screen will ask if we want to filter any rows. This may be used to filter data or create fragment. We will discuss this feature in some other tutorial, right now we move next. Snapshot agent options will appear. Snapshot agent is responsible for initializing the subscribers with the publication data. We can either configure it to execute immediately or we can configure it to execute at a specified schedule. Select the immediate option and move next as shown in figure below.


Next dialog will ask for agent security setting. This might vary according to your security settings, but for simplicity click security setting buttons for both Snapshot agent and Queue reader agent, and select Run under SQL Server Agent account as shown in figure below.



Next step will ask you to create replication and either you want to create script file or not. Move forward and provide the publication name, e.g., DDSPub in our case. Click finish to complete the wizard. This will start the process of publication creation as shown in figure below.


A publication will appear in Local Publications folder of object explorer. Once we are done with the publication creation, we will move forward to create the subscription to get the publication data and changes. For the purpose of tutorial, we can also use the same DBMS instance to create subscriptions that will subscribe to itself. However, in real projects, subscription will be on different DBMS instances across remote nodes. To create subscription, right click on the local subscription folder in the object explorer windows and click the New Subscriptions options as shown in figure below.


After the initial screen, the wizard will ask for selecting the publisher and the databases and publications that we want to subscribe. Select the DDSPub that we have created and move next. Then wizard asks for distribution agent location, which will be at distributor for push subscription and at subscriber for pull subscription. We have already discussed the difference between push and pull subscription in this tutorial earlier. Select the option of push subscription and move forward as shown in figure below.


Next screen will show the option of subscribers and subscribers databases. One can select the existing databases or can also create a new one. We will be creating a new one with name DDBRep as shown in figure below.


Next screen will ask for distribution agent security as shown below.


For distribution agent, simple select the Run under the SQL Server Agent service account option and move forward. Next dialog will ask for synchronization schedule, keep it to run continuously and move forward. Next option will be about the updatable subscription. One can ask to simultaneously commit changes or can ask to queue changes to get them committed on suitable time. Keep it to simultaneously commit changes and move forward. Next screen will ask for login for updatable subscription. Provide the login of publisher with sufficient rights. For this tutorial, we will provide sa user credentials as shown below.


Next screen will ask, when to initialize subscription. Keep it to immediately. Move forward will give you an option either you want to create a script. Move next and finish the wizard. As you finish the wizard, create subscription process will execute. After successful completion of process, a subscription will appear in Local Subscriptions folder and the newly created database from subscription will also appear as shown in figure below.


Now if you view the DDPRep database, it will have an employee table with one record that we have created at the publisher. This shows that initial snapshot of the publisher database (i.e., DDB) has been created and the subscriber database (i.e., DDBRep) has been initialized with it. Furthermore, one can also observe the employee table is altered to contain a uniqueidentifier column. Now insert a new row at the publisher and check if it get replicated to subscriber or not. We inserted a new record using the following statement.

INSERT INTO [DDB].[dbo].[employee]
           ([empid],[empname],[gender],[empaddress])
     VALUES
           (2,'Name2','M','Karachi')

Similarly, insert a new record at the subscription and observe it if gets replicated at the publisher. We inserted a new record using a statement below at the subscriber database, i.e., DDBReplica.

INSERT INTO [DDBRep].[dbo].[employee]
           ([empid],[empname],[gender],[empaddress])
     VALUES
           (3,'Name3','F','Karachi')

If you have both of your record replicated, then it is right time to also try other publication types available in SQL Server.