Distributed data management is a mandatory requirement for
geographically distributed organizations with needs of locality of reference,
high availability/fault-tolerance, scalability, and most important performance.
Data distribution, allocation, and replication are core concepts to implement
any distributed database. In this tutorial, we will be demonstrating to build a
distributed database in a lab environment using SQL Server.
Before we start working on SQL Server, we should make sure
that SQL Server, SQL Server Agent, and SQL Server Browser services are running.
We should also ensure that SQL Server Agent service is set to start
automatically, because these services will operate in background during the
replication of data. Please check figure below:
First we will create a sample database and table for use with
our tutorial. Create a database with any suitable name, e.g., DDB in our case.
Now create a sample table using following script:
CREATE TABLE [dbo].[employee](
[empid] [int] NOT NULL,
[empname] [varchar](50) NOT NULL,
[gender] [varchar](1) NOT NULL,
[empaddress] [varchar](50) NULL,
CONSTRAINT
[PK_employee] PRIMARY KEY
CLUSTERED
(
[empid] ASC
)WITH
(PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS
= ON) ON
[PRIMARY]
) ON [PRIMARY]
And insert a sample row using folling statement:
INSERT INTO [DDB].[dbo].[employee]
([empid],[empname],[gender],[empaddress])
VALUES
(1,'Name1','M','Karachi')
Now we have a sample table ready to distribute. In this
tutorial, we will be using the replication feature of SQL Server to create a
distributed database. To create replication, right click on local publications
folder in the object explorer as shown in figure below:
Before we continue, there are few concepts that need
elaboration. What is a publication? Publication allows us to create a
publisher. Similar in concept to publisher for newspaper and magazine. SQL
Server node can act as publisher and it do publishes the snapshot of existing
data and/or changes that are made the published articles (database object such
as tables and views). We will create publications using the publication wizard.
This wizard will show you a dialog as shown below:
Simple click next to proceed with the wizard. For the
publication process, there should be a distributor responsible for distributing
the publications. SQL Server requires designation of distributor, either it is
the same database instance or it might be a different one. For this tutorial,
we will keep it simple. The database instance where publication is to be
published will also function as distributor for itself.
Next step will ask you to specify the snapshot folder. In SQL
Server, each node that wants to get the replicated data need to subscribe to a
publication. Now subscription can be of two types, i.e., push or pull. For push
subscription, the publication node will always push the changes to its
subscriber, which means that publication node maintains the information of its
subscriber and agent publishing the changes will be executing at the
publisher-end. For pull subscription, the subscriber will pull the changes from
publisher, which means that the agent at the subscribe will be pulling the
changes continuously or periodically. To support both push and pull
subscription, the snapshot folder should be a network path with write access.
The snapshot folder is used to store the snapshots of data. Snapshot is the
copy of the databases at a particular instance in time. Figure below shows how
wizard dialog will appear:
Next step will ask to select the publication database. We
will select the database the we create for this tutorial, i.e., DDB. Next step
will ask to select the publication type. I also call it replication type. The
four types are their descriptions are available on this window. For simplicity.
Snapshot replication send snapshots of published data, i.e., all published data
will be replicated again and again on scheduled interval. Transactional
replication initially sends the snapshot of published data and once the
subscribers are initialized, it only sends/streams the changes. However, it is
only a one way update. In third publication type, we have updatable
subscription, i.e., changes made at subscriber will be applied to publisher.
The fourth and last publication type, both publisher and subscriber can update the published data independently. Changes can be merged on both end
periodically or on demand. At first step, we take the Transactional publication
with updatable subscription as the publication type.
A point to consider here is that it is mandatory to have a
primary key in tables to get them published using the transactional replication
type. One can only select the tables with primary key for publication as shown
in figure below.
Once we have selected the articles to publish, move next.
The dialog will prompt about the article issues. Do not worry. It is a
requirement rather then an issue. Updateable subscription require a unique
identifier column. Simple move forward. Next screen will ask if we want to
filter any rows. This may be used to filter data or create fragment. We will
discuss this feature in some other tutorial, right now we move next. Snapshot agent
options will appear. Snapshot agent is responsible for initializing the
subscribers with the publication data. We can either configure it to execute immediately
or we can configure it to execute at a specified schedule. Select the immediate
option and move next as shown in figure below.
Next dialog will ask for agent security setting. This might
vary according to your security settings, but for simplicity click security
setting buttons for both Snapshot agent and Queue reader agent, and select Run
under SQL Server Agent account as shown in figure below.
Next step will ask you to create replication and either you
want to create script file or not. Move forward and provide the publication
name, e.g., DDSPub in our case. Click finish to complete the wizard. This will
start the process of publication creation as shown in figure below.
A publication will appear in Local Publications folder of
object explorer. Once we are done with the publication creation, we will move
forward to create the subscription to get the publication data and changes. For
the purpose of tutorial, we can also use the same DBMS instance to create
subscriptions that will subscribe to itself. However, in real projects,
subscription will be on different DBMS instances across remote nodes.
To create subscription, right click on the local subscription folder in the
object explorer windows and click the New Subscriptions options as shown in
figure below.
After the initial screen, the wizard will ask for selecting
the publisher and the databases and publications that we want to subscribe.
Select the DDSPub that we have created and move next. Then wizard asks for
distribution agent location, which will be at distributor for push subscription
and at subscriber for pull subscription. We have already discussed the
difference between push and pull subscription in this tutorial earlier. Select
the option of push subscription and move forward as shown in figure below.
Next screen will show the option of subscribers and
subscribers databases. One can select the existing databases or can also create
a new one. We will be creating a new one with name DDBRep as shown in figure
below.
Next screen will ask for distribution agent security as shown
below.
For distribution agent, simple select the Run under the SQL
Server Agent service account option and move forward. Next dialog will ask for
synchronization schedule, keep it to run continuously and move forward. Next
option will be about the updatable subscription. One can ask to simultaneously
commit changes or can ask to queue changes to get them committed on suitable
time. Keep it to simultaneously commit changes and move forward. Next screen
will ask for login for updatable subscription. Provide the login of publisher
with sufficient rights. For this tutorial, we will provide sa user credentials
as shown below.
Next screen will ask, when to initialize subscription. Keep
it to immediately. Move forward will give you an option either you want to
create a script. Move next and finish the wizard. As you finish the wizard,
create subscription process will execute. After successful completion of
process, a subscription will appear in Local Subscriptions folder and the newly
created database from subscription will also appear as shown in figure below.
Now if you view the DDPRep database, it will have an employee
table with one record that we have created at the publisher. This shows that
initial snapshot of the publisher database (i.e., DDB) has been created and the
subscriber database (i.e., DDBRep) has been initialized with it. Furthermore,
one can also observe the employee table is altered to contain a
uniqueidentifier column. Now insert a new row at the publisher and check if it
get replicated to subscriber or not. We inserted a new record using the
following statement.
INSERT INTO [DDB].[dbo].[employee]
([empid],[empname],[gender],[empaddress])
VALUES
(2,'Name2','M','Karachi')
Similarly, insert a new record at the subscription and
observe it if gets replicated at the publisher. We inserted a new record using a
statement below at the subscriber database, i.e., DDBReplica.
INSERT INTO [DDBRep].[dbo].[employee]
([empid],[empname],[gender],[empaddress])
VALUES
(3,'Name3','F','Karachi')
If you have both of your record replicated, then it is right
time to also try other publication types available in SQL Server.