SQL Server Data Engineering: Replication

Overview

Replication is a means for providing a copy of database objects and data onto a secondary server (or servers), with various different synchronisation options. There is a Publisher server (i.e. the main copy of the data), and a Subscriber server (or servers) that receive a copy of the data from the Publisher. This happens via a Distributor service and database (for Snapshot and Transactional replication) that is either on the Publisher (for Push Replication), or on the Subscriber (for Pull Replication). Replication can be used as a High Availability option, but other high availability options might be better if your one objective is HA. Replication can be used to integrate with data from other non Microsoft SQL Server sources / destinations - SQL can subscribe to an Oracle Publication for Transactional Replication, and can publish to both IBM DB2 / Oracle databases, as Transactional and Snapshot replication.

There are four flavours of replication in SQL Server 2012 & 2014:

Transactional Replication:
Changes are made on the Subscriber(s) at the same time as data changes on the Publisher. This is because the update to the Subscriber(s) happens within the same transaction as the change to the Publisher. Subscribers must be treated as Read Only, as changes are not pushed back to the Publisher (unless you're using Peer to Peer replication). Requires a primary key on replicated tables.
Good for data that changes frequently - high volume of inserts / updates / deletes, and for situations where Subscriber copies must remain consistent with Publisher at all times. Publisher doesn't need to be a SQL Server (can subscribe to Oracle publication)

Peer to Peer Transactional Replication:

Allows for Load Balancing among nodes. Allows for transactions at subscribers to be synced across other nodes in the replication topology. Conflicts must be prevented by partitioning data so same row cannot be updated at separate locations (as opposed to Merge Replication where a conflict detection agent is used).
Good for situations where data is partitioned in a way that would match the replication topology, and where conflicts are likely to be rare.

Merge Replication:
One or more updateable Subscribers. Changes at the Publisher and Subscriber(s) are tracked by the use of triggers, and applied to the other copies of data. Will create a unique key in replicated table if one doesn't already exist (if one does exist, it must be replicated). Updates can be applied in a schedule (e.g. every 90 mins)
Good for having offline copies of data, e.g. sales staff in the field that need to add / update data on a local copy and then sync back to a head office.

Snapshot Replication:
Data is only updated in bulk via snapshot of entire replicated data set. No incremental changes are made. Can replicate non unique data (i.e. no prerequisite for primary keys in tables to be replicated). Uses a shared folder (ensure replication agents have adequete permissions to this folder). Uses BCP to generate snapshot
Good for data that changes infrequently, often as a bulk process. Good for systems where data does not need to always be completely up to date (i.e. month end reporting). Good when only a small volume of data needs to be replicated.

Pros

Transactional Replication:
Data at the Subscriber(s) is always the same as on the Publisher - all changes on the Publisher happen on the Subscriber(s) within the same transaction.
Low latency between Publisher and Subscriber - changes happen (more or less) at the same time.

Peer to Peer Transactional Replication:
Allows Load Balancing among nodes

Merge Replication:
Data amendments are pushed both to and from the Publisher, so many copies of the data can be updated simultaneously.

Snapshot Replication:
Very low overhead compared to the other replication types, as changes aren't continually being replicated.

All types of Replication:
Good for data synchronisation across geographically distributed data centres.

Cons

Transactional Replication:
Can decrease performance of transactions at the Publisher, due to committing these transactions at the Subscriber(s) at the same time.
Doesn't support updates at Subscriber(s) unless updating Subscribers is configured.

Peer to Peer Replication:
Enterprise only, across all nodes in Replication topology. Conflicts break replication, and partitioning of data must be used to prevent conflicts.

Merge Replication:
Conflicts can occur where subscribers change the same data at the same time
Merge replication does not support indexed views as tables

Snapshot Replication:
No incremental updates - all data must be copied over each time

All types of Replication:
Requires a lot of bandwidth if there are a lot of changes to be replicated.
Replication can be difficult to troubleshoot

Replication Definitions:
Publisher - An instance that makes data available via a Publication
Article - A published object (i.e. a table, SP, view, etc)
Publication - A collection of Articles
Distributor - Instance that manages transmission from Publisher to Subscriber. Distributor is often also the Publisher. If this is not the case, then it can be considered a Remote Distributor
Subscriber - An instance that receives a Publication
Agent - Service that enables the Publisher, Distributor, Subscriber etc to perform replication based tasks

Replication Monitor - a tool available within SSMS to monitor replication. Can configure alerts based on performance benchmarks

Replication resources

Very useful Codeproject page on replication: http://www.codeproject.com/Articles/715550/SQL-Server-Replication-Step-by-Step
MSDN page for Replication: https://msdn.microsoft.com/en-us/library/ms152531(v=sql.120).aspx
Transactional Replication: https://msdn.microsoft.com/en-us/library/ms151176(v=sql.110).aspx
Merge Replication: https://msdn.microsoft.com/en-us/library/ms152746.aspx
Snapshot Replication: https://msdn.microsoft.com/en-us/library/ms151832.aspx
SQL Server Central Stairway to Replication: http://www.sqlservercentral.com/stairway/72401/

SQL Server Data Engineering

Sunday, 26 April 2015

Replication

Overview

Pros

Cons

Replication resources

No comments:

Post a Comment

Updating massive amount of rows whilst avoiding blocking