SQL Server Data Engineering: April 2015

Thursday, 30 April 2015

SQL Daily

Check out my buddy Andy / ACW. Great DBA, even greater guy!

http://sqldaily.blogspot.co.uk/

Sunday, 26 April 2015

Overview

Replication is a means for providing a copy of database objects and data onto a secondary server (or servers), with various different synchronisation options. There is a Publisher server (i.e. the main copy of the data), and a Subscriber server (or servers) that receive a copy of the data from the Publisher. This happens via a Distributor service and database (for Snapshot and Transactional replication) that is either on the Publisher (for Push Replication), or on the Subscriber (for Pull Replication). Replication can be used as a High Availability option, but other high availability options might be better if your one objective is HA. Replication can be used to integrate with data from other non Microsoft SQL Server sources / destinations - SQL can subscribe to an Oracle Publication for Transactional Replication, and can publish to both IBM DB2 / Oracle databases, as Transactional and Snapshot replication.

There are four flavours of replication in SQL Server 2012 & 2014:

Transactional Replication:
Changes are made on the Subscriber(s) at the same time as data changes on the Publisher. This is because the update to the Subscriber(s) happens within the same transaction as the change to the Publisher. Subscribers must be treated as Read Only, as changes are not pushed back to the Publisher (unless you're using Peer to Peer replication). Requires a primary key on replicated tables.
Good for data that changes frequently - high volume of inserts / updates / deletes, and for situations where Subscriber copies must remain consistent with Publisher at all times. Publisher doesn't need to be a SQL Server (can subscribe to Oracle publication)

Peer to Peer Transactional Replication:

Allows for Load Balancing among nodes. Allows for transactions at subscribers to be synced across other nodes in the replication topology. Conflicts must be prevented by partitioning data so same row cannot be updated at separate locations (as opposed to Merge Replication where a conflict detection agent is used).
Good for situations where data is partitioned in a way that would match the replication topology, and where conflicts are likely to be rare.

Merge Replication:
One or more updateable Subscribers. Changes at the Publisher and Subscriber(s) are tracked by the use of triggers, and applied to the other copies of data. Will create a unique key in replicated table if one doesn't already exist (if one does exist, it must be replicated). Updates can be applied in a schedule (e.g. every 90 mins)
Good for having offline copies of data, e.g. sales staff in the field that need to add / update data on a local copy and then sync back to a head office.

Snapshot Replication:
Data is only updated in bulk via snapshot of entire replicated data set. No incremental changes are made. Can replicate non unique data (i.e. no prerequisite for primary keys in tables to be replicated). Uses a shared folder (ensure replication agents have adequete permissions to this folder). Uses BCP to generate snapshot
Good for data that changes infrequently, often as a bulk process. Good for systems where data does not need to always be completely up to date (i.e. month end reporting). Good when only a small volume of data needs to be replicated.

Pros

Transactional Replication:
Data at the Subscriber(s) is always the same as on the Publisher - all changes on the Publisher happen on the Subscriber(s) within the same transaction.
Low latency between Publisher and Subscriber - changes happen (more or less) at the same time.

Peer to Peer Transactional Replication:
Allows Load Balancing among nodes

Merge Replication:
Data amendments are pushed both to and from the Publisher, so many copies of the data can be updated simultaneously.

Snapshot Replication:
Very low overhead compared to the other replication types, as changes aren't continually being replicated.

All types of Replication:
Good for data synchronisation across geographically distributed data centres.

Cons

Transactional Replication:
Can decrease performance of transactions at the Publisher, due to committing these transactions at the Subscriber(s) at the same time.
Doesn't support updates at Subscriber(s) unless updating Subscribers is configured.

Peer to Peer Replication:
Enterprise only, across all nodes in Replication topology. Conflicts break replication, and partitioning of data must be used to prevent conflicts.

Merge Replication:
Conflicts can occur where subscribers change the same data at the same time
Merge replication does not support indexed views as tables

Snapshot Replication:
No incremental updates - all data must be copied over each time

All types of Replication:
Requires a lot of bandwidth if there are a lot of changes to be replicated.
Replication can be difficult to troubleshoot

Replication Definitions:
Publisher - An instance that makes data available via a Publication
Article - A published object (i.e. a table, SP, view, etc)
Publication - A collection of Articles
Distributor - Instance that manages transmission from Publisher to Subscriber. Distributor is often also the Publisher. If this is not the case, then it can be considered a Remote Distributor
Subscriber - An instance that receives a Publication
Agent - Service that enables the Publisher, Distributor, Subscriber etc to perform replication based tasks

Replication Monitor - a tool available within SSMS to monitor replication. Can configure alerts based on performance benchmarks

Replication resources

Very useful Codeproject page on replication: http://www.codeproject.com/Articles/715550/SQL-Server-Replication-Step-by-Step
MSDN page for Replication: https://msdn.microsoft.com/en-us/library/ms152531(v=sql.120).aspx
Transactional Replication: https://msdn.microsoft.com/en-us/library/ms151176(v=sql.110).aspx
Merge Replication: https://msdn.microsoft.com/en-us/library/ms152746.aspx
Snapshot Replication: https://msdn.microsoft.com/en-us/library/ms151832.aspx
SQL Server Central Stairway to Replication: http://www.sqlservercentral.com/stairway/72401/

Saturday, 25 April 2015

DBA - The Happiest Job in the World! According to Forbes

According to Forbes, Database Administrator has topped the list of happiest jobs in America, based on a survey they took in 2014. Employees were asked to rate their jobs in a number of different categories, including growth opportunities, compensation, work life balance etc. No wonder Pinal Dave is always smiling! Full article here:

http://www.forbes.com/sites/susanadams/2014/03/20/the-happiest-and-unhappiest-jobs-in-2014/

High Availability - Misc Resources

Some great resources about High Availability in SQL Server:

Overview of High Availability options:
http://www.toadworld.com/platforms/sql-server/w/wiki/9739.alternatives-to-replication.aspx

Pros and Cons of High Availability Options
http://serverfault.com/questions/99228/high-availability-on-sql-server-cons-and-pros-of-solutions

MSDN Overview of High Availability options in SQL Server 2014:
https://msdn.microsoft.com/en-us/library/ms190202.aspx

High Availability - AlwaysOn Failover Clustering

Overview

Failover Clustering (or AlwaysOn Failover Clustering) is a means for providing redundancy at a server level. Should one server fail, another server will take it's place automatically. Clustering requires two (or more) servers, and a shared disk. Standard Edition will only support a 2 node failover cluster.

Pros

Automatic failover from one server to another
Appears as one SQL Server on the network
Redundancy at a server level
Robust
AlwaysOn supports failover cluster nodes in different geographical locations (via virtual LAN)

Cons

Expensive, more involved (tricky) to set up and configure than other High Availability methods, can be sluggish to failover. Not all SQL features are able to be clustered (i.e. Reporting Services). Data is only in ONE place (i.e. on a shared disk) - use RAID / Disk Mirroring to mitigate this risk

Clustering resources

Really great post here (by Kendra Little on Brent Ozar blog): http://www.brentozar.com/archive/2012/02/introduction-sql-server-clusters/
MSDN: https://technet.microsoft.com/en-us/library/ms189134(v=sql.110).aspx

High Availability - Log Shipping

Overview

Log shipping is a means for keeping a warm standby copy of a database on a secondary server. It works by continually taking, copying, and restoring, transaction log backups from the primary server to the secondary server on a schedule. Log Shipping is available in Standard and Enterprise Edition

Pros

Very performant
Can read from the secondary database (between log restores), as long as Log Shipping is configured to keep this secondary database in "Standby" (not possible if Log Shipping is configured to keep secondary database in "Recovering" state)
Can ship logs to multiple secondary servers
Cheap
Easy to set up

Cons

Manual failover is required in the event of a failure on the primary server (to failover manually: apply uncommitted transaction logs, WITH RECOVERY)
Additional log backups not part of the scheduled log shipping backups will break the chain and stop log shipping from functioning
Data cannot be manipulated between source and target system, and it's not possible to only transfer a portion of the data
Per database level - if you want to log ship 5 databases, you have to repeat the log shipping configuration for each database (i.e. set up log shipping 5 times)
There is a lag between the log backup being taken at the primary database, and the logs being applied to the secondary database
Logs are only applied on the schedule you set - if you ship logs every 15 minutes, the secondary database can get up to 15 mins out of sync with the primary server. This can however be useful, if an erroneous data change is made at the primary, you can halt log shipping, and check the data on the secondary

Log Shipping resources

Brent Ozar blog: http://www.brentozar.com/archive/2013/03/log-shipping-faq/
Setting up Log Shipping: https://msdn.microsoft.com/en-us/library/ms190640.aspx

High Availability - Database Mirroring

Overview

Mirroring is a means to provide a secondary copy of your data. The secondary copy is automatically updated when data on the primary server is changed.

Different types of mirroring

Synchronous (high safety / high availability - database SAFETY option set to FULL) – data is always kept exactly the same on the primary and secondary server. Available in Standard & Enterprise edition. Automatic failover available with a witness server

Asynchronous (high performance - database SAFETY option set to OFF) – data is eventually up to date on secondary server

Mirroring can be configured with or without a witness server. Available in Enterprise Edition
Mirroring is per database, and only works for databases using Full recovery model

Pros

Almost instant, automatic failover when using a witness server in synchronous (high safety / high availability) mode

Cons

Mirroring is deprecated in future version of SQL Server. Microsoft advise not to use this for any new SQL Server implementations, and look to remove this function from systems that currently use it (source: https://msdn.microsoft.com/en-us/library/ms190202.aspx)Can cause some performance overhead in Synchronous configurationCan only mirror to one secondary serverMaster database cannot be mirrored - logins have to be added manually to secondary server, then mapped to the database users
Cannot access mirrored data directly (i.e. for reporting) - although you can set up a snapshot of the mirrored database and access this for reporting
Cannot mirror a database containing FILESTREAM filegroups
You can't detach a database that is being mirrored
Can only have ONE mirror copy of the database

Notes

Removing mirroring leaves the secondary copy of the database in a RECOVERING (unusable) state. The database must be recovered to be usable.

Mirroring resources

Brent Ozar blog: http://www.brentozar.com/archive/2013/07/database-mirroring-faq/
Setting up mirroring: https://technet.microsoft.com/en-us/library/ms190941(v=sql.110).aspx
MSDN on Mirroring: https://msdn.microsoft.com/en-us/library/ms189852.aspx