Wednesday, 18 November 2015

Reset Forgotten SA Password

The following article explains how to grant sysadmin rights to a user when you have no sysadmin access at all (e.g. you've forgotten the SA password and have no other sysadmin accounts):

https://technet.microsoft.com/en-us/magazine/jj853293.aspx
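
In outline, the technique relies on the fact that members of the local Administrators group can connect with sysadmin rights when SQL Server is started in single-user mode. A sketch of the steps for a default instance (the service name, login and exact commands are illustrative - the article above has the full detail):

net stop MSSQLSERVER
net start MSSQLSERVER /m

sqlcmd -S . -E

CREATE LOGIN [DOMAIN\YourUser] FROM WINDOWS;
GO
ALTER SERVER ROLE sysadmin ADD MEMBER [DOMAIN\YourUser]; -- SQL 2012+; use sp_addsrvrolemember on older versions
GO

Then stop the instance and start it again normally (without /m).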

Monday, 3 August 2015

Function to Proper Case a text string

The following function will capitalise the first letter of each word:
create function ProperCase(@Text as varchar(8000))
returns varchar(8000)
as
begin
   declare @Reset bit;        -- 1 = capitalise the next letter
   declare @Ret varchar(8000);
   declare @i int;
   declare @c char(1);

   select @Reset = 1, @i = 1, @Ret = '';

   -- Walk the string one character at a time, upper-casing any letter
   -- that follows a non-alphabetic character and lower-casing the rest
   while (@i <= len(@Text))
      select @c = substring(@Text, @i, 1),
             @Ret = @Ret + case when @Reset = 1 then UPPER(@c) else LOWER(@c) end,
             @Reset = case when @c like '[a-zA-Z]' then 0 else 1 end,
             @i = @i + 1;

   return @Ret;
end
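
A quick sanity check (assuming the function was created in the dbo schema):

select dbo.ProperCase('the QUICK brown FOX');
-- Returns: The Quick Brown Fox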
Credit to the Stack Overflow post here: http://stackoverflow.com/questions/230138/sql-server-make-all-upper-case-to-proper-case-title-case

Tuesday, 7 July 2015

Create database from just an MDF file

If you have no backup of a database, but you do have its data file (the .mdf), just not the log file (.ldf), the following SQL script can be used to create a database from the .mdf data file alone:

CREATE DATABASE DatabaseName
ON (FILENAME = '<path to your file here>\DatabaseName_Data.mdf')
FOR ATTACH_REBUILD_LOG;

Sometimes additional steps are required, as documented here: https://www.mssqltips.com/sqlservertip/3579/how-to-attach-a-sql-server-database-without-a-transaction-log-and-with-open-transactions/
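
Once the database is attached, it's worth verifying its integrity, since the log was rebuilt rather than recovered. A minimal check (substitute your database name):

DBCC CHECKDB ('DatabaseName') WITH NO_INFOMSGS;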

Friday, 26 June 2015

Get Backup History

Use the following SQL:

USE DatabaseName
GO

-- Get backup history for the required database
SELECT TOP 100
       s.database_name,
       m.physical_device_name,
       CAST(CAST(s.backup_size / 1000000 AS INT) AS VARCHAR(14)) + ' MB' AS bkSize,
       CAST(DATEDIFF(second, s.backup_start_date, s.backup_finish_date) AS VARCHAR(4)) + ' Seconds' AS TimeTaken,
       s.backup_start_date,
       CAST(s.first_lsn AS VARCHAR(50)) AS first_lsn,
       CAST(s.last_lsn AS VARCHAR(50)) AS last_lsn,
       CASE s.[type]
              WHEN 'D' THEN 'Full'
              WHEN 'I' THEN 'Differential'
              WHEN 'L' THEN 'Transaction Log'
       END AS BackupType,
       s.server_name,
       s.recovery_model
FROM msdb.dbo.backupset s
INNER JOIN msdb.dbo.backupmediafamily m ON s.media_set_id = m.media_set_id
WHERE s.database_name = DB_NAME() -- Remove this line to see history for all databases
ORDER BY backup_start_date DESC, backup_finish_date
GO


Taken from Pinal Dave:  http://blog.sqlauthority.com/2010/11/10/sql-server-get-database-backup-history-for-a-single-database/

Wednesday, 27 May 2015

Listing Stored Procedures used by SSRS

The following SQL will allow you to list all stored procedures used by Reporting Services reports. Run this against the ReportServer database on the SQL Server your Reporting Services installation uses:

;with xmlnamespaces
(
       default 'http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition',
       'http://schemas.microsoft.com/SQLServer/reporting/reportdesigner' AS rd
)
select
       name
       ,x.value('CommandType[1]', 'VARCHAR(50)') AS CommandType
       ,x.value('CommandText[1]', 'VARCHAR(50)') AS CommandText
from (
       select
              name
              ,cast(cast(content AS VARBINARY(MAX)) as xml) as reportXML
       from
              ReportServer.dbo.Catalog
       where
              Name not like '%.gif'
              and Name not like '%.jpg'
              and Name not like '%.jpeg'
) a
cross apply reportXML.nodes('/Report/DataSets/DataSet/Query') r(x)
where
       x.value('CommandType[1]', 'VARCHAR(50)') = 'StoredProcedure'

Note: in the default namespace declaration at the top of the query, it may be necessary to change "2008" to "2003", "2005" or "2010", depending on the version(s) of Visual Studio / Report Builder / BIDS / SSDT used to create your reports.

Credit to Jacob Sebastian, I used his post as a basis here:
http://beyondrelational.com/modules/2/blogs/28/posts/10446/how-to-find-all-stored-procedures-used-by-report-server.aspx

Wednesday, 6 May 2015

RAID Disk Levels

Google definition: RAID (originally redundant array of inexpensive disks; now commonly redundant array of independent disks) is a data storage virtualisation technology that combines multiple disk drive components into a single logical unit for the purposes of data redundancy or performance improvement.

There are a number of common RAID array configurations:

RAID 0:

This is also known as "striping" or a stripe of disks. Data is distributed (or "striped") across a number of drives, without any copy of the data, or any parity information about the data.

Disks: Minimum 2 disks required. Disks are striped, no mirror, no parity
Good for: performance - it is very quick to read and write to
Bad for: redundancy - if a disk fails, you've lost your entire RAID array. No good for critical systems

RAID 1:

This is also known as "mirroring" or a mirror of disks. The same data is written to two disks. Should one disk fail, you still have a complete copy of the entire disk on the other mirrored disk.

Disks: Minimum 2 disks required. Disks are mirrored, no stripe, no parity
Good for: redundancy - if a disk fails, you still have a copy of the entire disk.
Bad for: useable disk space - 2 x 1TB disk drives configured as RAID 1 only gives you 1TB of useable disk space

RAID 5:

This is generally the most cost-effective option when you need a balance of both performance and redundancy

Disks: Minimum 3 disks required. Disks are striped, with distributed parity
Good for: Read performance due to striped disks. Also good for redundancy - if a disk fails, the data can be recovered based on the parity information
Bad for: write speeds can be slow

RAID 10:

Also known as RAID 1 + 0, or a "stripe of mirrors", it combines both RAID 1 and RAID 0.

Disks: Minimum 4 disks required. Disks are striped, and mirrored
Good for: Performance due to striped disks. Good for redundancy due to mirrored disks. The best option for critical systems (especially databases)
Bad for: cost - the most expensive option
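
As a quick worked example with 1TB drives: four drives in RAID 0 give 4TB of usable space; two drives in RAID 1 give 1TB; four drives in RAID 5 give (4 - 1) x 1TB = 3TB (one drive's worth of capacity goes on parity); four drives in RAID 10 give 2TB (half the raw capacity, as everything is mirrored).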

Resources:


PCWorld post on RAID: http://www.pcworld.com/article/194360/raid-made-easy.html
The Geek Stuff: http://www.thegeekstuff.com/2010/08/raid-levels-tutorial/
There are some additional less common RAID levels (2,3,4,6), details about these here: http://www.thegeekstuff.com/2011/11/raid2-raid3-raid4-raid6/

Monday, 4 May 2015

Transparent Data Encryption (TDE)

Transparent Data Encryption (or TDE) is a means of encrypting your entire database, transparently. It was introduced in SQL Server 2008. It is transparent to applications because SQL Server encrypts and decrypts the data on the fly as required; no additional configuration is needed for your applications to use a database encrypted with TDE. Once TDE is set up, the data is encrypted on disk, any backups of the data are encrypted too, and you'll be unable to restore them without the encryption key.

Setting up TDE:
There are a few steps to set up TDE for your SQL Server database(s):
  1. Create a Master Key (on the master database)
  2. Create a Certificate with the Master Key (on the master database, and back this certificate up)
  3. Create Database Encryption Key (on the database you want to use TDE on)
  4. Turn on encryption on the required database
  5. Repeat steps 3 & 4 for any other databases you require TDE for
SQL for these steps (taken from MSDN):

use master;
go
create master key encryption by password = '<UseStrongPasswordHere>';
go
create certificate MyServerCert
with
       subject = 'My DEK Certificate';
go
use AdventureWorks2012;
go
create database encryption key
with algorithm = aes_128
encryption by server certificate MyServerCert;
go
alter database AdventureWorks2012 set encryption on;
go
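
The certificate created in step 2 should be backed up immediately, along with its private key - without it, you won't be able to restore the encrypted database or its backups on another server. A minimal sketch (the file paths and password are placeholders):

use master;
go
backup certificate MyServerCert
to file = 'C:\Backup\MyServerCert.cer'
with private key
(
       file = 'C:\Backup\MyServerCert.pvk',
       encryption by password = '<UsePrivateKeyPasswordHere>'
);
go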


Moving a TDE Encrypted database:
If a database using TDE is required to be restored to another server, the following steps must be followed:
  1. Copy the TDE encrypted database files, along with the backups of the certificate and its private key, to the destination server
  2. Create a database master key on the destination server
  3. Recreate the server certificate on the destination server, using the backup from the origin server
  4. Restore / attach the TDE encrypted database
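
Steps 2 and 3 on the destination server might look like the following (file paths and passwords are placeholders, matching the backup example above):

use master;
go
create master key encryption by password = '<UseStrongPasswordHere>';
go
create certificate MyServerCert
from file = 'C:\Backup\MyServerCert.cer'
with private key
(
       file = 'C:\Backup\MyServerCert.pvk',
       decryption by password = '<UsePrivateKeyPasswordHere>'
);
go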

Resources:
MSDN - set up / move TDE encrypted database: https://msdn.microsoft.com/en-gb/library/ff773063.aspx

SQLIO & SQLIOSIM

SQLIO is a lightweight command line application from Microsoft for testing your disk I/O and getting some performance benchmarking figures.

Resources:
Brent Ozar blog post about SQLIO: http://www.brentozar.com/archive/2008/09/finding-your-san-bottlenecks-with-sqlio/
Download SQLIO from Microsoft: http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=20163
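
As an illustration, a run might look something like the following (flags quoted from memory - check the documentation that ships with the tool; the test file path is a placeholder):

sqlio -kR -s120 -frandom -o8 -b8 -t4 -LS E:\MSSQL\testfile.dat

-kR = read test (-kW for writes), -s120 = run for 120 seconds, -frandom = random I/O, -o8 = 8 outstanding requests per thread, -b8 = 8KB blocks, -t4 = 4 threads, -LS = capture latency statistics.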

SQLIOSIM is not a performance benchmarking tool, but more of a stress testing tool, due to its use of random patterns. Unlike SQLIO, SQLIOSIM has a graphical user interface. SQLIOSIM will test the kind of I/O you'll get from a database based on where you put your data and log files, your auto-grow configuration, whether or not you're using sparse files, etc.

Resources:
Download SQLIOSIM from Microsoft: https://support.microsoft.com/en-us/kb/231619

Sunday, 3 May 2015

Dynamic Management Views (DMVs)

Information about some of the DMVs available in SQL Server, and their functions:

Sessions

sys.dm_exec_sessions
Information about each authenticated session on the Database Engine instance (login, host, settings and resource usage)

sys.dm_exec_connections
Provides information about connections established to the Database Engine instance

sys.dm_exec_requests
Information about each request executing on the SQL instance

sys.dm_exec_cursors
Information about open cursors
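
As a quick illustration of how these fit together, the following (a minimal sketch) lists the statement each currently executing request is running; session_ids above 50 are generally user sessions:

select r.session_id,
       r.status,
       r.wait_type,
       t.text
from sys.dm_exec_requests r
cross apply sys.dm_exec_sql_text(r.sql_handle) t
where r.session_id > 50;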


Audits

sys.dm_audit_actions
Information about every audit action that can be reported in the audit log as well as every audit action group that you are able to configure as part of SQL Server Audit

sys.dm_server_audit_status

Information about the current state of auditing

sys.dm_audit_class_type_map
Information about the class_type field in the audit output

fn_get_audit_file
Information from an audit output file that has already been generated by a server audit

Events

sys.server_event_sessions 
Lists all the event session definitions configured for the Database Engine instance

sys.server_event_session_actions
View actions on each event on an event session

sys.server_event_session_events 
View each event in an event session

sys.server_event_session_fields 
View each customizable column set on events and targets

sys.server_event_session_targets
View each event target for a specific event session

I/O

sys.dm_io_pending_io_requests
Provides information on unfulfilled I/O requests

sys.dm_io_backup_tapes 
Information on tape devices and mount requests for backups

sys.dm_io_cluster_shared_drives
Information on shared drives if the host server is a member of a failover cluster

sys.dm_io_virtual_file_stats
Information on I/O statistics for data and log files

Deadlocks

sys.dm_tran_locks
Information on active locks

sys.dm_os_waiting_tasks
Information on tasks waiting on resources

sys.dm_exec_requests
Requests that are executing within SQL Server
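
For example, a quick way to spot blocking chains using the views above (a minimal sketch):

select wt.session_id,
       wt.blocking_session_id,
       wt.wait_type,
       wt.wait_duration_ms
from sys.dm_os_waiting_tasks wt
where wt.blocking_session_id is not null;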

Transaction Isolation Levels

Overview of transaction isolation levels, from least to most restrictive (i.e. least chance of causing locks, to most likely to cause locks):

SNAPSHOT:
This transaction level uses row versioning: a copy of the data being read is kept in tempdb. Data read within the transaction is seen as it was at the start of the transaction, so results remain consistent throughout, without locks being taken on the actual data.

READ UNCOMMITTED:
This is the least restrictive transaction isolation level and the least likely to result in locks. At this level the Database Engine issues no shared locks, so statements can read data that has been modified but not yet committed by other transactions (dirty reads), and other transactions are not prevented from modifying data read by the current transaction.

READ COMMITTED:
This transaction level blocks statements from reading data that has been modified but not yet committed by other transactions, which prevents dirty reads. However, data can still be changed by other transactions between the execution of individual statements within the current transaction, so nonrepeatable and phantom reads are possible. READ COMMITTED is the SQL Server default transaction isolation level.

REPEATABLE READ:
This transaction level blocks statements from reading data that has been modified but not yet committed, and no other transaction can alter data that has been read by the current transaction until that transaction completes. REPEATABLE READ holds shared locks on all data read by each statement until the transaction completes; nonrepeatable reads are prevented, but phantom reads are still possible.

SERIALIZABLE:
The SERIALIZABLE transaction level is the most restrictive isolation level, and the most likely to result in locks. When it is set:

Statements cannot read data that has been modified but not yet committed by other transactions.
Other transactions are unable to modify data that has been read by the current transaction until the current transaction completes.
Other transactions are unable to insert new rows into the range of rows being read by the current transaction (preventing phantom reads).

Concurrency issues with each transaction isolation level:

This post gives a great overview of transaction isolation levels. The following table and the definitions of dirty / nonrepeatable / phantom reads are taken from there:


Isolation level       Dirty read   Nonrepeatable read   Phantom read
Read uncommitted      Yes          Yes                  Yes
Read committed        No           Yes                  Yes
Repeatable read       No           No                   Yes
Serializable          No           No                   No
Snapshot              No           No                   No

  • Dirty read: Session 1 begins a transaction and modifies data. Session 2 reads the modified data before Session 1 commits the transaction. Session 2 is reading data that exists only in theory because the Session 1 transaction might be rolled back. If that occurs, the data read by Session 2 is no longer valid.
  • Nonrepeatable read: Session 1 begins a transaction and retrieves a row of data from a table. Session 2 updates that row. Session 1 tries to retrieve the row once more, still within the original transaction, but receives different results because of the Session 2 update.
  • Phantom read: Session 1 begins a transaction and retrieves several rows of data from a table. Session 2 inserts a row into the same table, and the insert coincides with the search criteria of the Session 1 statement. When Session 1 reruns the query, the new row added by Session 2 is now included in the results.
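
The isolation level is set per session, before the transaction begins. A minimal example (the table is from AdventureWorks, purely for illustration; note that SNAPSHOT additionally requires ALLOW_SNAPSHOT_ISOLATION to be enabled on the database):

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
       -- Rows read here cannot be altered by other transactions
       -- until this transaction completes
       SELECT * FROM Person.Person WHERE BusinessEntityID = 1;
COMMIT TRANSACTION;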

Trace Flags

Trace flags are used within SQL Server to provide information / logging about specific system events, or to change specific configuration of the SQL Server instance. 

Trace flags can be set on / off for the current connection only, using the following T-SQL code:
-- Turn on trace flag 1234
DBCC TRACEON (1234);
GO

-- Turn off trace flag 1234
DBCC TRACEOFF (1234);
GO
Trace flags can be set on / off globally (i.e. for ALL connections to the SQL Engine) with the addition of the -1 argument at the end:
 -- Turn on trace flag 1234, globally
DBCC TRACEON (1234, -1);
GO
Multiple trace flags can be switched on by listing them within the parentheses:
-- Turn on trace flags 1234, 2345 and 3456
DBCC TRACEON (1234, 2345, 3456);
GO
You can see which trace flags are currently active as follows (if none are active, a message of "DBCC execution completed. If DBCC printed error messages, contact your system administrator" will appear): 
-- Check status of current trace flags
DBCC TRACESTATUS
Only members of the sysadmin fixed server role can switch trace flags on / off.

Useful Trace Flags:
1222: Writes deadlock information to the SQL Server error log in an XML-like format (generally the easier of the two to read)
1204: Writes deadlock information to the SQL Server error log in the classic, node-per-process format
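
For example, to switch both of these on globally (same syntax as earlier):

DBCC TRACEON (1222, 1204, -1);
GO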

Trace flag resources:
MSDN: https://msdn.microsoft.com/en-us/library/ms188396.aspx
SQL Mag: http://sqlmag.com/sql-server/investigating-trace-flags
Stack Exchange: http://dba.stackexchange.com/questions/27704/sql-server-concise-list-of-trace-flags

Sunday, 26 April 2015

Replication

Overview

Replication is a means of providing a copy of database objects and data to a secondary server (or servers), with various synchronisation options. There is a Publisher server (the main copy of the data) and one or more Subscriber servers that receive a copy of the data from the Publisher. A Distributor service and database manages the flow of changes (for Snapshot and Transactional replication); with Push subscriptions the Distribution Agent runs at the Distributor, while with Pull subscriptions it runs at the Subscriber. Replication can be used as a High Availability option, but other HA options may be better if high availability is your only objective. Replication can also integrate with non-Microsoft sources and destinations: SQL Server can subscribe to an Oracle publication for Transactional Replication, and can publish to both IBM DB2 and Oracle databases for Transactional and Snapshot replication.

There are four flavours of replication in SQL Server 2012 & 2014:

Transactional Replication:
Changes made at the Publisher are read from the transaction log and delivered to the Subscriber(s) in near real time, in the same order and transactional boundaries in which they were committed. Subscribers must be treated as read only, as changes are not pushed back to the Publisher (unless you're using Peer to Peer replication or updatable subscriptions). Requires a primary key on replicated tables.
Good for data that changes frequently - a high volume of inserts / updates / deletes - and for situations where Subscriber copies must remain closely consistent with the Publisher at all times. The Publisher doesn't need to be a SQL Server (SQL Server can subscribe to an Oracle publication).


Peer to Peer Transactional Replication:

Allows for Load Balancing among nodes. Allows transactions at Subscribers to be synced across the other nodes in the replication topology. Conflicts must be prevented by partitioning the data so that the same row cannot be updated at separate locations (as opposed to Merge Replication, where a conflict detection agent is used).
Good for situations where data is partitioned in a way that would match the replication topology, and where conflicts are likely to be rare.

Merge Replication:
One or more updateable Subscribers. Changes at the Publisher and Subscriber(s) are tracked by the use of triggers, and applied to the other copies of data. Will create a unique key in replicated table if one doesn't already exist (if one does exist, it must be replicated). Updates can be applied in a schedule (e.g. every 90 mins)
Good for having offline copies of data, e.g. sales staff in the field that need to add / update data on a local copy and then sync back to a head office.


Snapshot Replication:
Data is only updated in bulk, via a snapshot of the entire replicated data set; no incremental changes are made. Can replicate non-unique data (i.e. no prerequisite for primary keys on tables to be replicated). Uses a shared folder (ensure the replication agents have adequate permissions to this folder). Uses BCP to generate the snapshot.
Good for data that changes infrequently, often as a bulk process. Good for systems where data does not need to always be completely up to date (i.e. month end reporting). Good when only a small volume of data needs to be replicated.


Pros

Transactional Replication:
Data at the Subscriber(s) stays consistent with the Publisher - all changes at the Publisher are delivered to the Subscriber(s) in commit order.
Low latency between Publisher and Subscriber - changes typically arrive within seconds of being committed.


Peer to Peer Transactional Replication:
Allows Load Balancing among nodes 

Merge Replication:
Data amendments are pushed both to and from the Publisher, so many copies of the data can be updated simultaneously.


Snapshot Replication:
Very low overhead compared to the other replication types, as changes aren't continually being replicated.


All types of Replication:
Good for data synchronisation across geographically distributed data centres.


Cons

Transactional Replication:
Can add overhead at the Publisher and Distributor, as every transaction must be read from the log and delivered to the Subscriber(s).
Doesn't support updates at Subscriber(s) unless updating Subscribers is configured.

Peer to Peer Replication:
Enterprise only, across all nodes in Replication topology. Conflicts break replication, and partitioning of data must be used to prevent conflicts.

Merge Replication:
Conflicts can occur where subscribers change the same data at the same time
Merge replication does not support indexed views as tables


Snapshot Replication:
No incremental updates - all data must be copied over each time

All types of Replication:
Requires a lot of bandwidth if there are a lot of changes to be replicated.
Replication can be difficult to troubleshoot



Replication Definitions:
Publisher - An instance that makes data available via a Publication
Article - A published object (i.e. a table, SP, view, etc)
Publication - A collection of Articles
Distributor - Instance that manages the transmission of data from Publisher to Subscriber. The Distributor is often also the Publisher; if it is not, it is known as a Remote Distributor
Subscriber - An instance that receives a Publication
Agent - Service that enables the Publisher / Distributor / Subscriber etc to perform replication based tasks

Replication Monitor - a tool available within SSMS to monitor replication. Can configure alerts based on performance benchmarks

Replication resources

Very useful Codeproject page on replication: http://www.codeproject.com/Articles/715550/SQL-Server-Replication-Step-by-Step
MSDN page for Replication: https://msdn.microsoft.com/en-us/library/ms152531(v=sql.120).aspx
Transactional Replication: https://msdn.microsoft.com/en-us/library/ms151176(v=sql.110).aspx
Merge Replication: https://msdn.microsoft.com/en-us/library/ms152746.aspx
Snapshot Replication: https://msdn.microsoft.com/en-us/library/ms151832.aspx
SQL Server Central Stairway to Replication: http://www.sqlservercentral.com/stairway/72401/

Updating a massive number of rows whilst avoiding blocking

The following SQL is a good means to split an update on a massive table into smaller chunks, whilst reducing blocking. The method is to upda...