SQL Server Data Engineering: SQL Server

Showing posts with label SQL Server. Show all posts

Wednesday, 7 October 2020

Updating massive amount of rows whilst avoiding blocking

The following SQL is a good means to split an update on a massive table into smaller chunks, whilst reducing blocking. The method is to update top N rows from a table, within a while loop, with a wait between loops, to avoid excessive blocking of the table and allow other transactions between each loop. Further blocking reduction is discussed in the link below.

set nocount on

declare 
    @ChunkSize              int = 1000,                         -- count rows to remove in 1 chunk 
    @TimeBetweenChunks      char(8) = '00:00:01',               -- interval between chunks
    
    @Start                  datetime,
    @End                    datetime,
    @Diff                   int,
    
    @MessageText            varchar(500),
    
    @counter                int = 1,
    @RowCount               int = 1,
    @TotalRowsToUpdate      bigint,
    @TotalRowsLeft          bigint
    


-- total row count to update
set @TotalRowsToUpdate = (select count(*)
                            from [Table1]
                                join [Table2] on
                                    btid = tBtID
                            where   btStatusID = 81)


set @TotalRowsLeft = @TotalRowsToUpdate
set @MessageText = 'Total Rows to Update = ' + cast(@TotalRowsLeft as varchar) raiserror (@MessageText,0,1) with nowait
print ''


-- begin cycle
while @RowCount > 0 begin

    set @Start = getdate()

    -- update packages
    update top (@ChunkSize) bti
        set btstatusid = 154,
            btType = 1
    from [Table1] bti
        join [Table2] on
            btid = tBtID
    where   btStatusID = 81
    

    set @RowCount = @@ROWCOUNT

    -- measure time
    set @End = getdate()
    set @Diff = datediff(ms,@Start,@End)

    set @TotalRowsLeft = @TotalRowsLeft - @RowCount
    set @MessageText = cast(@counter as varchar) + ' - Updated ' + cast(@RowCount as varchar) + ' rows in ' + cast(@Diff as varchar) + ' milliseconds - total ' + cast(@TotalRowsLeft as varchar) + ' rows left...'

    -- print progress message
    raiserror (@MessageText,0,1) with nowait


    set @counter += 1

    WAITFOR DELAY @TimeBetweenChunks

end

Resources:

Stack Overflow post on this subject (where SQL above was taken from): https://dba.stackexchange.com/questions/276314/how-to-avoid-table-lock-escalation

Monday, 5 October 2020

Identity values jumping by 1000 - IDENTITY_CACHE

Sometimes you will see a large jump of 1000 values in an identity column of a table. This can happen when SQL Server caches 1000 IDs for the table. If the SQL Service is restarted, these cached IDs can be lost, causing SQL Server to begin the next increment at the ID post the previous cache.

From SQL 2017, this behaviour can be turned off by setting IDENTITY_CACHE = OFF

From the SQL Server documentation:

SQL Server might cache identity values for performance reasons and some of the assigned values can be lost during a database failure or server restart. This can result in gaps in the identity value upon insert. If gaps are not acceptable then the application should use its own mechanism to generate key values. Using a sequence generator with the NOCACHE option can limit the gaps to transactions that are never committed.

Resources:

Pinal Dave's explanation and examples: https://blog.sqlauthority.com/2018/01/24/sql-server-identity-jumping-1000-identity_cache/

StackOverflow post about the issue, and workarounds if ID gaps are not acceptable: https://stackoverflow.com/questions/14146148/identity-increment-is-jumping-in-sql-server-database

Monday, 30 October 2017

SSIS Packages With Excel Source Failing After Windows Updates

SSIS packages with Excel Sources may fail after Windows updates (specifically KB4041681) with the following error:

SSIS Error Code DTS_E_OLEDB_NOPROVIDER_ERROR. The requested OLE DB provider Microsoft.ACE.OLEDB.12.0 is not registered

The resolution is to install the 2007 Office System Driver: Data Connectivity Components from here:

https://www.microsoft.com/en-us/download/details.aspx?id=23734

Note - SSIS packages must have the "Use 32 bit runtime" box checked if being scheduled via a SQL Agent Job

Wednesday, 25 October 2017

Finding out when updates / service packs were applied to SQL Server

It is possible to find out when updates were applied to SQL Server by checking in the following location:

c:\Program Files\Microsoft SQL Server\XXX\Setup Bootstrap\Log

See more here: https://stackoverflow.com/questions/2950911/how-to-determine-date-when-service-pack-was-applied-to-sql-server-instance

Friday, 6 October 2017

SSRS Subscriptions - What Goes On Under The Hood

Excellent post detailing what happens "under the hood" of SQL Server Reporting Services

https://blogs.msdn.microsoft.com/sql_pfe_blog/2016/06/20/ssrs-subscriptions-what-goes-on-under-the-hood-3/

Wednesday, 16 August 2017

SQL Server Dedicated Admin Connection (DAC)

The Dedicated Admin Connection (or DAC for short - not to be confused with DAC packages) is exactly that - a dedicated connection that's available to SQL Server sysadmins, for use when a SQL Server might have become unresponsive. This backdoor for the sysadmins has reserved resources at all times, and allows a means for connecting to SQL Server when other ways of connecting may not be working due to performance (or other) issues.

By default, connection to SQL Server via the DAC is only available locally from the server - i.e. you'd need to remote onto the server then use SQLCMD or SSMS to use the DAC. If the host is unresponsive due to performance issues, this could prove problematic. The solution is to enable remote admin connections - this allows connecting via the DAC remotely, and can be switched on by using the following SQL:

EXEC sp_configure 'remote admin connections', 1;

GO

RECONFIGURE

GO

Connecting to the DAC remotely via SSMS is then as simple as prefixing your server\instance name with "admin:"

Kendra Little has an excellent blog post and video explaining more, including how to tell who is using the DAC (if you're unable to use it), how to enable on clusters, and more:

https://www.brentozar.com/archive/2011/08/dedicated-admin-connection-why-want-when-need-how-tell-whos-using/

Monday, 22 February 2016

SQL Server Start Up Switches

Some useful SQL Server start up switches:

Switch	Action
-d	Default startup option – specifies the fully qualified path for the master database file. If this is not provided, the current registry value will be used
-e	Default startup option – specifies the fully qualified path for the master database error log file. If this is not provided, the current registry value will be used
-l	Default startup option – specifies the fully qualified path for the master database log file If this is not provided, the current registry value will be used
-m	Start SQL Server in single user mode. Allows any member of the Local Administrator group to connect as sysadmin onto the SQL instance. CHECKPOINT process is not started
-m”client app name”	Specify which application can take the single user connection (e.g. –m”SQL Server Management Studio – Query”). Use when an unknown application keeps grabbing the single user connection when SQL Server starts
-f	Starts SQL Server in minimal configuration mode (this is also single user mode as well). Useful if a configuration setting is stopping SQL Server from starting
-n	Doesn’t use Windows Application Log to record events. Use –e in conjunction with this to ensure events are logged to SQL error log
-s	Start a named instance. Without this, a default instance is started
-Ttrace#	Start SQL with a specified trace flag (note the uppercase T)
-x	Disables several monitoring features
-c	Shortens the time taken to start SQL Server from command line

The following T-SQL code will tell you which start up switches have currently been applied to SQL Server:

SELECT

    DSR.registry_key,

    DSR.value_name,

    DSR.value_data

FROM sys.dm_server_registry AS DSR

WHERE

    DSR.registry_key LIKE N'%MSSQLServer\Parameters';

...and querying the sys.dm_server_registry dynamic view in it's entirety will give you details about the registry entries for SQL Server.

Resources:
Info on trace flags: http://www.sqlservercentral.com/articles/trace+flags/70131/

Saturday, 20 February 2016

Permissions - Server & Database Roles

When assigning permissions, it is important to always grant the least privileged role - minimum permission to perform the function that is required.

Server Roles:
More info from MSDN

Fixed server-level role	Description
sysadmin	Members of this role can perform any action on the SQL Server instance. Initial membership of this role happens at installation
serveradmin	Members can perform instance wide administration tasks, including shutting down the instance
securityadmin	Members can manage instance level permissions. Permissions for this role are equivalent to sysadmin - as members of this role can elevate their own permissions to sysadmin
processadmin	Members can view and terminate processes on the database engine
setupadmin	Members can add / remove linked servers to the database instance, and execute some system stored procedures
bulkadmin	Members can run the BULK INSERT statement
diskadmin	Members can administer database file configuration
dbcreator	Ability to create databases (database will be owned by the creator)
public	Every login is automatically a member of the public role; you cannot change membership for public. You can change the permissions for public however. If a user has no permissions granted, they will revert to the permissions assigned to public

Database Roles:

More info from MSDN

Database-level role name	Description
db_owner	Has permissions to do anything within the database
db_securityadmin	Members can administer security of roles (note: can elevate their own permissions to db_owner)
db_accessadmin	Can administer who has access to database for Windows Logins, Windows Groups and SQL Server Logins
db_backupoperator	Members have ability to backup the database. Cannot restore the database - this is reserved for dbcreator / sysadmin fixed server roles
db_ddladmin	Can perform any Data Definition Language (DDL) operations (those that create / modify structure of database), e.g. CREATE, ALTER, DROP, TRUNCATE
db_datawriter	Allows users to add / delete / update data on all user tables
db_datareader	Allows users to read data from all user tables
db_denydatawriter	Members cannot add / delete / update any data in any user tables
db_denydatareader	Members cannot read data from any user tables

High Availability - AlwaysOn Availability Groups

Overview:
AlwaysOn Availability Groups can be thought of as a hybrid of the best of clustering, and the best of mirroring. It must be configured on a Windows Server Failover Cluster, but rather than installing SQL Server as a clustered installation with shared storage for the databases, a separate installation of SQL Server is installed on each node of the Windows Cluster, using the local disks of the cluster node (i.e. the OS is clustered, SQL Server isn't). Like clustering, the Availability Groups have a shared IP / name that everyone can point to. Unlike clustering, databases are mirrored across all servers in the cluster. There can be one primary, and up to four replicas. Updates can be asynchronous or synchronous. Requires Full recovery mode for databases.

Secondaries / Read Only Replicas:
Secondaries can be used as read only databases for backups / reporting / DBCC checks etc, although these functions lend themselves more to secondary replicas in an asynchronous configuration, as synchronous secondaries are more useful for failover.

Connecting to Secondaries:
You can connect directly to a secondary replica, or indirectly by using the "ApplicationIntent" (either ReadWrite or ReadOnly) within the connection string for SQL Server connections, with Read Only routing configured on the listener service for the main shared instance name.

Failover Types:
From MSDN:

Synchronous-commit replicas support two settings—automatic or manual. The "automatic" setting supports both automatic failover and manual failover. To prevent data loss, automatic failover and planned failover require that the failover target be a synchronous-commit secondary replica with a healthy synchronization state (this indicates that every secondary database on the failover target is synchronized with its corresponding primary database). Whenever a secondary replica does not meet both of these conditions, it supports only forced failover. Note that forced failover is also supported a replicas whose role is in the RESOLVING state.
Asynchronous-commit replicas support only the manual failover mode. Moreover, because they are never synchronized, they support only forced failover.

Pros:
Good for high availability, and reporting via usable secondary copies of the data.

Cons:
You cannot add system databases to an Availability Group. To that end, logins, SQL Jobs etc are not available on secondary replicas automatically - these must be manually set up / kept in sync.
Expensive, as you must license each node individually.
Doubles your data (or more) - as you have multiple copies of the databases many times.
Distributed transactions are not guaranteed in the event of failover.
All servers in the availability group must be in the same domain

Resources:
Great post here around HA options in SQL Server: http://dba.stackexchange.com/questions/53815/clustering-vs-transactional-replication-vs-availability-groups

Tuesday, 7 July 2015

Create database from just an MDF file

In a situation where you may have no backup of a database, but you have the data file (the .mdf) but no log file (.ldf), the following SQL script can be used to create a database using just the .mdf data file:

CREATE DATABASE DatabaseName

ON (FILENAME = '<path to your file here>\DatabaseName_Data.mdf')

FOR ATTACH_REBUILD_LOG;

Sometimes additional steps are required, as documented here: https://www.mssqltips.com/sqlservertip/3579/how-to-attach-a-sql-server-database-without-a-transaction-log-and-with-open-transactions/

Monday, 4 May 2015

Transparent Data Encryption (TDE)

Transparent Data Encryption (or TDE) is a means for encrypting your entire database, transparently. It was introduced from SQL Server 2008. It is transparent to the applications, as SQL Server does the encryption / decryption of data on the fly as required, there is no additional configuration required for your applications to make use of a database encrypted using TDE. Once TDE is set up, the data will be encrypted on the disk, and any backups of the data will be encrypted also, and you'll be unable to restore the data without the encyption key.

Setting up TDE:
There are a few steps to set up TDE for your SQL Server database(s):

Create a Master Key (on the master database)
Create a Certificate with the Master Key (on the master database, and back this certificate up)
Create Database Encryption Key (on the database you want to use TDE on)
Turn on encryption on the required database
Repeat steps 3 & 4 for any other databases you require TDE for

SQL for these steps (taken from MSDN):

use master;

go

create master key encryption by password = '<UseStrongPasswordHere>';

go

create certificate MyServerCert

with

subject = 'My DEK Certificate';

go

use AdventureWorks2012;

go

create database encryption key

with algorithm = aes_128

encryption by server certificate MyServerCert;

go

alter database AdventureWorks2012 set encryption on;

go

Moving a TDE Encrypted database:

If a database using TDE is required to be restored to another server, the following steps must be followed:

Via Windows Explorer, copy TDE encrypted database, backup of server key and certificate to destination server
Create database master key on destination server
Recreate server certificate on destination server using backup from origin server
Restore / attach TDE encypted database

Resources:

Recover database without database encryption key:
http://dba.stackexchange.com/questions/34753/can-i-recover-a-tde-certificate-by-restoring-the-master-database

MSDN - set up / move TDE encrypted database: https://msdn.microsoft.com/en-gb/library/ff773063.aspx

SQLIO & SQLIOSIM

SQLIO is a lightweight command line application from Microsoft, to test your disk I/O and get some performance benchmarking figures.

Resources:
Brent Ozar blog post about SQLIO: http://www.brentozar.com/archive/2008/09/finding-your-san-bottlenecks-with-sqlio/
Download SQLIO from Microsoft: http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=20163

SQLIOSIM is not a performance benchmarking tool, but more of a stress testing tool, due to it's use of random patterns. Unlike SQLIO, SQLIOSIM has a graphical user interface. SQLIOSIM will test what kind of I/O you'll get from a database based on where you'll put your data and log files, auto grow configuration, whether or not you're using sparse files etc

Resources:
Download SQLIOSIM from Microsoft: https://support.microsoft.com/en-us/kb/231619

Sunday, 3 May 2015

Dynamic Management Views (DMVs)

Information about some of the DMVs availabile in SQL Server, and their function

Sessions

sys.dm_exec_sessions
Information about successful and unsuccessful logins

sys.dm_exec_connections
Provides information about connections established to the Database Engine instance

sys.dm_exec_requests
Information about each request executing on the SQL instance

sys.dm_exec_cursors
Information about cursors open

Audits

sys.dm_audit_actions
Information about every audit action that can be reported in the audit log as well as every audit action group that you are able to configure as part of SQL Server Audit

sys.dm_server_audit_status

Information about the current state of auditing

sys.dm_audit_class_type_map

Information about the class_type field in the audit output

fn_get_audit_file

Information from an audit output file that has already been generated by a server audit

Events

sys.server_event_sessions

Lists all the event session definitions configured for the Database Engine instance

sys.server_event_session_actions

View actions on each event on an event session

sys.server_event_session_events

View each event in an event session

sys.server_event_session_fields

View each customizable column set on events and targets

sys.server_event_session_targets

View each event target for a specific event session

I/O

sys.dm_io_pending_io_requests

Provides information on unfulfilled I/O requests

sys.dm_io_backup_tapes

Information on tape devices and mounts requests for backups

sys.dm_io_cluster_shared_drives

Information on shared drives if the host server is a member of a failover cluster

sys.dm_io_virtual_file_stats

Information on I/O statistics for data and log files

Deadlocks

sys.dm_tran_locks

Information on active locks

sys.dm_os_waiting_tasks

Information on tasks waiting on resources

sys.dm_exec_requests

Requests that are executing within SQL Server

Transaction Isolation Levels

Overview of transaction isolation levels, from least to most restorictive (i.e. least chance of causing locks, to most likely to cause locks):

SNAPSHOT:
This transaction level takes a copy of the data being read and puts it into tempdb. This ensures that the results at the start and end of a transaction will be identical, as it refers to a copy of the data, rather than putting locks on the actual data.

READ UNCOMMITTED:
This is the least restrictive transaction isolation level and the least likely to result in locks. The READ UNCOMMITTED transaction level allows transactions to make modifications to data read by the current transaction.

Use this level if you do not want the Database Engine to issue shared locks to prevent other transactions from making modifications to data read by the current transaction. READ UNCOMMITTED is the least restrictive transaction isolation level.

READ UNCOMMITTED does not prevent other transactions from making modifications to data read by the current transaction

READ COMMITTED:
This transaction level blocks statements from reading data that has been modified by but not committed by other transactions. At this transaction level, data can be changed by other transactions between the execution of individual statements within the current transaction.

If you set the READ COMMITTED transaction levels, statements cannot read data that has been modified by other transactions that are uncommitted. This prevents dirty reads. Data can be changed by other transactions between the execution of individual statements within the current transaction. READ COMMITTED is the SQL Server 2012 default transaction isolation level.

READ COMMITTED transaction isolation level blocks other statements from reading data that has been modified but not committed by other transactions, but data can be changed by other transactions between the execution of individual statements within the current transaction.

REPEATABLE READ:
This transaction level blocks statements from reading data that has been modified but not committed. When this level is set, transactions cannot alter data that has been read by the current transaction until that transaction completes. REPEATABLE READ uses shared locks on all data read by each statement until the transaction completes.

If you set the REPEATABLE READ transaction level, statements are unable to read data that has been modified by uncommitted. No transactions can alter data that has been read by the current transaction until that transaction completes. REPEATABLE READ uses shared locks on all data read by each statement until the transaction completes.

REPEATABLE READ transaction isolation level is set, no transactions can alter data that has been read by the current transaction until that transaction completes

SERIALIZABLE:
The SERIALIZABLE transaction level is the most restrictive transaction isolation level and likely to result in more locks than other transaction isolation levels. When this isolation level is set, statements cannot read data that has been modified but not yet committed by other transactions, transactions are unable to modify data that has been read by the current transaction until that transaction completes, and other transactions are unable to insert new rows into the range of rows being read by the current transaction.

If you set the SERIALIZABLE transaction level, statements cannot read data that has been modified but not yet committed by other transactions; transactions are unable to modify data that has been read by the current transaction until that transaction completes; other transactions are unable to insert new rows into the range of rows being read by the current transaction. The SERIALIZABLE transaction level is the most restrictive isolation level.

Statements cannot read data that has been modified but not yet committed by other transactions.
Other transactions are unable to modify data that has been read by the current transaction until the current transaction completes.

Other transactions are unable to insert new rows into the range of rows being read by the current transaction.

Concurrency issues with each transaction isolation level:

This post gives some great overview of transaction isolation levels. The following table and definitions of dirty / repeatable / phantom reads are taken from there:

Isolation level	Dirty read	Nonrepeatable read	Phantom read
Read uncommitted	✔	✔	✔
Read committed	✗	✔	✔
Repeatable read	✗	✗	✔
Serializable	✗	✗	✗
Snapshot	✗	✗	✗

Dirty read: Session 1 begins a transaction and modifies data. Session 2 reads the modified data before Session 1 commits the transaction. Session 2 is reading data that exists only in theory because the Session 1 transaction might be rolled back. If that occurs, the data read by Session 2 is no longer valid.
Nonrepeatable read: Session 1 begins a transaction and retrieves a row of data from a table. Session 2 updates that row. Session 1 tries to retrieve the row once more, still within the original transaction, but receives different results because of the Session 2 update.
Phantom read: Session 1 begins a transaction and retrieves several rows of data from a table. Session 2 inserts a row into the same table, and the insert coincides with the search criteria of the Session 1 statement. When Session 1 reruns the query, the new row added by Session 2 is now included in the results.