Wednesday, 7 October 2020

Updating a massive number of rows whilst avoiding blocking

The following SQL is a good means of splitting an update on a massive table into smaller chunks, whilst reducing blocking. The method is to update the top N rows of the table inside a while loop, with a wait between iterations, to avoid holding locks on the table for too long and to allow other transactions through between each chunk. Further blocking reduction is discussed in the link below.

set nocount on

declare 
    @ChunkSize              int = 1000,                         -- number of rows to update per chunk
    @TimeBetweenChunks      char(8) = '00:00:01',               -- interval between chunks
    
    @Start                  datetime,
    @End                    datetime,
    @Diff                   int,
    
    @MessageText            varchar(500),
    
    @counter                int = 1,
    @RowCount               int = 1,
    @TotalRowsToUpdate      bigint,
    @TotalRowsLeft          bigint
    


-- total row count to update
set @TotalRowsToUpdate = (select count(*)
                            from [Table1]
                                join [Table2] on
                                    btid = tBtID
                            where   btStatusID = 81)


set @TotalRowsLeft = @TotalRowsToUpdate
set @MessageText = 'Total Rows to Update = ' + cast(@TotalRowsLeft as varchar)
raiserror (@MessageText,0,1) with nowait
print ''


-- begin cycle
while @RowCount > 0 begin

    set @Start = getdate()

    -- update packages
    update top (@ChunkSize) bti
        set btstatusid = 154,
            btType = 1
    from [Table1] bti
        join [Table2] on
            btid = tBtID
    where   btStatusID = 81
    

    set @RowCount = @@ROWCOUNT

    -- measure time
    set @End = getdate()
    set @Diff = datediff(ms,@Start,@End)

    set @TotalRowsLeft = @TotalRowsLeft - @RowCount
    set @MessageText = cast(@counter as varchar) + ' - Updated ' + cast(@RowCount as varchar) + ' rows in ' + cast(@Diff as varchar) + ' milliseconds - total ' + cast(@TotalRowsLeft as varchar) + ' rows left...'

    -- print progress message
    raiserror (@MessageText,0,1) with nowait


    set @counter += 1

    WAITFOR DELAY @TimeBetweenChunks

end

Resources:

Database Administrators Stack Exchange post on this subject (where the SQL above was taken from): https://dba.stackexchange.com/questions/276314/how-to-avoid-table-lock-escalation

Monday, 5 October 2020

Query Store

SQL Server's Query Store is a database-level feature available in every edition (from SQL Server 2016 onwards). It captures a history of executed queries, their query plans, their execution statistics etc., to help with performance troubleshooting by identifying regressed query plans and opportunities for better ones. Essentially a "black box recorder" for SQL Server.

To turn on Query Store:

USE [master]
GO

ALTER DATABASE [DatabaseName] SET QUERY_STORE = ON
GO

ALTER DATABASE [DatabaseName] SET QUERY_STORE (OPERATION_MODE = READ_WRITE)
GO

You'll then notice a new "Query Store" folder within SSMS under the database you enabled it for.

[Screenshot: the Query Store folder under the database in Object Explorer]
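Once enabled, the captured history can also be queried directly through the sys.query_store_* catalog views. As a minimal sketch of my own (not from the article below), the following lists the ten slowest queries by average duration:

-- top 10 queries by average duration (Query Store stores durations in microseconds)
SELECT TOP 10
    qt.query_sql_text,
    rs.avg_duration,
    rs.count_executions
FROM sys.query_store_query_text qt
    JOIN sys.query_store_query q ON q.query_text_id = qt.query_text_id
    JOIN sys.query_store_plan p ON p.query_id = q.query_id
    JOIN sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id
ORDER BY rs.avg_duration DESC;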
Resources:

More info on setting up and using Query Store: https://ballardchalmers.com/2019/07/23/query-store-sql-server/

Identity values jumping by 1000 - IDENTITY_CACHE

Sometimes you will see a large jump of 1000 values in a table's identity column. This happens because SQL Server caches identity values (1000 at a time for int columns) for performance. If the SQL Server service restarts unexpectedly, the cached values are lost, and numbering resumes after the end of the lost cache range, leaving a gap.

From SQL Server 2017, this behaviour can be turned off by setting IDENTITY_CACHE = OFF.
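A minimal sketch of turning it off via the database scoped configuration ([DatabaseName] is a placeholder, as above):

USE [DatabaseName]
GO

-- stop SQL Server pre-allocating a cache of identity values for this database
ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = OFF
GO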

From the SQL Server documentation:

SQL Server might cache identity values for performance reasons and some of the assigned values can be lost during a database failure or server restart. This can result in gaps in the identity value upon insert. If gaps are not acceptable then the application should use its own mechanism to generate key values. Using a sequence generator with the NOCACHE option can limit the gaps to transactions that are never committed.
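A sketch of that suggestion (the dbo.Orders table and sequence name here are hypothetical):

-- a sequence created with NO CACHE only loses values from transactions that never commit
CREATE SEQUENCE dbo.OrderNumberSeq AS bigint START WITH 1 INCREMENT BY 1 NO CACHE;
GO

-- use the sequence in place of an identity column
INSERT INTO dbo.Orders (OrderNumber, CustomerName)
VALUES (NEXT VALUE FOR dbo.OrderNumberSeq, 'Test Customer');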

Resources:

Pinal Dave's explanation and examples: https://blog.sqlauthority.com/2018/01/24/sql-server-identity-jumping-1000-identity_cache/

StackOverflow post about the issue, and workarounds if ID gaps are not acceptable: https://stackoverflow.com/questions/14146148/identity-increment-is-jumping-in-sql-server-database

Saturday, 3 October 2020

SSIS - fix "only administrators have access to the Integration Services service" error

Scheduling SSIS packages from SQL Agent can produce the following error on a new installation of SQL Server:

"Connecting to the Integration Services service on the computer "…" failed with the following error: "Access is denied". By default, only administrators have access to the Integration Services service. On Windows Vista and later, the process must be running with administrative privileges in order to connect to the Integration Services service."

The following article describes the fix (adding permissions for the SQL Agent service to SSIS via DCOM):

https://www.mssqltips.com/sqlservertip/5077/permissions-to-schedule-an-ssis-package-from-sql-server-agent-and-the-file-system/

Monday, 28 September 2020

Split out delimited text within SQL Server

Sometimes splitting delimited text within SQL Server is easier than splitting the text on the way in via SSIS etc. The following (all shamelessly stolen from the link below) is an example of splitting a comma-delimited myAddress column into separate Street, City and State columns, using this SQL:

SELECT 
     REVERSE(PARSENAME(REPLACE(REVERSE(myAddress), ',', '.'), 1)) AS [Street]
   , REVERSE(PARSENAME(REPLACE(REVERSE(myAddress), ',', '.'), 2)) AS [City]
   , REVERSE(PARSENAME(REPLACE(REVERSE(myAddress), ',', '.'), 3)) AS [State]
FROM dbo.custAddress;
GO
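To try it out, a minimal setup (the dbo.custAddress table name comes from the tip below; the sample row is my own):

CREATE TABLE dbo.custAddress (myAddress varchar(200));
INSERT INTO dbo.custAddress VALUES ('10 Downing Street,London,Greater London');
-- the SELECT above then returns:
-- Street = '10 Downing Street', City = 'London', State = 'Greater London'
GO

Note that PARSENAME only handles up to four dot-separated parts, so this trick caps out at four columns.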

Reference:

https://www.mssqltips.com/sqlservertip/6321/split-delimited-string-into-columns-in-sql-server-with-parsename/

Sunday, 27 September 2020

Getting started with Python Web Scraping

Getting started with Python on a Mac was fairly straightforward, but I had a few stumbling blocks on Windows. The easiest way to get started with Python, a decent IDE & terminal, and additional libraries, was to install Anaconda on Windows, and use the Spyder IDE.

Using Beautiful Soup for web scraping, the following is a script I wrote to get the current top personal development books from Amazon's best sellers list:

# Script to get the top personal development books from Amazon's best sellers list
import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.co.uk/Best-Sellers-Books-Self-Help-How/zgbs/books/2996349031/ref=zg_bs_nav_b_3_2996114031'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

# the chart is rendered as an ordered list; each list item holds one book
results = soup.find('ol', class_='a-ordered-list a-vertical')

list_elems = results.find_all('li', class_='zg-item-immersion')
for list_elem in list_elems[:50]:
    rank_elem = list_elem.find('span', class_='zg-badge-text')
    title_elem = list_elem.find('div', class_='p13n-sc-truncate p13n-sc-line-clamp-1')
    author_elem = list_elem.find('span', class_='a-size-small a-color-base')

    # collapse newlines and repeated spaces in the scraped title
    title_text = ' '.join(title_elem.text.split())

    print(rank_elem.text.replace('#', '') + ' - ' + title_text + ' - ' + author_elem.text.replace('\t', ''))

Returns a list in the format: rank - title - author.

Wednesday, 9 September 2020

SQL Server Migration using dbatools Powershell Module

Migrations between SQL Servers can be laborious, and doing them manually leaves a lot of room for human error. Scripting a migration using the dbatools Powershell module is incredibly simple, quick and robust.

Prerequisites

Powershell with the dbatools module installed.
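If it isn't already installed, the module can be pulled from the PowerShell Gallery, e.g.:

# install the dbatools module from the PowerShell Gallery (may require an elevated prompt)
Install-Module -Name dbatools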
Steps

Run the following Powershell:

$startDbaMigrationSplat = @{
    Source = 'sourceServer'
    Destination = 'destinationServer'
    BackupRestore = $true
    SharedPath = '\\path\both-servers-can-access'
}

Start-DbaMigration @startDbaMigrationSplat -Force | Select * | Out-GridView

This will migrate everything associated with the source instance to the destination instance (including logins, linked servers, startup procs...), using a backup-and-restore method to move the databases. Use the -Exclude flag (see the documentation link below) to exclude anything you don't want migrated.
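For example, a sketch of migrating everything except the databases themselves (check the documentation below for the full list of -Exclude options):

# migrate instance-level objects only, leaving the databases behind
Start-DbaMigration @startDbaMigrationSplat -Exclude Databases | Select * | Out-GridView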

Resources

dbatools migration documentation: https://docs.dbatools.io/#Start-DbaMigration
YouTube vid demo of dbatools migration: https://www.youtube.com/watch?v=hg8tovMRX2k
