Removing duplicates from a large table before adding a clustered primary key

The challenge with this one was to remove over 2,000 duplicate rows from a 600,000,000-row heap before a multi-column clustered primary key could be added.

(I am so happy to finally be able to document this, from a while ago. The phrase “uniquely challenging” was never so apt.)

My first idea was to create an empty copy of the source table, add an “ignore duplicates” index, then start an INSERT [dbo].[targettable] SELECT * FROM [dbo].[sourcetable]. However, I stopped it after an hour as it was going to take too long.
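For reference, the “ignore duplicates” approach can be sketched roughly as below. The index name ix_targettable_dedupe is hypothetical and the column names are the same stand-ins used further down. With IGNORE_DUP_KEY = ON, rows that would violate the unique index are discarded with a warning rather than failing the whole insert.

```sql
-- Sketch only: index name is hypothetical, column names are placeholders
CREATE UNIQUE INDEX ix_targettable_dedupe
ON [dbo].[targettable] (column1, column2, column3, column4)
WITH (IGNORE_DUP_KEY = ON);

-- Duplicate key rows are silently skipped (with a warning) during the load
INSERT INTO [dbo].[targettable]
SELECT * FROM [dbo].[sourcetable];
```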

My second idea was to use SSIS to populate the target-table. Again this was just too slow.

My third idea was to just remove the duplicates (2,000 being a much smaller number than 600,000,000). First though, a disclaimer: I only considered this ** high risk ** solution because the data was in a QA environment (not Live), and people were waiting for me.

Step one then, was to populate a temp-table with the duplicates …

SELECT column1, column2, column3, column4
INTO #tmp1
FROM [dbo].[sometablename]
GROUP BY column1, column2, column3, column4
HAVING COUNT(*) > 1;

(“column1” etc. were not the real column names, by the way 🙂 ). Step two was to loop through this list, removing one duplicate from the permanent table for each entry …

WHILE exists (SELECT 1 FROM #tmp1)
BEGIN

	DECLARE @1 BIGINT, @2 INT, @3 BIGINT
	DECLARE @4 VARCHAR(10)

	SELECT TOP(1) 
		@1 = column1, 
		@2 = column2, 
		@3 = column3, 
		@4 = column4 
	FROM #tmp1

	DELETE TOP(1) 
	FROM [dbo].[sometablename] 
	WHERE column1 = @1
	AND column2 = @2
	AND column3 = @3
	AND column4 = @4

	DELETE 
	FROM #tmp1
	WHERE column1 = @1
	AND column2 = @2
	AND column3 = @3
	AND column4 = @4

END

Step three was to implement the clustered primary key. (By the way, I was half expecting this to fail with a duplicates error, but happily the above loop had cleared a single duplicate from each pair, and that's all that was needed.)

ALTER TABLE [dbo].[sometablename] 
ADD CONSTRAINT pk_sometablename
 PRIMARY KEY CLUSTERED
 ([column1], [column2], [column3], [column4]);

** Just to point out: I consider the loop “high risk” because an error message, a typo, accidentally running it twice, or faulty logic could have resulted in disaster 🙂
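Since the loop only removes one row per duplicated key, a quick check before adding the key would show whether any triples (or worse) were left behind. This is a minimal sketch that reuses the placeholder table and column names from above.

```sql
-- Sketch: confirm no duplicate key combinations remain before adding the key
IF EXISTS (
    SELECT 1
    FROM [dbo].[sometablename]
    GROUP BY column1, column2, column3, column4
    HAVING COUNT(*) > 1
)
    PRINT 'Duplicates remain: do not add the primary key yet';
ELSE
    PRINT 'No duplicates found: safe to add the primary key';
```

On a table this size the check is itself a full scan, so it costs about as much as the original query that populated #tmp1.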

Resource Governor

The Resource Governor is one of those rarely used tools, as its impact is inherently ‘negative’. It restricts resources from a user-group, but cannot be seen to boost performance of a more favoured group.

For example, you could restrict the SA login to use just 5% of the CPU resource. The effect of this would be that SA can use 100% of the CPU resource, until another user-group wants to use more CPU than is currently available. SA will then be throttled back, and back, until it is only using 5%. Five percent is the ceiling SA is held to while there is contention (MAX_CPU_PERCENT); it is not a guaranteed share, which is what MIN_CPU_PERCENT sets.

It is not easy to see this throttling in action – without firing up perf-mon. Also, the GUI view within SSMS is not-so-good, showing only some of the configuration. Therefore this is one of those things that I work with completely at the code level.

So, here then is my crib-sheet …

-- ResPool.sql

-- 1 create custom resource pool of available hardware
USE master;
GO
CREATE RESOURCE POOL MxResourcePool WITH
(
	MIN_CPU_PERCENT = 0,
	MAX_CPU_PERCENT = 5
);
GO

-- 2 create a group and link it to the resource pool
USE master;
GO
CREATE WORKLOAD GROUP MxWorkloadGroup
USING MxResourcePool;
GO

-- 3 populate the group
USE master;
GO
CREATE FUNCTION dbo.fn_MxClassifier()
RETURNS sysname WITH SCHEMABINDING
AS
BEGIN
DECLARE @wg AS sysname
IF SUSER_NAME() = 'sa'
	SET @wg = 'MxWorkloadGroup'
ELSE
	SET @wg = 'default'

RETURN @wg
END;
GO

ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_MxClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;

-- check config

SELECT s.session_id,
s.host_name,
s.program_name,
s.nt_user_name,
s.login_name, 
w.name WorkgroupAssignment,
r.name ResourcePoolAssignment

FROM sys.dm_exec_sessions s
JOIN sys.dm_resource_governor_workload_groups w
  ON s.group_id = w.group_id
JOIN sys.dm_resource_governor_resource_pools r
  ON w.pool_id = r.pool_id
WHERE s.host_name IS NOT NULL
ORDER BY nt_user_name-- s.session_id desc

-- undo

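-- unbind the classifier first, otherwise the function cannot be dropped
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = NULL);
ALTER RESOURCE GOVERNOR RECONFIGURE;
DROP WORKLOAD GROUP MxWorkloadGroup;
DROP RESOURCE POOL MxResourcePool;
ALTER RESOURCE GOVERNOR RECONFIGURE;
DROP FUNCTION dbo.fn_MxClassifier;
-- disable last, as RECONFIGURE would switch the governor back on
ALTER RESOURCE GOVERNOR DISABLE;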
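To see the pool settings and accumulated CPU use without perf-mon, a query along these lines can be run while the pool still exists (that is, before the undo step). It is only a sketch against the resource pool DMV already used in the config check above.

```sql
-- Sketch: CPU limits and cumulative CPU time per resource pool
SELECT p.name,
       p.min_cpu_percent,
       p.max_cpu_percent,
       p.total_cpu_usage_ms
FROM sys.dm_resource_governor_resource_pools AS p
ORDER BY p.name;
```

Running it twice a minute or so apart and comparing total_cpu_usage_ms gives a rough feel for which pool is doing the work.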


Log shipping to SSRS

I like simplicity, and job schedules can add a ton of it. I set up one job on the primary server to back up logs, and one job on the secondary server to restore them from the primary share.

Both schedules were set to – every 15 minutes between 18:00 and 08:00 – so no overhead during the working day.

Incidentally, so I would never have to manually fix log shipping of a 2 TB database, I set up a 01:00 [full backup] job on the primary …

Which on completion would start a [full restore] job on the secondary (to “standby”)…

The first step of the [full backup] job was to check log shipping latency, and only proceed if it were over 2 hours.
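The latency check itself is not shown here. One way it could be approximated, purely as a sketch, is to read the most recent log restore from msdb.dbo.restorehistory and raise an error when it is recent, with the job step set to quit the job on failure. The database name, and the assumption that the check can see the secondary server's msdb, are mine and not taken from the original job.

```sql
-- Sketch: assumes this runs where the secondary's restore history is visible
DECLARE @LastLogRestore DATETIME;

SELECT @LastLogRestore = MAX(restore_date)
FROM msdb.dbo.restorehistory
WHERE destination_database_name = N'SomeDatabaseName' -- placeholder name
  AND restore_type = 'L';                             -- log restores only

-- A restore within the last 2 hours means log shipping is keeping up,
-- so fail this step and let the job quit without taking a full backup
IF @LastLogRestore > DATEADD(HOUR, -2, GETDATE())
    RAISERROR('Log shipping is up to date: full backup not needed', 16, 1);
```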

Normally log shipping would run all night and the secondary server would be updated to 08:00. If there were an issue then the [full backup] / [full restore] jobs would “fix” log shipping and again the secondary would be updated to 08:00. If there were a more fundamental failure then the [full backup] / [full restore] jobs alone, would update the secondary to 01:00.

The final touches were to set the [full restore] job to notify me if it were ever executed, and to remove all notifications from the [log restore] job 🙂

Moving datafiles

Just so I can find this script when I need it …

--MoveDataFile.sql

USE master;
GO

-- 1 alter the metadata to the new path

ALTER DATABASE [SomeDatabaseName] MODIFY FILE (NAME = SomeLogicalFileName, FILENAME = 'F:\SQLData\SomeFileName.mdf');

-- 2 take database offline

ALTER DATABASE [SomeDatabaseName] SET OFFLINE;

-- 3 move the datafile via windows explorer

-- 4 bring database online

ALTER DATABASE [SomeDatabaseName] SET ONLINE;
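The logical file name needed in step 1, and the path SQL Server currently has on record, can be looked up before and after the move. A small sketch, using the same placeholder database name:

```sql
-- Sketch: list logical names and recorded physical paths for one database
SELECT [name] AS LogicalFileName,
       physical_name,
       state_desc
FROM sys.master_files
WHERE database_id = DB_ID(N'SomeDatabaseName');
```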

Recovery Interval not optimal

The SQL Server “Target Recovery Time” database setting used to be ‘0’ by default, which could have a performance impact during a Checkpoint, by amplifying lazywriter contention.

Indirect checkpointing can alleviate this and has been available since SQL Server 2012. Indeed the default value for new databases was changed to ‘60’ (seconds) from SQL Server 2016 onwards.
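As a sketch of how this might be checked and changed (the database name is a placeholder), the current value is exposed in sys.databases and is set per database:

```sql
-- Sketch: 0 means the older automatic checkpoint behaviour is in use
SELECT [name], target_recovery_time_in_seconds
FROM sys.databases
ORDER BY [name];

-- Switch one database to indirect checkpoints with a 60 second target
ALTER DATABASE [SomeDatabaseName] SET TARGET_RECOVERY_TIME = 60 SECONDS;
```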

Splitting strings in practice.

Quite often on technical forums a request to extract part of a text string is tackled in splendid isolation. I’ve used the technique below with postcodes before, but this time wanted to split the text in an SSRS log file to isolate report names from paths.

For example the string …

/Finance/Monthly/Outgoing/Summary

… means the report called “Summary” is contained in the “Outgoing” folder within the “Monthly” folder inside the “Finance” folder in the root.

The tricky bit being that the nesting levels vary. There are reports in root. And there are reports buried ten layers deep like Russian dolls.

Task one, getting the report names is relatively easy. You could use the REVERSE function to help grab the text up to the “first” slash using CHARINDEX.

Task two, getting the path, is where forums go deep, with strings of functions to extract all the text to the left of the last slash (where there are an unknown number of slashes).

Whereas it is often far simpler to use REPLACE to remove the already found report name from the whole string, leaving just the path.

SELECT
       Report,
       REPLACE(ItemPath,Report,'') [Path],
       ...
FROM
(
    SELECT RIGHT(ItemPath, CHARINDEX ('/', REVERSE(ItemPath))-1) Report,
           ...
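A self-contained version of the same idea, runnable without the SSRS log, might look like the sketch below. The two sample paths are made up for illustration and supplied through a VALUES list instead of the real log table.

```sql
-- Sketch: sample paths are illustrative, not taken from a real report server
SELECT x.ItemPath,
       x.Report,
       REPLACE(x.ItemPath, x.Report, '') AS [Path]
FROM
(
    SELECT v.ItemPath,
           -- text after the last slash = report name
           RIGHT(v.ItemPath, CHARINDEX('/', REVERSE(v.ItemPath)) - 1) AS Report
    FROM (VALUES ('/Finance/Monthly/Outgoing/Summary'),
                 ('/Summary')) AS v (ItemPath)
) AS x;
```

One caveat with the REPLACE shortcut: it removes every occurrence of the report name, so a report sitting in a folder of the same name (for example /Summary/Summary) would lose the folder name from its path as well.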
 

Cannot drop user as it owns a schema

Whilst removing orphaned user accounts, I came across this error …

Msg 15138, Level 16, State 1, Line 1
The database principal owns a schema in the database, and cannot be dropped.

Looking at the properties of the user within the database I noticed that they owned the db_datawriter schema. Within the properties of that schema I was able to type db_datawriter as the new owner. After which I was able to remove the user from the database.

Here is a scripted way to do this …

https://www.mssqltips.com/sqlservertip/3439/script-to-drop-all-orphaned-sql-server-database-users/

Today's failed jobs

I cobbled this together to run on multiple servers via CMS. It is simpler and more trustworthy than alerts from monitoring-software.

--JobFailures.sql

SELECT
    J.name,
	H.FailedAt,
    H.Message
FROM
    msdb.dbo.sysjobs AS J
    CROSS APPLY (
        SELECT TOP(1)
            FailedAt = msdb.dbo.agent_datetime(T.run_date, T.run_time),
            Message = T.message
        FROM
            msdb.dbo.sysjobhistory AS T
        WHERE
            T.job_id = J.job_id
		AND
			T.run_status = 0 -- failed
		AND
			msdb.dbo.agent_datetime(T.run_date, T.run_time) > getdate()-1 -- in the last 24 hrs
--			msdb.dbo.agent_datetime(T.run_date, T.run_time) > getdate()-3 -- covering the weekend
        ORDER BY
            T.instance_id DESC) H -- most recent failure first

Original code from here …

https://stackoverflow.com/questions/54215008/sql-agent-job-last-run-status

Quick row count

When COUNT(*) was too slow to get total rows from a billion row heap …

SELECT MAX(REPLACE(CONVERT(VARCHAR(20), 
       CONVERT(MONEY, rowcnt), 1), '.00', '')) [rows]
FROM sys.sysindexes
WHERE OBJECT_NAME(id) = 'TableName';

Original code from here …

https://www.i-programmer.info/programming/database/6576-sql-server-quickly-get-row-counts-for-tables-heaps-indexes-and-partitions.html
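sys.sysindexes is a compatibility view, so on current versions the same metadata-only count can be read from sys.dm_db_partition_stats instead. This is a sketch of an alternative, not the code from the linked article, and the table name is a placeholder.

```sql
-- Sketch: row count from metadata, no table scan
SELECT SUM(ps.row_count) AS [rows]
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID(N'dbo.TableName')
  AND ps.index_id IN (0, 1); -- 0 = heap, 1 = clustered index
```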

Replicating data from SQL Server to MySQL

Ok not really “Replication”. More like keeping a MySQL table in sync with a SQL Server table using both varieties of SQL.

I avoided using a linked-server as I wanted this to be able to cope with bulk loading. Sadly the correct tool, SSIS, was not available during this project.

I created a SQL Job with two steps 1) Export data to CSV and 2) Import into MySQL. The SQL code is highly parameterised so it can be reused.

Job Step 1: Export data to a CSV file

/* ExportDataToCsvFile.sql */

	DECLARE @datasource VARCHAR(100) = 'SomeTableName';
	DECLARE @cmd VARCHAR(400);
	SELECT @cmd = 'BCP SomeDatabase.dbo.' + @datasource + ' out D:\Export\' + @datasource + '.csv -t, -c -T';
	EXEC master..xp_cmdshell @cmd;
 

Line 1: Is just to let me know I have my own copy of this code block.
Line 3: Should be updated to the data source specific to each project (NOTE: For simplicity the data-source and CSV file both share this name.)

Job Step 2: Import CSV into MySQL table

/* ImportCsvIntoMYSqlTable.sql */

	DECLARE @table VARCHAR(100) = 'SomeTable'; /* << change this */
	DECLARE @database VARCHAR(100) = 'SomeInstance_SomeDatabase'; /* << change this */
	DECLARE @sql VARCHAR(2000) = '';
	DECLARE @cmd VARCHAR(8000);
	DECLARE @IsError INT;


/* 1 Build MySQL script to empty then refill table */

	SET @sql = @sql + 'START TRANSACTION;';
	SET @sql = @sql + 'DELETE FROM ' + @table + ';';
	SET @sql = @sql + 'LOAD DATA LOCAL INFILE ''G:\\Export\\' + @table + '.csv'' INTO TABLE ' + @table
	SET @sql = @sql + ' FIELDS TERMINATED BY '','' ENCLOSED BY ''\"'' LINES TERMINATED BY ''\r\n'';';
	SET @sql = @sql + 'COMMIT;';


/* 2 Execute it */

	SET @cmd = 'G:\Export\MySql\bin\mysql.exe --defaults-extra-file=G:\Export\MySql\' + @database + '.cnf -e "' + @sql + '";';
	EXEC @IsError = master..xp_cmdshell @cmd;
	IF @IsError <> 0 RAISERROR('INFILE Error', 16, 1);


/* 3 Defragment table and Update stats */

	SET @sql = 'OPTIMIZE TABLE ' + @table + ';';
	SET @cmd = 'G:\Export\MySql\bin\mysql.exe --defaults-extra-file=G:\Export\MySql\' + @database + '.cnf -e "' + @sql + '";';
	EXEC @IsError = master..xp_cmdshell @cmd;
	IF @IsError <> 0 RAISERROR('OPTIMIZE Error', 16, 1);
 

Line 3: Will need to be changed for each project. It names both the CSV file and the target MySQL table.

Line 4: Is a previously created file used by MySQL to connect to a target instance & database.

Lines 12 to 16: Builds up a string of MySQL commands to empty then refill the table from the CSV file.

Lines 12 & 16: These two lines create a single transaction. This serves two purposes. 1) The ‘old data’ will not be deleted if the ‘new data’ load fails. 2) The switch-over from ‘old data’ to ‘new data’ will be instant.

Send an Email with a PDF attached

There are many posts on how to automatically generate a PDF receipt and email it by leveraging SSRS. Here is how it went for me.

  1. Create a report that shows the details required.
  2. Create a parameters table.
  3. Subscribe to the report using the parameters table.
  4. Create a stored-procedure to populate and fire the subscription.

On a version of SQL Server Reporting Services (SSRS) that supports data-driven-subscriptions (DDS) I created a report called ‘SingleInvoice’ with one input parameter ‘invoice number’.

Outside of this report, in preparation for the DDS, I created a data source pointing to the local [ReportServer] database.

Within the [ReportServer] database I created a table called [dbo].[InvoiceParams]

CREATE TABLE [dbo].[InvoiceParams](
	[InvoiceNumber] [VARCHAR](100) NULL,
	[ToEmailAddress] [VARCHAR](200) NULL,
	[CCEmailAddress] [VARCHAR](200) NULL,
	[BccEmailAddress] [VARCHAR](200) NULL,
	[ReplyToEmailAddress] [VARCHAR](200) NULL,
	[IncludeReport] [BIT] NULL,
	[RenderFormat] [VARCHAR](20) NULL,
	[Priority] [VARCHAR](15) NULL,
	[Subject] [VARCHAR](150) NULL,
	[Comment] [VARCHAR](150) NULL,
	[IncludeLink] [BIT] NULL,
	[Active] [BIT] NULL,
	[DateInserted] [DATETIME] NOT NULL
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[InvoiceParams] ADD  DEFAULT (NULL) FOR [CCEmailAddress]
ALTER TABLE [dbo].[InvoiceParams] ADD  DEFAULT (NULL) FOR [BccEmailAddress]
ALTER TABLE [dbo].[InvoiceParams] ADD  DEFAULT (NULL) FOR [ReplyToEmailAddress]
ALTER TABLE [dbo].[InvoiceParams] ADD  DEFAULT ((1)) FOR [IncludeReport]
ALTER TABLE [dbo].[InvoiceParams] ADD  DEFAULT ('PDF') FOR [RenderFormat]
ALTER TABLE [dbo].[InvoiceParams] ADD  DEFAULT ('Normal') FOR [Priority]
ALTER TABLE [dbo].[InvoiceParams] ADD  DEFAULT ((0)) FOR [IncludeLink]
ALTER TABLE [dbo].[InvoiceParams] ADD  DEFAULT ((1)) FOR [Active]
ALTER TABLE [dbo].[InvoiceParams] ADD  DEFAULT (GETDATE()) FOR [DateInserted]
GO

To simplify the stored procedure I defined default values for all the columns I would not be dynamically populating.

Next I created a Data-driven subscription on the report with a schedule in the past – so it would never fire. For Destination I chose E-Mail.

Within the subscription I edited the dataset and chose the previously created shared data source [ReportServer].

I added this query before clicking ‘Apply’….

SELECT * 
FROM dbo.InvoiceParams
WHERE Active = 1;

Back in the New Subscription form, I completed the Delivery options like this …

Within the user database I created this stored-procedure …

/*==================================================
  Author:		Richard Smith
  Create date:	10 Jul 2020
  Description:	To Email PDF receipts - demo version
  Test: Exec [dbo].[EmailReceipts] 'INV123456789',
            'Richard.Smith@company.com'
  =================================================*/

ALTER PROC [dbo].[EmailReceipts]
    @InvoiceNumber VARCHAR(100),
    @ToEmailAddress VARCHAR(200),
    @Subject VARCHAR(150) = 'test subject',
    @Comment VARCHAR(150) = 'test body',
    @SubscriptionID NVARCHAR(260) = '987654321' 
                       /* Report = "SingleInvoice" */
AS
BEGIN
    SET NOCOUNT ON;


/* 1 Save the inputs */

    INSERT INTO [ReportServer].[dbo].[InvoiceParams] 
            (InvoiceNumber, ToEmailAddress, [Subject], Comment)
    VALUES (@InvoiceNumber, @ToEmailAddress, @Subject, @Comment);


/* 2 Trigger subscription. Which will send the report (+ inputs) to the email-subsystem-queue */

    EXEC [ReportServer].[dbo].[AddEvent] @EventType = 'TimedSubscription', @EventData = @SubscriptionID;
    WAITFOR DELAY '00:00:10';


/* 3 If no longer in queue, flag as sent */

    IF NOT EXISTS (SELECT 1 FROM [ReportServer].[dbo].[Event] WHERE EventData = @SubscriptionID)
        UPDATE [ReportServer].[dbo].[InvoiceParams] 
        SET Active = 0
        WHERE InvoiceNumber = @InvoiceNumber
        AND ToEmailAddress = @ToEmailAddress;


/* 4 Manage the log */

	DELETE FROM [ReportServer].[dbo].[InvoiceParams] WHERE DateInserted < GETDATE()-30;
	SELECT * FROM [ReportServer].[dbo].[InvoiceParams] ORDER BY DateInserted DESC;

END;
GO

When executed with an email address and invoice number this stored procedure will send an email to the email address with the PDF invoice attached.

NOTE

To find @SubscriptionID I used this …

SELECT SCH.SubscriptionID
FROM [ReportServer].[dbo].[Catalog] CAT
JOIN [ReportServer].[dbo].[ReportSchedule] SCH
  ON CAT.ItemID = SCH.ReportID
WHERE CAT.Path= '/ReportPath/ReportName';
 

Removing duplicate rows

No need to over develop this – once you realise that the DELETE command can include the TOP option.

More? Ok. Create a SELECT command that shows the issue. EG:-

SELECT ID
FROM dbo._op2
WHERE ID = 'X123456';

Add TOP to return just the unwanted rows. IE: if the above query should return 1 row but erroneously returns 2 rows use TOP(1), if it returns 5 rows use TOP(4) …

(NOTE: It does not matter which particular rows get deleted … they are duplicates remember)

SELECT TOP(1) ID
FROM dbo._op2
WHERE ID = 'X123456';

Change SELECT to DELETE and remove the column name …

DELETE TOP(1)
FROM dbo._op2
WHERE ID = 'X123456';

… and only run it once 😉
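For a little extra safety, the delete can be wrapped in a transaction so the result can be checked before it is made permanent. A minimal sketch using the same example table and ID:

```sql
BEGIN TRAN;

DELETE TOP(1)
FROM dbo._op2
WHERE ID = 'X123456';

-- Expect exactly one row to remain for this ID
SELECT COUNT(*) AS RemainingRows
FROM dbo._op2
WHERE ID = 'X123456';

-- Then run ONE of these by hand, depending on the count above
-- COMMIT TRAN;
-- ROLLBACK TRAN;
```

The open transaction holds locks on the table until it is committed or rolled back, so it should not be left waiting.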

Stop notification spam

Although I have previously tackled this problem by rolling my own notifications, this time I realised that the SQL jobs that run every minute or two are of low importance, and I don’t really want emails when they fail. I will notice at some point during the working day.

Here is the code to list jobs that run frequently and are configured to send notification emails, along with the commands to remove those notifications.

/* JobNotificationEmailStop.sql */

SELECT S.[name] JobName,
    SS.freq_subday_interval [ScheduleFreq(mins)],
    'EXEC msdb.dbo.sp_update_job @job_name = N''' 
    + S.[name] 
    + ''', @notify_email_operator_name = N'''';' 
    CommandToDisableEmailNotification

  FROM msdb.dbo.sysjobs S
  JOIN msdb.dbo.sysjobschedules SJ
    ON S.job_id = SJ.job_id
  JOIN msdb.dbo.sysschedules SS
    ON SS.schedule_id = SJ.schedule_id

 WHERE SS.freq_subday_interval > 0
   AND S.notify_level_email > 0
 ORDER BY SS.freq_subday_interval;
 

Except and Intersect

Here is the simplest working example of EXCEPT and INTERSECT I can come up with …

/* Except.sql */

IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1;
CREATE TABLE #t1 (c1 INT);
INSERT INTO #t1 VALUES (1), (2);

IF OBJECT_ID('tempdb..#t2') IS NOT NULL DROP TABLE #t2;
CREATE TABLE #t2 (c2 INT);
INSERT INTO #t2 VALUES (2), (3);

SELECT * FROM #t1
EXCEPT
SELECT * FROM #t2; /* = 1 */

SELECT * FROM #t1
INTERSECT
SELECT * FROM #t2; /* = 2 */

SELECT * FROM #t2
EXCEPT
SELECT * FROM #t1; /* = 3 */

SELECT * FROM #t1
EXCEPT
SELECT * FROM #t2
UNION
(SELECT * FROM #t2
EXCEPT
SELECT * FROM #t1); /* = 1 & 3 */

I use this frequently whilst refactoring to check the outputs are identical. And sometimes when syncing a MySQL table to MSSQL.

How to check SQL Jobs are actually doing something.

Looking through the Database Mail log today, I accidentally discovered a job that had been busy sending emails for I-don’t-know-how-long using an email profile that no longer worked. The output of the job was ‘success’ as the emails had been successfully queued with the Database Mail sub-system.

After finding the emails would have been empty anyway, I disabled the job. But it made me wonder if there might be other jobs that were busy doing nothing – hour after hour – day after day.

Knowing the dangers of weakening the system, I did not want to fail a job or job-step just to flag a maintenance issue.

The lowest-risk change I could think of making (to the many, legacy, unfamiliar jobs) was to leave pertinent messages in the job history log using the PRINT command. For example:-

IF EXISTS (SELECT 1 FROM SomeTable)
   BEGIN 
      PRINT 'YES: there is new data';
      -- (do meaningful stuff here)
   END
ELSE
      PRINT 'NO: there is no new data';

Then in the future I might notice that there is Never any new data!

Capturing input parameters

Often when a stored-procedure is executed I want to know the parameters that were input. Which is handy for performance tuning.

There is a mechanism to automatically save input parameters with the cached execution plans, but quite often this does not work well.

On this occasion I embedded the facility right into the procedure as a temporary measure (please, don’t talk to me about triggers brrr).

CREATE PROCEDURE [dbo].[sp_SomeName]
 @ID UNIQUEIDENTIFIER = NULL,    
 @Record VARCHAR(50) = NULL    
AS  
BEGIN

 /* log parameters for performance tuning 1 of 2 */

  IF OBJECT_ID('[SomeDatabase].[dbo].[tbl_SomeTable]') IS NULL
  	SELECT GETDATE() STIME, GETDATE() ETIME, @ID ID, @Record RC 
  	INTO [dbo].[tbl_SomeTable]
  ELSE
  	INSERT INTO [dbo].[tbl_SomeTable]
  	SELECT GETDATE(), GETDATE(), @ID, @Record

 /* log parameters for performance tuning 1 of 2 */

...

Overkill really, but at the end of the procedure I added …

 ...

/* log parameters for performance tuning 2 of 2 */
  
  UPDATE [dbo].[tbl_SomeTable]
  SET ETIME = getdate()
  WHERE ETIME = STIME;
  
 /* log parameters for performance tuning 2 of 2 */

END
GO

Note: the real procedure had many more input parameters, and I suspected they were all being passed as NULL, which would explain the poor performance.

Still, best to know what we’re optimizing for 🙂

Orphaned users

Servers have Logins, and databases have Users.

A Login and a User account are linked, sharing the same name and the same SID. Naturally, for each Login there can be many User accounts – one in each database.

Now, if you back up a database on one server and restore it onto another server, it may contain Users that do not have a corresponding Login on the second server.

Execute this command (against each user database) to list any orphaned Users

SELECT 	DP.type_desc,
        DP.SID,
        DP.[name] UserName
FROM [sys].[database_principals] DP
LEFT JOIN [sys].[server_principals] SP
       ON DP.SID = SP.SID
WHERE SP.SID IS NULL
AND DP.authentication_type_desc = 'INSTANCE';

To fix this you can either remove the orphaned user account or create a matching login. Removing a user account is tricky to script as you need to remove individual attributes first.

Creating a login depends on the type, SQL or Windows. For SQL logins paste the name, SID, and password* into this command and execute it.

USE [master]
GO
CREATE LOGIN [SomeLogin]
WITH PASSWORD = 'SomePassword',  
SID = 0xSomeSid;

(*If you do not know the SQL login password that was used on the old server then create a new one. They are not linked)

To create a Windows login use this command …

USE [master]
GO
CREATE LOGIN [SomeDomain\SomeLogin] 
FROM WINDOWS 
WITH DEFAULT_DATABASE = [master];
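Where a suitable login already exists on the new server but with a different SID, a third route is to re-map the database user to it. The names below are placeholders, and this is a sketch for that case, not a replacement for the commands above.

```sql
USE [SomeDatabase];
GO
-- Sketch: re-point the orphaned user at an existing login (updates the user's SID)
ALTER USER [SomeUser] WITH LOGIN = [SomeLogin];
```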

Set every user's default schema to dbo where it's blank

In this quick script I am assuming the Windows domain is called ‘DOM’ …

-- ChangeDefaultSchemaToDboWhereNull.sql

DECLARE @cmd varchar(1000) 

SET @cmd = 
'USE ? IF DB_ID(''?'') > 4 SELECT ''USE ?; ALTER USER ['' + name + ''] WITH DEFAULT_SCHEMA = [dbo]''
 FROM sys.database_principals
 WHERE default_schema_name IS NULL
 AND [name] LIKE ''DOM\%'''

IF OBJECT_ID('tempdb..#output') IS NOT NULL DROP TABLE #output
CREATE TABLE #output
(command varchar(1000))

INSERT INTO #output
EXEC sp_MSforeachdb @cmd

SELECT * 
FROM #output


SQL Snapshot worksheet

-- snapshots.sql

-- 1. create a snapshot

USE master;
GO
CREATE DATABASE Credit_Snap
ON  
    (
    NAME = CreditData,
    FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL15.SQL2019\MSSQL\DATA\CreditData.ss'
    ),
    (
    NAME = CreditCatalog,
    FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL15.SQL2019\MSSQL\DATA\CreditCatalog.ss'
    )
AS SNAPSHOT OF Credit;

-- 2. restore database from a snapshot

USE master;
GO
DECLARE @kill VARCHAR(8000) = '';
SELECT @kill = @kill + 'kill ' + CONVERT(VARCHAR(5), spid) + ';'
  FROM master..sysprocesses
 WHERE dbid = DB_ID('Credit')
   AND spid > 50;
EXEC (@kill);
RESTORE DATABASE Credit FROM DATABASE_SNAPSHOT = 'Credit_Snap';

-- 3. delete snapshot

USE master;
GO
DROP DATABASE Credit_Snap;

-- testing

SELECT *
FROM [Credit].[dbo].[member]
WHERE member_no = 22

BEGIN TRAN
UPDATE [Credit].[dbo].[member]
SET Firstname = 'DRY'
WHERE Firstname = 'CRRY'
ROLLBACK TRAN

SELECT aa.*
FROM [Credit].[dbo].[member] aa
JOIN [Credit_Snap].[dbo].[member] bb
  ON aa.member_no = bb.member_no 
WHERE aa.Firstname <> bb.Firstname;

Transactional Replication causing High CPU

In Publication Properties the setting “Subscriptions never expire …” has a surprising effect on the job “Distribution clean up: distribution”.

This job removes orphaned and replicated transactions from the Distribution database once the retention period has expired.

However, “Subscriptions never expire …” stops this procedure from removing orphaned transactions – left by a deleted subscription – or for any other reason.

This results in the Distribution database growing and high CPU.

To fix this, allow subscriptions to expire.

With the expiry period set to a year, failed subscriptions may be deleted if not fixed within that time. This preserves most of the robustness of “Subscriptions never expire”, whilst allowing orphaned transactions to be cleaned up.
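The same change can probably be scripted instead of made through Publication Properties. The sketch below assumes the publication's retention property is the one behind that dialog setting; the database and publication names are placeholders, and 8760 hours is simply 365 days.

```sql
-- Sketch: run in the publication database on the publisher
USE [SomePublicationDatabase];
GO
EXEC sp_changepublication
     @publication = N'SomePublication',
     @property    = N'retention',
     @value       = N'8760'; -- hours; 0 would mean subscriptions never expire
```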

Change SQL Server Collation

To change the default collation of my SQL 2019 instance on my laptop to “SQL_Latin1_General_CP1_CI_AS” I …

– Found and saved the location of sqlservr.exe into notepad.

– Added [cd ] in front of the path (that’s cd and a space, without the square brackets)

– Added a second line in notepad [sqlservr -m -T4022 -T3659 -s"SQL2019" -q"SQL_Latin1_General_CP1_CI_AS"] (without the square brackets)

– Stopped the SQL Server service

– Opened a Command Prompt as Administrator

– Executed the first command (cd …)

– Executed the second line (sqlservr …)

– Rebooted.
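Once the service is back up, the result can be confirmed with a quick query. A small sketch:

```sql
-- Sketch: instance default collation, plus the collation of each database
SELECT SERVERPROPERTY('Collation') AS ServerCollation;

SELECT [name], collation_name
FROM sys.databases
ORDER BY [name];
```

Existing user databases keep whatever collation they were created with, so the second result set may still show the old value for them.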

Dropping a user that owns a schema (Error: 15138)

Manually highlight and run #1. Paste the result into #2.

-- DropFailedForUser.sql

-- The statement (that caused the error)

   USE [master] -- in this case
   GO
   DROP USER [Dom\SomeUser]
   GO

/* The Error ...

   Drop failed for user 'Dom\SomeUser'
   The database principal owns a schema in the database
   and cannot be dropped. Error: 15138
   */

-- #1. find the name of the schema

   SELECT [name]
   FROM sys.schemas s
   WHERE s.principal_id = USER_ID('Dom\SomeUser');

-- #2. transfer ownership of the schema to 'dbo'

   ALTER AUTHORIZATION ON SCHEMA::[SomeSchemaName] TO dbo;

-- repeat "The Statement"

Redgate SQL Data Compare

I love this tool for refactoring. With a result set of over 3,000 rows across 60 columns, eyeballing similar outputs in a spreadsheet just would not do.

To use this tool, I modified the original query to output INTO a new table “_output” in the current database. Near the start of the query I put an IF EXISTS/DROP statement (more commonly used with temp tables), and at the bottom of the query I selected star from _output.

After improving the original query code (and saving it with a new name), I modified it similarly to the above – outputting results INTO table “_output” but in a DIFFERENT database.

I configured “SQL Data Compare” to use every column of the first “_output” table as a “comparison key”. And can now confirm the two tables called “_output” in different databases are identical.

Drop all tables that start with underscore

In an ironic twist of fate I adapted my ‘Drop all temp-tables’ script, to drop all tables beginning with an underscore.

Yes, it is ironic, because it uses a temp-table to store the working list of tables to be deleted. Whilst my original script used a real table to store a list of temp-tables.

Well … ok then

-- DropAllTablesStartingWithAnUnderscore.sql

IF OBJECT_ID('tempdb..#tables') IS NOT NULL DROP TABLE #tables
SELECT [name] 
INTO #tables
FROM sys.tables
WHERE [name] LIKE '/_%' ESCAPE '/'

DECLARE @table VARCHAR(200), @cmd VARCHAR(500)
WHILE (SELECT COUNT(*) FROM #tables) > 0
BEGIN
	SET @table = (SELECT TOP(1) [name] FROM #tables)
	SET @cmd = 'drop table ' + QUOTENAME(@table)
	EXEC(@cmd)
	DELETE FROM #tables WHERE [name] = @table
END

(In case you were wondering why I created these underscore-tables in the first place. I was refactoring a stored-procedure that took over an hour to run, and had a large number of temp-tables. I wanted to persist those temp-tables for another day, and not have to start from scratch.)

Find the partner of a bracket in SSMS

Faced with a barrage of tabbed T-SQL in SSMS it can sometimes be quite difficult to see the close bracket that signifies – for example – the end of a CTE.

TIP: swipe from the inside of the bracket outwards to highlight (in grey) the bracket itself, and also its partner. EG: swipe right-to-left across an open bracket.

Running CHECKDB on TempDB

Normally I would not bother, but when CHECKDB runs on TempDB it cannot use a snapshot so has to lock all the temp tables. This script will wait for exclusive access for up to a minute.

DECLARE @outcome VARCHAR(50) = 'TempDB is currently too busy for CHECKDB', 
	@endtime DATETIME = DATEADD(mi,1,GETDATE())
WHILE GETDATE() < @endtime
BEGIN
  IF NOT EXISTS (SELECT 1 FROM sys.dm_tran_locks WHERE request_mode = 'X' AND resource_database_id = 2)
  BEGIN
    DBCC CheckDB([tempdb]) WITH NO_INFOMSGS;
    SET @outcome = 'success'
    BREAK;
  END
  WAITFOR DELAY '00:00:01';
END
SELECT @outcome

A TDE test restore

Post migration, I wanted to make sure an encrypted database on a new SQL 2014 Enterprise edition server could be restored.

I installed SQL 2014 Developer edition on a second machine and initially got the expected error …

Msg 33111, Level 16, State 3, Line 2
Cannot find server certificate with thumbprint.
Msg 3013, Level 16, State 3, Line 2
RESTORE DATABASE is terminating abnormally

1. I checked if the target server was enabled for TDE …

SELECT * FROM sys.symmetric_keys

This returned one row. I checked against the source server, which returned two rows, and concluded that the target server was NOT yet TDE enabled.

2. To enable TDE on the target …

USE Master
GO
CREATE MASTER KEY ENCRYPTION
BY PASSWORD = '[SomePwIJustMadeUp]';
GO

3. Next I copied over the two files (*.cer and *.pvk) from the backup location to the target server and installed the certificate into Master …

USE Master
GO
CREATE CERTIFICATE [NameOfTheDatabase]_Cert2
FROM FILE = '[PathAndNameOfLocalCopyOfCertFile].cer'
WITH PRIVATE KEY (FILE = N'[PathAndNameOfLocalCopyOfPvkFile].pvk',
	  DECRYPTION BY PASSWORD ='[TheSourcePwIGotFromKeePass]');
GO

4. After which I was able to restore the database as normal.
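If the restore still complains about the thumbprint, it may help to compare what each server actually holds. The sketch below assumes SQL Server 2012 or later for the encryptor_thumbprint column; the first query is for the source server and the second for the target.

```sql
-- Sketch, on the SOURCE: which certificate thumbprint protects each encrypted database
SELECT DB_NAME(database_id) AS DatabaseName,
       encryptor_thumbprint
FROM sys.dm_database_encryption_keys;

-- Sketch, on the TARGET: certificates in master that have a private key
SELECT [name], thumbprint, expiry_date
FROM master.sys.certificates
WHERE pvt_key_encryption_type_desc = 'ENCRYPTED_BY_MASTER_KEY';
```

The thumbprint from the source should appear in the target's list once the certificate has been installed.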

Searching every procedure for a string

To search every stored procedure in every database on every server (ranging from SQL Server 2005 to SQL Server 2016) for the string ‘QueryTraceOn’, I first registered every server within SSMS.

Right-clicking on the registered server folder, I chose ‘new query’ and ran ‘select 1’ to exclude from my list any server with issues.

Once I had an error free list, I ran this code (which took around 40 minutes) …

-- SearchProcs4String.sql

EXEC sp_MSforeachdb 'use ?
SELECT db_name() [Database], ROUTINE_SCHEMA + ''.'' 
+ ROUTINE_NAME [Proc]
FROM INFORMATION_SCHEMA.ROUTINES WITH (NOLOCK)
WHERE ROUTINE_DEFINITION LIKE ''%QUERYTRACEON%'';'
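One thing to be aware of is that ROUTINE_DEFINITION only holds the first 4,000 characters of each routine, so a match further down a long procedure would be missed. A variant that searches the full text through sys.sql_modules could be sketched like this (an alternative to, not a copy of, the script above):

```sql
-- Sketch: search the complete definition of every stored procedure
EXEC sp_MSforeachdb 'USE [?];
SELECT DB_NAME() AS [Database],
       SCHEMA_NAME(o.schema_id) + ''.'' + o.name AS [Proc]
FROM sys.sql_modules AS m
JOIN sys.objects AS o
  ON o.object_id = m.object_id
WHERE o.type = ''P''
  AND m.definition LIKE ''%QUERYTRACEON%'';';
```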

Getting away from Dedupe jobs

Duplicate data should ideally be stopped at the front end. However if a table already contains duplicate data you may want to bash out some code to clean it up. And schedule a SQL job to run the code regularly.

However, there are mechanisms baked right into SQL Server to manage this more efficiently (step away from scripting everything – devops 😉 )

True enough, you need to run code (once) to clean out all the current duplicates, but going forward a unique filtered index can keep them out.

For this particular project “duplicate data” meant that a CustID column should not contain a number already in that column if the Country was ‘GB’ and the PackageID was ‘5’.

Here is a simplified example of my solution …

-- ix_BlockDupCustIDs.sql

IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1;
CREATE TABLE #t1 (CustID INT, CountryCode CHAR(2), PackageID INT);

CREATE UNIQUE INDEX ix_BlockDupCustIDs 
ON #t1 (CustID) 
WHERE CountryCode = 'GB' AND PackageID = 5;

INSERT INTO #t1 VALUES (1, 'GB', 5) -- yes
INSERT INTO #t1 VALUES (2, 'GB', 5) -- yes
INSERT INTO #t1 VALUES (1, 'GB', 4) -- yes
INSERT INTO #t1 VALUES (1, 'US', 5) -- yes
--INSERT INTO #t1 VALUES (1, 'GB', 5) -- no, duplicate
--INSERT INTO #t1 VALUES (2, 'GB', 5), (3, 'IR', 1) -- no for both
--UPDATE #t1 SET PackageID = 5 WHERE PackageID = 4 -- nope

SELECT * FROM #t1;
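
The one-off cleanup mentioned earlier has to happen before the unique index can be created. A minimal sketch of one way to do it, assuming it does not matter which copy of a duplicate survives (the #t2 table and its sample rows are made up for illustration):

```sql
-- ix_BlockDupCustIDs_cleanup.sql (hypothetical)

IF OBJECT_ID('tempdb..#t2') IS NOT NULL DROP TABLE #t2;
CREATE TABLE #t2 (CustID INT, CountryCode CHAR(2), PackageID INT);

INSERT INTO #t2 VALUES (1, 'GB', 5), (1, 'GB', 5), (2, 'GB', 5), (1, 'US', 5);

-- number the rows within each CustID, but only inside the filtered set
WITH d AS
(
  SELECT ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY (SELECT NULL)) AS rn
  FROM #t2
  WHERE CountryCode = 'GB' AND PackageID = 5
)
DELETE FROM d WHERE rn > 1; -- keep one row per CustID

-- the index can now be created without a duplicate-key error
CREATE UNIQUE INDEX ix_BlockDupCustIDs
ON #t2 (CustID)
WHERE CountryCode = 'GB' AND PackageID = 5;

SELECT * FROM #t2;
```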

Start SSMS as another user

There are a few ways to open SQL Server Management Studio as another Windows user. You can simply hold down the Shift key and right-click SSMS. Alternatively, you can create a shortcut on the desktop that uses the built-in ‘Runas.exe’ command.

In notepad I assemble the 3 parts needed …

1. Full path to runas.exe
2. The Windows account I want to use
3. The full path to the SSMS executable.

Here is an example …

C:\Windows\System32\runas.exe /user:ZGROUP\rsmithadmin "C:\Program Files (x86)\Microsoft SQL Server Management Studio 18\Common7\IDE\Ssms.exe"

Log Shipping for Migration

The trouble with backing up databases in ‘old production’ then restoring them to ‘new production’ is that it takes time.

And there may be some unforeseen delay switching the front-end apps over, resulting in ‘old production’ being updated with new data, and ‘new production’ becoming out of date.

Log-shipping is an ideal, built-in, tool that can be used to keep ‘new production’ in sync with ‘old production’ during that phase between backup/restore and switching the front-end to ‘new production’.

This time around there was no need to script the setting up of log-shipping. There were only 13 databases, so using the GUI did not take long.

The idea is to complete the backup/restore a week or so before the switch-over and set-up log-shipping to keep the data in sync.

Then at the designated switch-over time, it takes only a moment to bring ‘new production’ on-line, as a fully up-to-date copy of ‘old production’.

Here is my crib-sheet …

 

Preparation

  • Primary and secondary servers should have as near as possible the same instance settings eg: max-memory, numa configuration, CLR, max dop, etc
  • Ensure user databases are using full recovery model
  • Create shared folder (on the target ideally)
  • Default backup compression is enabled (ideally)
  • Reduce VLF counts
  • Configure file share folder and connectivity
  • Disable tlog backups on primary

Preparation – Secondary Instance

  • Ensure enough space for databases
  • Matching drive letters for datafiles and logfiles (ideally)
  • Configure file share

Preparation – Monitor Instance

  • Ideally separate from primary and secondary

Security

  • Config login is a member of the sysadmin role
  • SQL server service account on primary needs read/write permission on backup directory (for the backup job)
  • SQL server service account on secondary needs read permission on backup share and read/write permission to secondary share (for copy / restore jobs)

Configuring log-shipping

  • Manually backup and restore database (with no recovery)
  • Use notepad to cut and paste connection strings and paths
  • Transfer logins, jobs and linked servers

The switch over

  • Manually execute the backup, copy, and restore jobs a final time
  • Manually restore each database “with recovery”
  • Detach old databases (so there is no chance of them being updated)
  • Point front-end-applications to new back-end server

Post Migration

  • Full backups (the old ones cannot be restored now)
  • Update all statistics
  • Check compatibility level
  • Execute dbcc checkdb
  • Enable Query Store (read/write)
  • Monitor health and performance
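
The switch-over and post-migration bullets map onto a handful of statements. A rough sketch for one database, run on ‘new production’ (the database name and backup path are placeholders, and the Query Store step assumes the target is SQL Server 2016 or later):

```sql
-- Switch-over: bring the log-shipped copy on-line
RESTORE DATABASE [DatabaseName] WITH RECOVERY;
GO

-- Post migration: a fresh full backup starts a new backup chain
BACKUP DATABASE [DatabaseName]
TO DISK = N'[PathAndNameOfNewFullBackup].bak'
WITH INIT, STATS = 10;
GO

USE [DatabaseName];
GO
EXEC sp_updatestats; -- update all statistics
GO
DBCC CHECKDB ([DatabaseName]) WITH NO_INFOMSGS;
GO

-- check the compatibility level (raise it only after testing)
SELECT [name], compatibility_level
FROM sys.databases
WHERE [name] = N'DatabaseName';
GO

-- SQL Server 2016 and above only
ALTER DATABASE [DatabaseName]
SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE);
GO
```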

Postcode search

The issue was that some postcodes were stored with spaces and some without.

This was further complicated because some user inputs (into the “Postcode Search” SSRS Report) had a space and some did not.

The root cause of the slow report was that all 90 MILLION stored postcodes were being retrieved and manipulated (to remove spaces) before being compared with the ONE input.

--- OLD CODE -----------------

DECLARE @PostCode VARCHAR(8)
SELECT Forename,
       Surname,
       AccountNumber AS CustomerNo,
       AccountStartDate,
       AddressLine2 AS Address,
       PostCode,
       DateOfBirth
FROM [dbo].[SV_Customers]
WHERE (CountryCode = 'GB')
      AND (REPLACE(Postcode, ' ', '') = @PostCode);

My insight was to manipulate just the ONE input postcode before comparing it TWICE (with and without a space) to the un-manipulated postcodes stored in the database.

The first task then, was to split the input postcode into two parts. In all formats the last 3 characters were number, letter, letter.

So after saving the last part of the postcode separately, it was easy to deduce that the first part must be the whole thing minus the last part.

--- NEW CODE ------------------------

DECLARE @PostCode VARCHAR(8)
DECLARE @pc2 CHAR(3) = RIGHT(@PostCode, 3);
DECLARE @pc1 VARCHAR(4) = RTRIM(REPLACE(@PostCode, @pc2, ''));

SELECT Forename,
       Surname,
       AccountNumber AS CustomerNo,
       AccountStartDate,
       AddressLine2 AS Address,
       Postcode,
       DateOfBirth
FROM [dbo].[SV_Customers]
WHERE CountryCode = 'GB'
      AND (PostCode = @pc1 + @pc2         -- without space
        OR PostCode = @pc1 + ' ' + @pc2); -- or with space

The final task was to write the WHERE clause as simply as possible for long term maintenance. That’s the DBA in me 🙂
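
To sanity-check the split on its own, here is a small sketch that runs it over a few made-up inputs. Note it swaps in LEFT / LEN for the REPLACE used above (same idea: everything except the last 3 characters), and assumes every input is at least 3 characters long.

```sql
-- PostcodeSplitCheck.sql (hypothetical test inputs)

DECLARE @inputs TABLE (PostCode VARCHAR(8));
INSERT INTO @inputs VALUES ('AB1 2CD'), ('AB12CD'), ('AB12 3CD');

SELECT PostCode,
       RTRIM(LEFT(PostCode, LEN(PostCode) - 3)) AS pc1, -- first part, space trimmed
       RIGHT(PostCode, 3)                       AS pc2  -- last part
FROM @inputs;

-- pc1 / pc2 come back as AB1 / 2CD, AB1 / 2CD and AB12 / 3CD
```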

T-SQL Window Functions

“Window” sounds a bit like the singular of Microsoft’s Operating System, huh?

But no, imagine that each cell in a spreadsheet has two little glass “Windows”, one in the ceiling of its cell and one in the floor.

Then the occupant of cell C3 could look up at C2 and wave, or down at C4 and blow a raspberry.

But there’s more, C3 can now look up and down past C2 and C4 at ALL the values in the C column.

Now instead of cells in a spreadsheet imagine cells in a database table.

create table #t1 (c int)
insert into #t1 values (10), (20), (30), (40)

select * from #t1

select *,
    lag(c, 1) over(order by c) [Waving up],
    lead(c, 1) over(order by c) [Raspberrying down],
    SUM(c) OVER() [Sum of c]
from #t1

drop table #t1
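
The window does not have to be the whole column, either. As a further sketch (not part of the original demo), a ROWS frame limits how far up and down each cell can see; the table name #t2 is just to keep it separate from the example above.

```sql
create table #t2 (c int)
insert into #t2 values (10), (20), (30), (40)

select c,
    sum(c) over(order by c
                rows between unbounded preceding and current row) [Running total],
    sum(c) over(order by c
                rows between 1 preceding and 1 following) [Me and my neighbours]
from #t2

drop table #t2

-- Running total: 10, 30, 60, 100
-- Me and my neighbours: 30, 60, 90, 70
```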

Stored Procedure Template

I always try to adopt the local standards. But where I’m setting one, here’s my Stored Procedure starting template …

-- NewProcTemplate.sql

USE DemoDW
GO

/* ========================================================================
Author:		Richard (RbS)
Date:		19 July 2019
Usage:		To list SalesPeople by Store. 
Example:	Exec [DemoDW].[dbo].[SPU_DimSalespersonGetByStore] @Store = '1'
Safe4Prod:	NO! {by default}
============================================================================ */

ALTER proc SPU_DimSalespersonGetByStore -- SPU_{Object}{Action}
               @Store NVARCHAR(50)
AS
BEGIN; SET NOCOUNT ON;

 SELECT StoreName, SalespersonName
 FROM [DemoDW].[dbo].[DimSalesperson]
 WHERE StoreName = @Store;

END
GO

NOTE: I do not develop within this template. To stay open minded I always start development from a simple select star statement. Then when that’s all good, it’s pasted in here, parameterized, tested, and adjusted (thanks Doug).

Migration with Log-Shipping

I had a requirement to script a repeatable SQL 2014 ent to SQL 2016 std migration. This was to be for up to 200 databases and therefore needed to be automated.

I chose to encapsulate a blend of TSQL and Powershell in a non-scheduled SQL Job. And as we were going UP a version but DOWN an edition, I felt log-shipping would be the best option.

The idea was to run the job a week or so beforehand. Then at the time of the migration (a weekend), just 15 minutes of data (per database) would need to traverse the network.

The SETUP job had 9 steps :-
1. Create a control table
2. Backup (because you never know)
3. Decrypt
4. Move Logins and Fix Orphans
5. Shrink the log-file
6. Log-Shipping: Initial Full backup to remote-server
7. Log-Shipping: Initial Restores on Remote in no-recovery mode.
8. Log-Shipping: Create BACKUP jobs locally
9. Log-Shipping: Create COPY and RESTORE jobs remotely.

Step-1

-- 1.ControlTable.sql

USE msdb;
GO

IF OBJECT_ID('[msdb].[dbo].[LSList]') IS NOT NULL
    DROP TABLE [msdb].[dbo].[LSList];
GO

CREATE TABLE [msdb].[dbo].[LSList] ([database] NVARCHAR(255) NOT NULL,
                                    backup_directory NVARCHAR(255) NOT NULL,
                                    backup_share NVARCHAR(255) NOT NULL,
                                    backup_destination_directory NVARCHAR(255) NOT NULL,
                                    pre_mig_backup INT NOT NULL,
                                    is_encrypted INT NULL,
                                    LS_backup INT NULL,
                                    start_time_offset INT NOT NULL);

INSERT INTO [msdb].[dbo].[LSList] ([database],
                                   backup_directory,
                                   backup_share,
                                   backup_destination_directory,
                                   pre_mig_backup,
                                   start_time_offset)
VALUES (N'DatabaseName1', N'h:\shipping', N'\\LocalServerName\shipping', N'\\RemoteServerName\shipping', 0, 2);

INSERT INTO [msdb].[dbo].[LSList] ([database],
                                   backup_directory,
                                   backup_share,
                                   backup_destination_directory,
                                   pre_mig_backup,
                                   start_time_offset)
VALUES (N'DatabaseName2', N'h:\shipping', N'\\LocalServerName\shipping', N'\\RemoteServerName\shipping',0, 4);

-- populate encryption flag

UPDATE [msdb].[dbo].[LSList]
    SET is_encrypted = 1 -- yes
    WHERE [database] IN (SELECT db.[name]
                                          FROM sys.databases db
                                          JOIN sys.dm_database_encryption_keys dm
                                          ON db.database_id = dm.database_id );

-- select * FROM [msdb].[dbo].[LSList]

Step-2

-- 2.PreMigBackups.sql

-- select * from [msdb].[dbo].[LSList]
-- update [msdb].[dbo].[LSList] SET pre_mig_backup = 0

DECLARE @Query  NVARCHAR(MAX),
        @dbname VARCHAR(200);

WHILE (SELECT COUNT(*) FROM [msdb].[dbo].[LSList] WHERE pre_mig_backup = 0) > 0
BEGIN
    SET @dbname = (   SELECT TOP 1 [database]
                        FROM [msdb].[dbo].[LSList]
                       WHERE pre_mig_backup = 0);

    SET @Query = N'BACKUP DATABASE [' + @dbname + '] 
	TO  DISK = N''H:\SQL Backup\' + @dbname + '_' + replace(convert(varchar(16), getdate(),126), ':','') + '.bak'' 
	WITH COPY_ONLY, NOFORMAT, INIT,  STATS = 10';

    EXEC sp_executesql @Query;

    UPDATE [msdb].[dbo].[LSList]
    SET pre_mig_backup = 1
    WHERE [database] = @dbname;
END;

Step-3

-- 3.decrypt.sql

DECLARE @Query  NVARCHAR(MAX), @dbname VARCHAR(200);

/* 
 is_encrypted 
 null = no
 1 = yes
 0 = not any more
*/

WHILE (SELECT COUNT(*) FROM [msdb].[dbo].[LSList] WHERE is_encrypted = 1) > 0
BEGIN

    SET @dbname = (SELECT TOP 1 [database] FROM [msdb].[dbo].[LSList] WHERE is_encrypted = 1);

  /* 1 set encryption off */

    SET @Query = N'ALTER DATABASE [' + @dbname + N'] SET ENCRYPTION OFF;';
    EXEC sp_executesql @Query;

  /* 2 pause until decrypted */

    WHILE (  SELECT dm.encryption_state
                    FROM sys.databases db
                    LEFT JOIN sys.dm_database_encryption_keys dm
                       ON db.database_id = dm.database_id
                    WHERE [name] = @dbname) <> 1 -- encryption_state 1 = unencrypted
    BEGIN
        WAITFOR DELAY '00:00:10';
    END;

  /*3 drop key */

    SET @Query = 'USE [' + @dbname + ']; DROP DATABASE ENCRYPTION KEY';
    EXEC sp_executesql @Query;

  /* 4 log changes then move on */

    UPDATE [msdb].[dbo].[LSList]
       SET is_encrypted = 0
     WHERE [database] = @dbname;

END;

-- Stop MLB

-- DECLARE @Query  NVARCHAR(MAX),
--        @dbname VARCHAR(200);

IF OBJECT_ID('tempdb..#tlist') IS NOT NULL
    DROP TABLE #tlist;
SELECT [database]
  INTO #tlist
  FROM [msdb].[dbo].[LSList];

WHILE (SELECT COUNT(*) FROM #tlist) > 0
BEGIN

    SET @dbname = (SELECT TOP 1 [database] FROM #tlist);

    SET @Query = N'EXEC [msdb].[smart_admin].[sp_set_db_backup]
					@database_name = [' + @dbname + N'],
					@enable_backup = 0'; -- off

    EXEC sp_executesql @Query;

    DELETE FROM #tlist
     WHERE [database] = @dbname;

END;

Step-4

Powershell.exe "Export-DbaLogin -SqlInstance LocalServerName -Append  -Path C:\temp\LocalServerName-logins.sql"

Powershell.exe "Export-DbaUser -SqlInstance LocalServerName -Append  -Path C:\temp\LocalServerName-users.sql"

Powershell.exe "Copy-DbaLogin -Source LocalServerName -Destination RemoteServerName -ExcludeSystemLogins"

Step-5

Powershell.exe "Repair-DbaOrphanUser -SqlInstance RemoteServerName"

Managed Backups

Managed Backups were a great new feature with SQL 2014 and above. They allow backups to the cloud and are managed from within SSMS.

There is a GUI but it’s just for initialization. Configuration all happens through TSQL. Here is my work sheet …

-- managedBackups.sql

-- view server config

USE msdb;
SELECT * FROM smart_admin.fn_backup_instance_config();

-- view database config details

USE msdb;
SELECT db_name, is_managed_backup_enabled, retention_days, storage_url, encryption_algorithm
FROM smart_admin.fn_backup_db_config(NULL);

-- disable individual log backup

USE msdb;
EXEC smart_admin.sp_set_db_backup
@database_name = [DISC_Green_Abbey_8230003_test],
@enable_backup = 0;

Caching result sets

(For Sam) I wanted to performance tune a stored-procedure that was just one big SELECT statement (used to return all current Orders).

The code was just about as optimum as it could get, and returned around 8,000 rows each time, taking about 35 seconds to do so.

I saved the output over a few consecutive days and noticed (crucially) that most of the rows were the same each day.

My big-idea then, was to pre-cache (and pre-format) the results on “Day One”, and just append new rows to that going forward.

The final working stored-procedure contained 5 labeled areas:-

 - (1. Create and fill a cache-table if there isn't one)
 - 2. Save a thin version of the current data to a temp-table
 - 3. Add only NEW data to the cache-table
 - 4. Remove DELETED data from the cache-table
 - 5. Output the cache-table

1. If the cache-table didn’t exist, run the original query, but saving INTO a cache-table. Mostly this step was not executed, but I wanted the stored-procedure to be complete.

There was a DateTime column in the results set that was guaranteed to be unique. I made this the primary-key of the cache-table.

2. In a separate window, I stripped back the original query until just the DateTime column was returned. Unnecessarily, I added code to the top to delete any temp-table called “#thin” if it already existed (my habit). Then I added code to save the stripped back query results INTO a temp-table … called “#thin”.

This step would run every time, and the output could be compared with the old data (in the cache-table) to add any new rows, and knock off any old ones.

3. The original query was executed but with a WHERE clause added, like WHERE prod.DateTime not in (SELECT DateTime FROM #thin). The 2 or 3 (fat) rows returned from this step were appended to the cache-table.

4. A simple DELETE removed any rows from the cache-table where the DateTime was not in the #thin table.

5. The cache-table was SELECT’ed in full as the stored-procedure’s output, which typically ran in around 7 seconds, despite the extra round-trip to the database.

Testing. After a day or two compare the old / new result sets in spreadsheet tabs and adjust indexing accordingly (As always, full responsibility lies with the implementer).

Addendum. To help performance I later changed Step-3 from …

WHERE prod.DateTime not in (SELECT DateTime FROM #thin)

… to …

LEFT JOIN cache.table cac ON cac.DateTime = prod.DateTime
WHERE cac.DateTime IS NULL
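
Steps 2 to 5 can be sketched end to end with stand-in tables. Everything here is made up for illustration (the #orders and #cache temp-tables, the column names, the sample rows); the real procedure ran the original fat query where this uses #orders.

```sql
-- CacheSketch.sql (hypothetical names throughout)

-- stand-ins for the real source data and the cache-table
IF OBJECT_ID('tempdb..#orders') IS NOT NULL DROP TABLE #orders;
IF OBJECT_ID('tempdb..#cache')  IS NOT NULL DROP TABLE #cache;
CREATE TABLE #orders (OrderDateTime DATETIME PRIMARY KEY, Detail VARCHAR(50));
CREATE TABLE #cache  (OrderDateTime DATETIME PRIMARY KEY, Detail VARCHAR(50));

INSERT INTO #cache  VALUES ('20190101 09:00', 'old, since deleted'),
                           ('20190102 09:00', 'still current');
INSERT INTO #orders VALUES ('20190102 09:00', 'still current'),
                           ('20190103 09:00', 'brand new');

-- 2. save a thin version of the current data
IF OBJECT_ID('tempdb..#thin') IS NOT NULL DROP TABLE #thin;
SELECT OrderDateTime INTO #thin FROM #orders;

-- 3. add only NEW rows to the cache-table
INSERT INTO #cache (OrderDateTime, Detail)
SELECT prod.OrderDateTime, prod.Detail
FROM #orders prod
LEFT JOIN #cache cac ON cac.OrderDateTime = prod.OrderDateTime
WHERE cac.OrderDateTime IS NULL;

-- 4. remove DELETED rows from the cache-table
DELETE cac
FROM #cache cac
WHERE NOT EXISTS (SELECT 1 FROM #thin t WHERE t.OrderDateTime = cac.OrderDateTime);

-- 5. output the cache-table
SELECT OrderDateTime, Detail FROM #cache ORDER BY OrderDateTime;
```

With the sample rows above, the output is the ‘still current’ and ‘brand new’ rows only.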

ORDER BY CASE

In TSQL I recently discovered how to use a CASE expression in the ORDER BY clause to sort results in custom ways.

For example, ordering countries with the UK and USA at the top, then the rest alphabetically, would in the past have made me either generate a calculated ‘CountrySort’ column or UNION two queries.

Now I can do this …

ORDER BY CASE
		WHEN countryid = 1 THEN 'AAA'
		WHEN countryid = 23 THEN 'AAB'
		ELSE countryname END

Which translates as …

‘Order by countryname,
having first replaced the countryname with ‘AAA’ where the countryid is 1
and ‘AAB’ where it’s 23’.

Here are the results (including countryid for clarity)…

[Screenshot: countrysort]
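
A self-contained sketch of the same trick, with made-up sample data (I am assuming countryid 1 is the UK and 23 is the USA). This version sorts on a number first and the name second, which avoids relying on no real country name sorting before ‘AAA’.

```sql
-- OrderByCase.sql (hypothetical sample data)

DECLARE @countries TABLE (countryid INT, countryname VARCHAR(50));
INSERT INTO @countries VALUES (1, 'United Kingdom'), (23, 'United States'),
                              (7, 'France'), (12, 'Australia'), (30, 'Zambia');

SELECT countryid, countryname
FROM @countries
ORDER BY CASE countryid WHEN 1 THEN 0 WHEN 23 THEN 1 ELSE 2 END,
         countryname;

-- United Kingdom, United States, Australia, France, Zambia
```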

Column Max Length

From my “Spreadsheet sizer” script, this one helped me move sensibly away from pesky varchar(max) columns.

-- ColumnMaxLength.sql

DECLARE @TableName VARCHAR(255) = 'customers' --<< input
DECLARE @SchemaName VARCHAR(255) = 'dbo' 
DECLARE @sqlcmd varchar(max) 

select @sqlcmd = stuff((SELECT ' union all
select ' 
+ QUOTENAME(table_schema,'''') + ' [Schema], ' 
+ QUOTENAME(TABLE_NAME,'''') + ' [Table], ' 
+ quotename(column_name,'''') + ' [Column],
max(datalength(' + quotename(column_name) + ')) MaxLength 
from ' + quotename(table_schema) + '.' + quotename(table_name)
from information_schema.columns
where 1=1
AND table_name =  @TableName
AND table_schema = @SchemaName
order by column_name
for xml path(''),type).value('.','varchar(max)'),1,11,'')

exec(@sqlcmd)

Comparing Stored-Procedures

Had a bit of a problem today with the re-write project.

I had been checking new stored-procedures in the DEV database, and (if good) pasting them into the WEB database.

The issue was that some DEV stored-procedures that I had already checked-in to WEB had been modified again.

Rather than trying to enforce version-control (mmm), or download Redgate’s SQL Compare, I modified my ‘Whats New’ routine to compare the modify-dates between the DEV and WEB databases.

-- CompareSP.sql

SELECT [dev].[type_desc],
       (SELECT [name] FROM [companydev].[sys].[schemas] WHERE [schema_id] = [dev].[schema_id]) [schema],
       CASE [dev].[parent_object_id]
           WHEN '0' THEN [dev].[name]
           ELSE OBJECT_NAME([dev].[parent_object_id]) + '.' + [dev].[name]
       END [object_name],
       [dev].[create_date],
       [dev].[modify_date], -- or create-date if there isn't one
	   '' v,
	   [web].[modify_date] web_modify_date , 
	   DATEDIFF(MINUTE, [dev].[modify_date], [web].[modify_date]) mod_diff
FROM [companydev].[sys].[objects] dev
JOIN [companyweb].[sys].[objects] web
  ON [dev].[name] = [web].[name]
WHERE [dev].[is_ms_shipped] = 0 -- exclude system-objects
AND [dev].[type] = 'P' -- just stored-procedures
--AND [dev].[modify_date] > '21 nov 2018'
ORDER BY [dev].[modify_date] DESC;

Adding a NOT NULL column to an existing table

-- AddingNotNullColumnToExistingTable.sql

-- 1. Add new column to the old table, as NULL for now

	ALTER TABLE [dbo].[TableName] 
	ADD [ColumnName] INT NULL

-- 2. Set the default to zero for new rows

	ALTER TABLE [dbo].[TableName] 
	ADD CONSTRAINT [DF_TableName_ColumnName] 
	DEFAULT(0) FOR [ColumnName]

-- 3. Change all existing null values to zeros

	UPDATE [dbo].[TableName] 
	SET [ColumnName] = 0 
	WHERE [ColumnName] IS NULL

-- 4. Change column from NULL to NOT NULL

	ALTER TABLE [dbo].[TableName] 
	ALTER COLUMN [ColumnName] INT NOT NULL

-- Undo (while testing)

	ALTER TABLE [dbo].[TableName] 
	DROP CONSTRAINT [DF_TableName_ColumnName]

	ALTER TABLE [dbo].[TableName] 
	DROP COLUMN [ColumnName]
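
On a very large table, step 3 as a single UPDATE may hold locks for a long time and bloat the transaction log. A possible variation, sketched here with an arbitrary batch size of 10,000, is to loop in small batches until nothing is left to change:

```sql
-- 3b. Change existing null values to zeros, in batches (hypothetical variation)

	WHILE 1 = 1
	BEGIN
		UPDATE TOP (10000) [dbo].[TableName]
		SET [ColumnName] = 0
		WHERE [ColumnName] IS NULL;

		IF @@ROWCOUNT = 0 BREAK; -- nothing left to update
	END
```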

Calendar UK

Must be that time of year again :). Adapted from Aaron’s beautiful US calendar script …

-- CalendarUK.sql
use [Dev];

-- initialize period

	DECLARE @StartDate DATE = '20000101', @NumberOfYears INT = 30;

-- prevent set or regional settings from interfering with 
-- interpretation of dates / literals

	SET DATEFIRST 7; -- sunday is the first day of week
	SET DATEFORMAT mdy; -- thats month/day/year
	SET LANGUAGE US_ENGLISH;

	DECLARE @CutoffDate DATE = DATEADD(YEAR, @NumberOfYears, @StartDate);

-- 1. this is just a holding table for intermediate calculations:

	IF OBJECT_ID('tempdb..#cal') IS NOT NULL DROP TABLE #cal
	CREATE TABLE #cal
	(
	  [date]       DATE PRIMARY KEY, 
	  [day]        AS DATEPART(DAY,      [date]),
	  [month]      AS DATEPART(MONTH,    [date]),
	  FirstOfMonth AS CONVERT(DATE, DATEADD(MONTH, DATEDIFF(MONTH, 0, [date]), 0)),
	  [MonthName]  AS DATENAME(MONTH,    [date]),
	  [week]       AS DATEPART(WEEK,     [date]),
	  [ISOweek]    AS DATEPART(ISO_WEEK, [date]),
	  [DayOfWeek]  AS DATEPART(WEEKDAY,  [date]),
	  [quarter]    AS DATEPART(QUARTER,  [date]),
	  [year]       AS DATEPART(YEAR,     [date]),
	  FirstOfYear  AS CONVERT(DATE, DATEADD(YEAR,  DATEDIFF(YEAR,  0, [date]), 0)),
	  Style112     AS CONVERT(CHAR(8),   [date], 112),
	  Style101     AS CONVERT(CHAR(10),  [date], 101)
	);

-- use the catalog views to generate as many rows as we need

	INSERT #cal([date]) 
	SELECT d
	FROM
	(
	  SELECT d = DATEADD(DAY, rn - 1, @StartDate)
	  FROM 
	  (
		SELECT TOP (DATEDIFF(DAY, @StartDate, @CutoffDate)) 
		  rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
		FROM sys.all_objects AS s1
		CROSS JOIN sys.all_objects AS s2
		ORDER BY s1.[object_id]
	  ) AS x
	) AS y;

-- 2. create the real table

	IF OBJECT_ID('dbo.CalendarUK') IS NOT NULL DROP TABLE dbo.CalendarUK
	CREATE TABLE [dbo].[CalendarUK]
	(
	  DateKey             INT         NOT NULL PRIMARY KEY,
	  [Date]              DATE        NOT NULL,
	  [Day]               TINYINT     NOT NULL,
	  DaySuffix           CHAR(2)     NOT NULL,
	  [Weekday]           TINYINT     NOT NULL,
	  WeekDayName         VARCHAR(10) NOT NULL,
	  IsWeekend           BIT         NOT NULL,
	  IsHoliday           BIT         NOT NULL,
	  HolidayText         VARCHAR(64) SPARSE,
	  DOWInMonth          TINYINT     NOT NULL,
	  [DayOfYear]         SMALLINT    NOT NULL,
	  WeekOfMonth         TINYINT     NOT NULL,
	  WeekOfYear          TINYINT     NOT NULL,
	  ISOWeekOfYear       TINYINT     NOT NULL,
	  [Month]             TINYINT     NOT NULL,
	  [MonthName]         VARCHAR(10) NOT NULL,
	  [Quarter]           TINYINT     NOT NULL,
	  QuarterName         VARCHAR(6)  NOT NULL,
	  [Year]              INT         NOT NULL,
	  MMYYYY              CHAR(6)     NOT NULL,
	  MonthYear           CHAR(7)     NOT NULL,
	  FirstDayOfMonth     DATE        NOT NULL,
	  LastDayOfMonth      DATE        NOT NULL,
	  FirstDayOfQuarter   DATE        NOT NULL,
	  LastDayOfQuarter    DATE        NOT NULL,
	  FirstDayOfYear      DATE        NOT NULL,
	  LastDayOfYear       DATE        NOT NULL,
	  FirstDayOfNextMonth DATE        NOT NULL,
	  FirstDayOfNextYear  DATE        NOT NULL
	);
	GO

-- 3 populate the real table from the temp table

	INSERT dbo.CalendarUK WITH (TABLOCKX)
	SELECT
	  DateKey     = CONVERT(INT, Style112),
	  [Date]        = [date],
	  [Day]         = CONVERT(TINYINT, [day]),
	  DaySuffix     = CONVERT(CHAR(2), CASE WHEN [day] / 10 = 1 THEN 'th' ELSE 
					  CASE RIGHT([day], 1) WHEN '1' THEN 'st' WHEN '2' THEN 'nd' 
					  WHEN '3' THEN 'rd' ELSE 'th' END END),
	  [Weekday]     = CONVERT(TINYINT, [DayOfWeek]),
	  [WeekDayName] = CONVERT(VARCHAR(10), DATENAME(WEEKDAY, [date])),
	  [IsWeekend]   = CONVERT(BIT, CASE WHEN [DayOfWeek] IN (1,7) THEN 1 ELSE 0 END),
	  [IsHoliday]   = CONVERT(BIT, 0),
	  HolidayText   = CONVERT(VARCHAR(64), NULL),
	  [DOWInMonth]  = CONVERT(TINYINT, ROW_NUMBER() OVER 
					  (PARTITION BY FirstOfMonth, [DayOfWeek] ORDER BY [date])),
	  [DayOfYear]   = CONVERT(SMALLINT, DATEPART(DAYOFYEAR, [date])),
	  WeekOfMonth   = CONVERT(TINYINT, DENSE_RANK() OVER 
					  (PARTITION BY [year], [month] ORDER BY [week])),
	  WeekOfYear    = CONVERT(TINYINT, [week]),
	  ISOWeekOfYear = CONVERT(TINYINT, ISOWeek),
	  [Month]       = CONVERT(TINYINT, [month]),
	  [MonthName]   = CONVERT(VARCHAR(10), [MonthName]),
	  [Quarter]     = CONVERT(TINYINT, [quarter]),
	  QuarterName   = CONVERT(VARCHAR(6), CASE [quarter] WHEN 1 THEN 'First' 
					  WHEN 2 THEN 'Second' WHEN 3 THEN 'Third' WHEN 4 THEN 'Fourth' END), 
	  [Year]        = [year],
	  MMYYYY        = CONVERT(CHAR(6), LEFT(Style101, 2)    + LEFT(Style112, 4)),
	  MonthYear     = CONVERT(CHAR(7), LEFT([MonthName], 3) + LEFT(Style112, 4)),
	  FirstDayOfMonth     = FirstOfMonth,
	  LastDayOfMonth      = MAX([date]) OVER (PARTITION BY [year], [month]),
	  FirstDayOfQuarter   = MIN([date]) OVER (PARTITION BY [year], [quarter]),
	  LastDayOfQuarter    = MAX([date]) OVER (PARTITION BY [year], [quarter]),
	  FirstDayOfYear      = FirstOfYear,
	  LastDayOfYear       = MAX([date]) OVER (PARTITION BY [year]),
	  FirstDayOfNextMonth = DATEADD(MONTH, 1, FirstOfMonth),
	  FirstDayOfNextYear  = DATEADD(YEAR,  1, FirstOfYear)
	FROM #cal
	OPTION (MAXDOP 1);

-- 4 add holidays

	;WITH x AS 
	(
	  SELECT DateKey, [Date], IsHoliday, HolidayText, FirstDayOfYear,
		DOWInMonth, [MonthName], [WeekDayName], [Day],
		LastDOWInMonth = ROW_NUMBER() OVER 
		(
		  PARTITION BY FirstDayOfMonth, [Weekday] 
		  ORDER BY [Date] DESC
		)
	  FROM dbo.CalendarUK
	)
	UPDATE x SET IsHoliday = 1, HolidayText = CASE
	  WHEN ([Date] = FirstDayOfYear) THEN 'New Years Day'
	  WHEN ([DOWInMonth] = 3 AND [MonthName] = 'April' AND [WeekDayName] = 'Friday') THEN 'Good Friday'                  -- (never matched: not in the WHERE below; Easter is added in step 6)
	  WHEN ([DOWInMonth] = 1 AND [MonthName] = 'May' AND [WeekDayName] = 'Monday') THEN 'May Day'                        -- (first Monday in May)
	  WHEN ([LastDOWInMonth] = 1 AND [MonthName] = 'May' AND [WeekDayName] = 'Monday') THEN 'May Bank Holiday'           -- (last Monday in May)
	  WHEN ([LastDOWInMonth] = 1 AND [MonthName] = 'August' AND [WeekDayName] = 'Monday') THEN 'August Bank Holiday'     -- (last Monday in August)
	  WHEN ([MonthName] = 'December' AND [Day] = 25) THEN 'Christmas Day'
	  WHEN ([MonthName] = 'December' AND [Day] = 26) THEN 'Boxing Day'
	  END
	WHERE -- IsHoliday
	  ([Date] = FirstDayOfYear)
	  OR ([LastDOWInMonth] = 1 AND [MonthName] = 'May' AND [WeekDayName] = 'Monday')
	  OR ([DOWInMonth] = 1     AND [MonthName] = 'May' AND [WeekDayName] = 'Monday')
	  OR ([LastDOWInMonth] = 1 AND [MonthName] = 'August'    AND [WeekDayName] = 'Monday')
	  OR ([MonthName] = 'December' AND [Day] = 25)
	  OR ([MonthName] = 'December' AND [Day] = 26);


-- 5. create a function to calculate easter etc

	IF OBJECT_ID('dbo.GetEasterHolidays') IS NOT NULL DROP FUNCTION dbo.GetEasterHolidays
	GO

	CREATE FUNCTION dbo.GetEasterHolidays(@year INT) 
	RETURNS TABLE
	WITH SCHEMABINDING
	AS 
	RETURN 
	(
	  WITH x AS 
	  (
		SELECT [Date] = CONVERT(DATE, RTRIM(@year) + '0' + RTRIM([Month]) 
			+ RIGHT('0' + RTRIM([Day]),2))
		  FROM (SELECT [Month], [Day] = DaysToSunday + 28 - (31 * ([Month] / 4))
		  FROM (SELECT [Month] = 3 + (DaysToSunday + 40) / 44, DaysToSunday
		  FROM (SELECT DaysToSunday = paschal - ((@year + @year / 4 + paschal - 13) % 7)
		  FROM (SELECT paschal = epact - (epact / 28)
		  FROM (SELECT epact = (24 + 19 * (@year % 19)) % 30) 
			AS epact) AS paschal) AS dts) AS m) AS d
	  )
	  SELECT DATEADD(DAY,-2,[Date]) [Date], 'Good Friday' HolidayName FROM x
		UNION ALL SELECT DATEADD(DAY, 1,[Date]), 'Easter Monday' FROM x
	  );
	GO

-- 6. use the function to insert easter etc

	;WITH x AS 
	(
	  SELECT d.[Date], d.IsHoliday, d.HolidayText, h.HolidayName
		FROM dbo.CalendarUK AS d
		CROSS APPLY dbo.GetEasterHolidays(d.[Year]) AS h
		WHERE d.[Date] = h.[Date]
	)
	UPDATE x SET IsHoliday = 1, HolidayText = HolidayName;

-- 7. show results

	SELECT * 
	FROM dbo.CalendarUK
	WHERE [year] = '2019'
	--WHERE [year] in ('2019', '2020')
	AND (IsHoliday = 1
	OR HolidayText IS NOT NULL)
	--and DateKey = '20181231'

RCSI testing

Here’s some code to create a large number of ghost records.

--rcsi_testing.sql

-- create and populate a test table

CREATE TABLE dbo.demo_table
  (
      ID    INT       NOT NULL    IDENTITY (1, 1),
      C1    CHAR(100) NOT NULL
  );
  GO
   
  INSERT INTO dbo.demo_table (C1)
  SELECT TOP (1000)
         CAST(TEXT AS CHAR(100)) AS C1
  FROM   sys.messages
  WHERE  language_id = 1031;
  GO
    
  CREATE UNIQUE CLUSTERED INDEX cuix_demo_table_Id
  ON dbo.demo_table (Id);
  GO

 
-- start a 1 minute workload
 
  SET NOCOUNT ON;
  GO
  BEGIN TRANSACTION; ---------**********KEY
  GO
  	-- run the insert / delete loop for one minute
  	DECLARE	@finish_date DATETIME2(0) = DATEADD(MINUTE, 1, GETDATE());
  	WHILE @finish_date >= GETDATE()
  	BEGIN
  		-- insert a copy of the first (lowest Id) record into dbo.demo_table
  		INSERT INTO dbo.demo_table(C1)
  		SELECT C1
  		FROM   dbo.demo_table
  		WHERE  Id = (SELECT MIN(Id) FROM dbo.demo_table);
    
  		-- wait 10 ms before deleting
  		WAITFOR DELAY '00:00:00:010';
    
  		-- now delete the first (lowest Id) record from the table
  		DELETE dbo.demo_table WHERE Id = (SELECT MIN(Id) FROM dbo.demo_table);
  	END
  ROLLBACK TRAN;
  GO

Monitoring RCSI

I created a sql-job to run every 10 minutes to a) save the current ghost count, and b) email me if it’s a new high!

Step-1 create the table

CREATE TABLE [maint_db].[dbo].[rcsi_monitor] 
	(
	date_time DATETIME, 
	table_name VARCHAR(50), 
	ghost_records BIGINT
	);

If this step succeeded the job would end there. If the step failed (-say- because the table already existed) the job would continue to step-2

Step-2 save the current counts to the table

INSERT INTO [maint_db].[dbo].[rcsi_monitor]

SELECT	GETDATE(),
	OBJECT_NAME(object_id),
	version_ghost_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), null, null, null, 'sampled')
WHERE version_ghost_record_count > 0;

The SELECT statement above is the only novel thing here, and perhaps the most useful take-away. (Note: the DB_ID() means the current database, so ensure it runs under the right one).

Step-3 send an alert – if the current count is the new HIGH SCORE!

IF 
	(SELECT MAX(version_ghost_record_count) FROM sys.dm_db_index_physical_stats(DB_ID(), null, null, null, 'sampled'))
>=
	(SELECT ISNULL(MAX(ghost_records), 0) FROM [maint_db].[dbo].[rcsi_monitor])
AND
	(SELECT ISNULL(MAX(ghost_records), 0) FROM [maint_db].[dbo].[rcsi_monitor]) > 0
BEGIN
	RAISERROR ('Too many Ghost! AAAAAaaarrrrrrggggghh!', 16, 1)
	RETURN
END

The RAISERROR and RETURN would force the job to fail, triggering an email via Notifications.

 

Fix sp_BlitzLock

I noticed that whenever there was corruption in a single extended events deadlock report, sp_BlitzLock would not work at all …

Msg 9411, Level 16, State 1, Procedure sp_BlitzLock, Line 185 [Batch Start Line 12]
XML parsing: line 37, character 166, semicolon expected

My work-around was to replace line 196 …

AS ( SELECT CONVERT(XML, event_data) AS deadlock_xml

… with this …

AS ( SELECT CONVERT(XML, REPLACE(event_data,'&',';')) AS deadlock_xml