Getting away from Dedupe jobs

Duplicate data should ideally be stopped at the front end. However if a table already contains duplicate data you may want to bash out some code to clean it up. And schedule a SQL job to run the code regularly.

However, there are mechanisms baked right into SQL Server to manage this more efficiently (step away from scripting everything – devops 😉 )

True enough, you need to run code (once) to clean out all the current duplicates, but going forward a unique filtered index can keep them out.

For this particular project “duplicate data” meant that an CustID column should not contain a number already in that column if the Country was ‘GB’ and the PackageID was ‘5’.

Here is a simplified example of my solution …

-- ix_BlockDupCustIDs.sql

IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1;
CREATE TABLE #t1 (CustID INT, CountryCode CHAR(2), PackageID INT);

CREATE UNIQUE INDEX ix_BlockDupCustIDs 
ON #t1 (CustID) 
WHERE CountryCode = 'GB' AND PackageID = 5;

INSERT INTO #t1 VALUES (1, 'GB', 5) -- yes
INSERT INTO #t1 VALUES (2, 'GB', 5) -- yes
INSERT INTO #t1 VALUES (1, 'GB', 4) -- yes
INSERT INTO #t1 VALUES (1, 'US', 5) -- yes
--INSERT INTO #t1 VALUES (1, 'GB', 5) -- no, duplicate
--INSERT INTO #t1 VALUES (2, 'GB', 5), (3, 'IR', 1) -- no for both
--UPDATE #t1 SET PackageID = 5 WHERE PackageID = 4 -- nope

SELECT * FROM #t1;

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s