Duplicate data should ideally be stopped at the front end. However if a table already contains duplicate data you may want to bash out some code to clean it up. And schedule a SQL job to run the code regularly.
However, there are mechanisms baked right into SQL Server to manage this more efficiently (step away from scripting everything – devops 😉 )
True enough, you need to run code (once) to clean out all the current duplicates, but going forward a unique filtered index can keep them out.
For this particular project “duplicate data” meant that an ID column should not contain a number already in that column if the Country was ‘UK’ and the Package ID was ‘5’.
Here is a simplified example of my solution …
-- ix_BlockDupCustIDs.sql IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1; CREATE TABLE #t1 (CustID INT, CountryCode CHAR(2), PackageID INT); CREATE UNIQUE INDEX ix_BlockDupCustIDs ON #t1 (CustID) WHERE CountryCode = 'GB' AND PackageID = 5; INSERT INTO #t1 VALUES (1, 'GB', 5) -- yes INSERT INTO #t1 VALUES (2, 'GB', 5) -- yes INSERT INTO #t1 VALUES (1, 'GB', 4) -- yes INSERT INTO #t1 VALUES (1, 'US', 5) -- yes --INSERT INTO #t1 VALUES (1, 'GB', 5) -- no, duplicate --INSERT INTO #t1 VALUES (2, 'GB', 5), (3, 'IR', 1) -- no for both --UPDATE #t1 SET PackageID = 5 WHERE PackageID = 4 -- nope SELECT * FROM #t1;