Before we go off and start running delete scripts on our data, it is always a good idea to make sure you have a backup of the table so you can check you work after or restore if something goes wrong.
Let's first look at the data
-- see what rows have duplicates and which ones don'tSELECT EmailAddress
, row_number() OVER ( PARTITION BY EmailAddress ORDER BY ID) AS NumInstances
FROM Person
-- show only the rows that are duplicates
with myTable as (
SELECT EmailAddress
, row_number() OVER ( PARTITION BY EmailAddress ORDER BY ID) AS NumInstances
FROM Person
)
select * FROM myTable
WHERE NumInstances > 1
The actual delete statement
-- do the actual deleting of the duplicate rowswith myTable as (
SELECT EmailAddress
, row_number() OVER ( PARTITION BY EmailAddress ORDER BY ID) AS NumInstances
FROM Person
)
DELETE FROM myTable
WHERE NumInstances > 1
To modify the code to fit your situation you will typically just need to replace the items in red with the comma separated list of columns you want to consider. Next change the table name in blue to the table you are trying to remove the duplicates from. Finally, change the column in orange to be one or more columns (separated by comma). In this case, I have a primary key so, I used it. You could use all the columns you are considering as well.
For more info on how it works and pre-SQL 2005 solutions see here.
No comments:
Post a Comment