Removig Duplicates Using MySQL 1.0

October 17, 2003 - updated February 8, 2006

You can install the Avoiding Duplicate Comment and Avoiding Duplicate Trackback Pings hacks to protect your blog against duplicates. However, getting rid of existing duplicates takes a little more work. If your blog uses MySQL, you can use the following recipes to get rid of existing duplicates in a couple of simple steps.

The Idea

A duplicate comment or ping is a database entry the contains (pretty much) the same information as a previous entry in the database. You can use the following SQL to find the IDs of duplicated pings:
SELECT DISTINCT t2.tbping_id
FROM mt_tbping AS t1 JOIN mt_tbping AS t2 ON
    t1.tbping_source_url = t2.tbping_source_url &&
    t1.tbping_tb_id = t2.tbping_tb_id
WHERE t1.tbping_id < t2.tbping_id
Unfortunately older versions of MySQL don’t make it easy to automatically delete the records identified in this query. So instead of listing the IDs, we can construct a list of DELETE commands that we then execute in a separate step.

Deleting Duplicate Trackback Pings

In MySQL (either from the console or using phpMyAdmin) execute the following query on your MovableType database:
SELECT DISTINCT CONCAT('DELETE FROM mt_tbping WHERE tbping_id = ', "'", t2.tbping_id, "'", ';')
FROM mt_tbping AS t1 JOIN mt_tbping as t2 ON
    t1.tbping_source_url = t2.tbping_source_url &&
    t1.tbping_tb_id = t2.tbping_tb_id
WHERE t1.tbping_id < t2.tbping_id
This will produce a listing of DELETE commands, looking like this:
DELETE FROM mt_tbping WHERE tbping_id = '23'; 
DELETE FROM mt_tbping WHERE tbping_id = '24'; 
DELETE FROM mt_tbping WHERE tbping_id = '25'; 
DELETE FROM mt_tbping WHERE tbping_id = '40'; 
Take the listing (for example by copying it in the browser and pasting it into a SQL query field in phpMyAdmin) and execute it.

Deleting Duplicate Comments

In MySQL (either from the console or using phpMyAdmin) execute the following query on your MovableType database:
SELECT DISTINCT CONCAT('DELETE FROM mt_comment WHERE comment_id = ', "'", t2.comment_id, "'", ';')
FROM mt_comment AS t1 JOIN mt_comment AS t2 ON
    t1.comment_entry_id = t2.comment_entry_id &&
    t1.comment_author = t2.comment_author &&
    t1.comment_email = t2.comment_email &&
    t1.comment_text = t2.comment_text &&
    t1.comment_url = t2.comment_url
WHERE t1.comment_id < t2.comment_id
This will produce a listing of DELETE commands, looking like this:
DELETE FROM mt_comment WHERE comment_id = '352'; 
DELETE FROM mt_comment WHERE comment_id = '431'; 
DELETE FROM mt_comment WHERE comment_id = '432'; 
DELETE FROM mt_comment WHERE comment_id = '509'; 
Take the listing (for example by copying it in the browser and pasting it into a SQL query field in phpMyAdmin) and execute it.

Rebuild Your Blog

After you have deleted the duplicate database records, don’t forget to rebuild your blog to reflect the changes.