Deleting Duplicate Rows in MySQL

You have a table with duplicate rows – somehow a unique index didn’t get created and a bug has added duplicate records to your table. A pox upon that bug!

Here’s two easy ways to clean out that table quickly.

1) Use ALTER IGNORE on MySQL 5.1+

MySQL will allow you to create a unique index on a table with duplicate records with its IGNORE SQL extension:

ALTER IGNORE TABLE 'SHIPMENTS' ADD UNIQUE INDEX (CART_ID, TRACKING_NUMBER)

Duplicates will be deleted. ALTER IGNORE does not normally work in Percona because of their InnoDB fast index creation feature but if you “set session old_alter_table=1″ beforehand, Percona will use the old alter table behavior.

2) Recreate the table with GROUP BY

    execute 'CREATE TABLE shipments_deduped like shipments;'
    execute 'INSERT shipments_deduped SELECT * FROM shipments GROUP BY cart_id, tracking_number;'
    execute 'RENAME TABLE shipments TO shipments_with_dupes;'
    execute 'RENAME TABLE shipments_deduped TO shipments;'
    add_index :shipments, [:cart_id, :tracking_number], :unique => true
    execute 'DROP TABLE shipments_with_dupes;'

Recreating the table is much, much faster than trying to delete the records in the existing table and doesn’t lock the existing table, making your application downtime minimal. This method will not work if MySQL’s sql_mode includes ONLY_FULL_GROUP_BY.

13 thoughts on “Deleting Duplicate Rows in MySQL”

  1. Your second option could potentially loose data if something inserts for example between the copy and rename.

  2. You could potentially lose records of value using ‘GROUP BY’ only, an example would be a company sales ledger – on the ledger you might have supplier, account number, and balance. 2 records in the table have records for the same supplier and account but the amounts are different because they are 2 seperate orders. If you were to group by account number and supplier, you would lose one of those records. Sometimes replacing GROUP BY with DISTINCT will be the better option.

  3. Lifesaver! A script on our site created duplicates with new PK ID’s. I just wanted the dupes out based on the other fields. Worked like a charm!

  4. Radu: I beleive that method would delete both records, and not leave one behind. That method also uses ‘negative’ logic, which I beleive should always be avoided where possible. An alternative would be to replace the ‘not equal’ to ‘greater than’.

    delete from table1
    USING table1, table1 as vtable
    WHERE (table1.ID > vtable.ID)
    AND (table1.field_name=vtable.field_name)

    That would work.

  5. Thanks. The first solution was exactly what I needed, because my plan was to add a unique index anyway. It took about 800ms for a table with about 35,000 records with about 1,000 duplicates. I don’t think users would have noticed the lock up time.

  6. ALTER IGNORE TABLE ‘SHIPMENTS’ ADD UNIQUE INDEX (CART_ID, TRACKING_NUMBER)

    This does not work on MySQL 5.5.24. It throws a duplicate entry error.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>