Deleting Duplicate Rows in MySQL

You have a table with duplicate rows – somehow a unique index didn’t get created and a bug has added duplicate records to your table. A pox upon that bug!

Here’s two easy ways to clean out that table quickly.

1) Use ALTER IGNORE on MySQL 5.1+

MySQL will allow you to create a unique index on a table with duplicate records with its IGNORE SQL extension:

ALTER IGNORE TABLE 'SHIPMENTS' ADD UNIQUE INDEX (CART_ID, TRACKING_NUMBER)

Duplicates will be deleted. ALTER IGNORE does not work in Percona because of their InnoDB fast index creation feature.

2) Recreate the table with GROUP BY

    execute 'CREATE TABLE shipments_deduped like shipments;'
    execute 'INSERT shipments_deduped SELECT * FROM shipments GROUP BY cart_id, tracking_number;'
    execute 'RENAME TABLE shipments TO shipments_with_dupes;'
    execute 'RENAME TABLE shipments_deduped TO shipments;'
    add_index :shipments, [:cart_id, :tracking_number], :unique => true
    execute 'DROP TABLE shipments_with_dupes;'

Recreating the table is much, much faster than trying to delete the records in the existing table and doesn’t lock the existing table, making your application downtime minimal. This method will not work if MySQL’s sql_mode includes ONLY_FULL_GROUP_BY.

13 thoughts on “Deleting Duplicate Rows in MySQL

  1. Betelgeuse

    Your second option could potentially loose data if something inserts for example between the copy and rename.

  2. Gareth

    You could potentially lose records of value using ‘GROUP BY’ only, an example would be a company sales ledger – on the ledger you might have supplier, account number, and balance. 2 records in the table have records for the same supplier and account but the amounts are different because they are 2 seperate orders. If you were to group by account number and supplier, you would lose one of those records. Sometimes replacing GROUP BY with DISTINCT will be the better option.

  3. Renier

    Lifesaver! A script on our site created duplicates with new PK ID’s. I just wanted the dupes out based on the other fields. Worked like a charm!

  4. Gareth

    Radu: I beleive that method would delete both records, and not leave one behind. That method also uses ‘negative’ logic, which I beleive should always be avoided where possible. An alternative would be to replace the ‘not equal’ to ‘greater than’.

    delete from table1
    USING table1, table1 as vtable
    WHERE (table1.ID > vtable.ID)
    AND (table1.field_name=vtable.field_name)

    That would work.

  5. Banks

    Thanks. The first solution was exactly what I needed, because my plan was to add a unique index anyway. It took about 800ms for a table with about 35,000 records with about 1,000 duplicates. I don’t think users would have noticed the lock up time.

  6. Backslider

    ALTER IGNORE TABLE ‘SHIPMENTS’ ADD UNIQUE INDEX (CART_ID, TRACKING_NUMBER)

    This does not work on MySQL 5.5.24. It throws a duplicate entry error.

  7. Eric

    Simply brilliant! The best of any solution that I found and I search for hours, kudos!!!

  8. iprogress

    That is excellent and has just saved me so much time and made an application much easier to set up from an uploaded spreadsheet.

    Thanks

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="">