You have a table with duplicate rows – somehow a unique index didn’t get created and a bug has added duplicate records to your table. A pox upon that bug!
Here’s two easy ways to clean out that table quickly.
1) Use ALTER IGNORE on MySQL 5.1+
MySQL will allow you to create a unique index on a table with duplicate records with its IGNORE SQL extension:
ALTER IGNORE TABLE 'SHIPMENTS' ADD UNIQUE INDEX (CART_ID, TRACKING_NUMBER)
Duplicates will be deleted. ALTER IGNORE does not work in Percona because of their InnoDB fast index creation feature.
2) Recreate the table with GROUP BY
execute 'CREATE TABLE shipments_deduped like shipments;' execute 'INSERT shipments_deduped SELECT * FROM shipments GROUP BY cart_id, tracking_number;' execute 'RENAME TABLE shipments TO shipments_with_dupes;' execute 'RENAME TABLE shipments_deduped TO shipments;' add_index :shipments, [:cart_id, :tracking_number], :unique => true execute 'DROP TABLE shipments_with_dupes;'
Recreating the table is much, much faster than trying to delete the records in the existing table and doesn’t lock the existing table, making your application downtime minimal. This method will not work if MySQL’s sql_mode includes ONLY_FULL_GROUP_BY.
Your second option could potentially loose data if something inserts for example between the copy and rename.
That’s why you have application downtime.
Really helpful, keep it up.
Alot of thank from my crew too.
Thanks a lot. Works like charm!
You could potentially lose records of value using ‘GROUP BY’ only, an example would be a company sales ledger – on the ledger you might have supplier, account number, and balance. 2 records in the table have records for the same supplier and account but the amounts are different because they are 2 seperate orders. If you were to group by account number and supplier, you would lose one of those records. Sometimes replacing GROUP BY with DISTINCT will be the better option.
How about a delete:
delete from shipments
USING shipments, shipments as vtable
WHERE (NOT shipments.ID=vtable.ID)
AND (shipments.tracking_number=vtable.tracking_number)
original from: http://www.cyberciti.biz/faq/howto-removing-eliminating-duplicates-from-a-mysql-table/#comment-37342
Lifesaver! A script on our site created duplicates with new PK ID’s. I just wanted the dupes out based on the other fields. Worked like a charm!
Radu: I beleive that method would delete both records, and not leave one behind. That method also uses ‘negative’ logic, which I beleive should always be avoided where possible. An alternative would be to replace the ‘not equal’ to ‘greater than’.
delete from table1
USING table1, table1 as vtable
WHERE (table1.ID > vtable.ID)
AND (table1.field_name=vtable.field_name)
That would work.
Thanks. The first solution was exactly what I needed, because my plan was to add a unique index anyway. It took about 800ms for a table with about 35,000 records with about 1,000 duplicates. I don’t think users would have noticed the lock up time.
ALTER IGNORE TABLE ‘SHIPMENTS’ ADD UNIQUE INDEX (CART_ID, TRACKING_NUMBER)
This does not work on MySQL 5.5.24. It throws a duplicate entry error.
The first solution works with MyISAM engine only?
Simply brilliant! The best of any solution that I found and I search for hours, kudos!!!
That is excellent and has just saved me so much time and made an application much easier to set up from an uploaded spreadsheet.
Thanks