Sunday, November 5, 2023

Select and Delete Duplicates from Sqlite Table

Select duplicates from table PonsApimetaDE with columns DE, meta

SELECT DE, meta, COUNT(*)
FROM PonsApimetaDE
GROUP BY DE, meta
HAVING COUNT(*) > 1

Delete duplicates from table PonsApimetaDE with columns DE, meta

DELETE FROM PonsApimetaDE
WHERE rowid NOT IN (
  SELECT MIN(rowid)
  FROM PonsApimetaDE
  GROUP BY DE, meta
)

Delete duplicates from table

CREATE TABLE temp_table as SELECT DISTINCT * FROM source_table;
DELETE FROM source_table;
INSERT INTO source_table SELECT * FROM temp_table
DROP TABLE temp_table

Create index for table PonsApimetaDE with columns DE, meta

CREATE UNIQUE INDEX "idxuniquemetaDE" ON "PonsApimetaDE" (
  "DE"  ASC,
  "meta"  ASC
)

*****

SELECT * FROM table
WHERE rowid > (
  SELECT MIN(rowid) FROM table p2
  WHERE table.column1 = p2.column1
  AND table.column2 = p2.column2
);

DELETE FROM table
WHERE rowid > (
  SELECT MIN(rowid) FROM table p2
  WHERE table.column1 = p2.column1
  AND table.column2 = p2.column2
);

p2 is an alias for the table table in the subquery. This alias is used to make it easier to reference the table table in the subquery without having to type out the entire table name each time.

The purpose of this DELETE statement is to delete all rows from the table table where the column1 and column2 are the same as other rows, but only if the current row's rowid is greater than the smallest rowid with the same column1 and column2.

Here's a breakdown of the query:

The DELETE FROM table statement tells SQLite to delete all rows from the table table that meet the conditions specified in the WHERE clause.

The WHERE rowid > (subquery) clause restricts the rows to be deleted based on their rowid.

The subquery (SELECT MIN(rowid) FROM table p2 WHERE table.column1 = p2.column1 AND table.column2 = p2.column2) calculates the smallest rowid in the table table that has the same column1 and column2 as the current row.

The p2 alias in the subquery allows us to reference the table table without having to type out the entire table name each time.

In summary, this query will delete all duplicate rows from the table table, leaving only one row per unique combination of column1 and column2.