PostgreSQL - Deleting Duplicate Rows using Subquery

Deleting duplicate rows in PostgreSQL using a subquery involves identifying the duplicates based on certain criteria and then eliminating the unwanted rows. Let's consider a simple example to demonstrate the process.

Suppose we have a table named students with columns id, name, and age, and there might be multiple rows with the same name and age, but different id values. We want to keep one unique row for each combination of name and age and delete the duplicates.

Here's how you can achieve this:

Using a Common Table Expression (WITH clause):
```
WITH duplicates AS (
    SELECT name, age, 
           ROW_NUMBER() OVER (PARTITION BY name, age ORDER BY id) AS row_num
    FROM students
)
DELETE FROM students 
WHERE id IN (SELECT id FROM duplicates WHERE row_num > 1);
```
In this method, we use the ROW_NUMBER() window function to assign a unique number to each row within the partitions of name and age. Any row number greater than 1 indicates a duplicate.
Using a Subquery without a CTE:
```
DELETE FROM students
WHERE id NOT IN (
    SELECT DISTINCT ON (name, age) id
    FROM students
    ORDER BY name, age, id
);
```
Here, DISTINCT ON selects one id for each combination of name and age, prioritizing the lowest id value because of the ORDER BY clause. The outer query then deletes all rows that don't have those selected id values, thus removing duplicates.
Using JOIN with a Subquery:
This method involves joining the table with a subquery that identifies the rows to be retained:
```
DELETE FROM students s1
USING (
    SELECT name, age, MAX(id) as max_id 
    FROM students 
    GROUP BY name, age 
    HAVING COUNT(*) > 1
) s2
WHERE s1.name = s2.name AND s1.age = s2.age AND s1.id <> s2.max_id;
```
In the subquery, we identify each combination of name and age that occurs more than once and select the maximum id for each group. The outer DELETE query then deletes all rows for these groups except the ones with the maximum id.

Note:

Before running the DELETE statement, it's always a good idea to run a SELECT statement using the same criteria to verify that you're targeting the correct rows. Also, make sure you have backups or use transactions to safeguard against unintended data deletions.

Deleting duplicate rows with a subquery in PostgreSQL:
- Description: Uses a subquery to identify and delete duplicate records.
- Code:
```
DELETE FROM your_table
WHERE id NOT IN (
    SELECT MIN(id)
    FROM your_table
    GROUP BY duplicate_column
);
```
Using subqueries to identify and delete duplicate records in PostgreSQL:
- Description: Utilizes subqueries to first identify and then delete duplicates.
- Code:
```
DELETE FROM your_table
WHERE id IN (
    SELECT id
    FROM your_table
    GROUP BY duplicate_column
    HAVING COUNT(*) > 1
);
```

Finding and removing duplicates with correlated subqueries in PostgreSQL:

Description: Employs correlated subqueries to identify and remove duplicates.

Code:

DELETE FROM your_table t1
WHERE EXISTS (
    SELECT 1
    FROM your_table t2
    WHERE t1.duplicate_column = t2.duplicate_column
    AND t1.id < t2.id
);

Deleting duplicates based on specific columns in PostgreSQL:

Description: Targets specific columns for identifying and deleting duplicates.

Code:

DELETE FROM your_table
WHERE (column1, column2) IN (
    SELECT column1, column2
    FROM your_table
    GROUP BY column1, column2
    HAVING COUNT(*) > 1
);

Handling NULL values when deleting duplicates in PostgreSQL:

Description: Considers NULL values when dealing with duplicate elimination.

Code:

DELETE FROM your_table
WHERE (column1, column2) IN (
    SELECT column1, column2
    FROM your_table
    WHERE column1 IS NOT NULL AND column2 IS NOT NULL
    GROUP BY column1, column2
    HAVING COUNT(*) > 1
);

Deleting duplicates from multiple tables with a subquery in PostgreSQL:

Description: Addresses duplicate removal across multiple related tables.

Code:

DELETE FROM table1
WHERE (column1, column2) IN (
    SELECT column1, column2
    FROM table1
    JOIN table2 ON table1.id = table2.id
    GROUP BY column1, column2
    HAVING COUNT(*) > 1
);

Using EXISTS in subqueries for deleting duplicates in PostgreSQL:

Description: Utilizes EXISTS in subqueries to identify and delete duplicates.

Code:

DELETE FROM your_table t1
WHERE EXISTS (
    SELECT 1
    FROM your_table t2
    WHERE t1.duplicate_column = t2.duplicate_column
    AND t1.id < t2.id
);

Avoiding common pitfalls when deleting duplicates with a subquery in PostgreSQL:
- Description: Highlights potential challenges and offers solutions.
- Code:
```
-- Check for NULL values and use appropriate conditions to avoid unintended removal
```