PostgreSQL Tutorial

Data Types

Querying & Filtering Data

Managing Tables

Modifying Data

Conditionals

Control Flow

Transactions & Constraints

Working with JOINS & Schemas

Roles & Permissions

Working with Sets

Subquery & CTEs

User-defined Functions

Important In-Built Functions

PostgreSQL PL/pgSQL

Variables & Constants

Stored Procedures

Working with Triggers

Working with Views & Indexes

Errors & Exception Handling

PostgreSQL - Deleting Duplicate Rows using Subquery

Deleting duplicate rows in PostgreSQL using a subquery involves identifying the duplicates based on certain criteria and then eliminating the unwanted rows. Let's consider a simple example to demonstrate the process.

Suppose we have a table named students with columns id, name, and age, and there might be multiple rows with the same name and age, but different id values. We want to keep one unique row for each combination of name and age and delete the duplicates.

Here's how you can achieve this:

  1. Using a Common Table Expression (WITH clause):

    WITH duplicates AS (
        SELECT name, age, 
               ROW_NUMBER() OVER (PARTITION BY name, age ORDER BY id) AS row_num
        FROM students
    )
    DELETE FROM students 
    WHERE id IN (SELECT id FROM duplicates WHERE row_num > 1);
    

    In this method, we use the ROW_NUMBER() window function to assign a unique number to each row within the partitions of name and age. Any row number greater than 1 indicates a duplicate.

  2. Using a Subquery without a CTE:

    DELETE FROM students
    WHERE id NOT IN (
        SELECT DISTINCT ON (name, age) id
        FROM students
        ORDER BY name, age, id
    );
    

    Here, DISTINCT ON selects one id for each combination of name and age, prioritizing the lowest id value because of the ORDER BY clause. The outer query then deletes all rows that don't have those selected id values, thus removing duplicates.

  3. Using JOIN with a Subquery:

    This method involves joining the table with a subquery that identifies the rows to be retained:

    DELETE FROM students s1
    USING (
        SELECT name, age, MAX(id) as max_id 
        FROM students 
        GROUP BY name, age 
        HAVING COUNT(*) > 1
    ) s2
    WHERE s1.name = s2.name AND s1.age = s2.age AND s1.id <> s2.max_id;
    

    In the subquery, we identify each combination of name and age that occurs more than once and select the maximum id for each group. The outer DELETE query then deletes all rows for these groups except the ones with the maximum id.

Note:

Before running the DELETE statement, it's always a good idea to run a SELECT statement using the same criteria to verify that you're targeting the correct rows. Also, make sure you have backups or use transactions to safeguard against unintended data deletions.

  1. Deleting duplicate rows with a subquery in PostgreSQL:

    • Description: Uses a subquery to identify and delete duplicate records.
    • Code:
      DELETE FROM your_table
      WHERE id NOT IN (
          SELECT MIN(id)
          FROM your_table
          GROUP BY duplicate_column
      );
      
  2. Using subqueries to identify and delete duplicate records in PostgreSQL:

    • Description: Utilizes subqueries to first identify and then delete duplicates.
    • Code:
      DELETE FROM your_table
      WHERE id IN (
          SELECT id
          FROM your_table
          GROUP BY duplicate_column
          HAVING COUNT(*) > 1
      );
      
  3. Finding and removing duplicates with correlated subqueries in PostgreSQL:

    • Description: Employs correlated subqueries to identify and remove duplicates.
    • Code:
      DELETE FROM your_table t1
      WHERE EXISTS (
          SELECT 1
          FROM your_table t2
          WHERE t1.duplicate_column = t2.duplicate_column
          AND t1.id < t2.id
      );
      
  4. Deleting duplicates based on specific columns in PostgreSQL:

    • Description: Targets specific columns for identifying and deleting duplicates.
    • Code:
      DELETE FROM your_table
      WHERE (column1, column2) IN (
          SELECT column1, column2
          FROM your_table
          GROUP BY column1, column2
          HAVING COUNT(*) > 1
      );
      
  5. Handling NULL values when deleting duplicates in PostgreSQL:

    • Description: Considers NULL values when dealing with duplicate elimination.
    • Code:
      DELETE FROM your_table
      WHERE (column1, column2) IN (
          SELECT column1, column2
          FROM your_table
          WHERE column1 IS NOT NULL AND column2 IS NOT NULL
          GROUP BY column1, column2
          HAVING COUNT(*) > 1
      );
      
  6. Deleting duplicates from multiple tables with a subquery in PostgreSQL:

    • Description: Addresses duplicate removal across multiple related tables.
    • Code:
      DELETE FROM table1
      WHERE (column1, column2) IN (
          SELECT column1, column2
          FROM table1
          JOIN table2 ON table1.id = table2.id
          GROUP BY column1, column2
          HAVING COUNT(*) > 1
      );
      
  7. Using EXISTS in subqueries for deleting duplicates in PostgreSQL:

    • Description: Utilizes EXISTS in subqueries to identify and delete duplicates.
    • Code:
      DELETE FROM your_table t1
      WHERE EXISTS (
          SELECT 1
          FROM your_table t2
          WHERE t1.duplicate_column = t2.duplicate_column
          AND t1.id < t2.id
      );
      
  8. Avoiding common pitfalls when deleting duplicates with a subquery in PostgreSQL:

    • Description: Highlights potential challenges and offers solutions.
    • Code:
      -- Check for NULL values and use appropriate conditions to avoid unintended removal