SQL | Correlated Subqueries

A correlated subquery is a type of subquery that uses values from the outer query. For every row processed by the outer query, the subquery is executed once, often leading to more resource-intensive operations compared to a non-correlated subquery. The correlated subquery is dependent on the outer query, hence the term "correlated."

Basic Example:

To illustrate, consider two tables: employees and orders. Let's say you want to find all employees who have at least one order:

SELECT e.employee_name
FROM employees e
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.employee_id = e.employee_id
);

In this example, for each employee in the employees table, the subquery checks the orders table to see if an order exists for that employee. The subquery is correlated because it references the e.employee_id from the outer query.

Characteristics of Correlated Subqueries:

Execution Order: The outer query is executed first, but only up to the point where the correlated subquery needs to be executed. Then, for each row in the outer query, the correlated subquery is executed.
Performance: Since the correlated subquery might be executed multiple times (once for each row selected by the outer query), it can be slower than a non-correlated subquery. Proper indexing and optimization are crucial.
Reference to Outer Query: A defining characteristic of correlated subqueries is that they reference columns from the outer query. This is what differentiates them from standard subqueries.

More Examples:

Find employees whose salary is above the average salary of their department:

SELECT e.employee_name, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
);

Find the highest order amount for each employee:

SELECT e.employee_name, o.order_amount
FROM employees e
JOIN orders o ON e.employee_id = o.employee_id
WHERE o.order_amount = (
    SELECT MAX(o2.order_amount)
    FROM orders o2
    WHERE o2.employee_id = e.employee_id
);

Points to Consider:

Optimization: Correlated subqueries can often be rewritten as joins, which can be more performant in some cases. Analyze the query execution plan to determine if the correlated subquery is the most efficient approach.
Avoid Overuse: Because correlated subqueries can be resource-intensive, avoid using them excessively or without justification. Always look for alternative approaches and test for performance.
RDBMS Differences: The support and optimization of correlated subqueries can vary among RDBMSs. Always refer to the documentation of the specific system you're using and be aware of its behavior and limitations regarding correlated subqueries.

How to Use Correlated Subqueries: Correlated subqueries refer to an outer query from within the inner subquery.

SELECT column1, 
       (SELECT MAX(column2) FROM your_table t2 WHERE t1.column1 = t2.column1) AS max_value
FROM your_table t1;

SQL Correlated Subquery vs. Regular Subquery:

Regular Subquery:

SELECT column1, 
       (SELECT MAX(column2) FROM your_table) AS max_value
FROM your_table;

Correlated Subquery:

SELECT column1, 
       (SELECT MAX(column2) FROM your_table t2 WHERE t1.column1 = t2.column1) AS max_value
FROM your_table t1;

Correlated Subqueries in WHERE Clause:

SELECT column1
FROM your_table t1
WHERE column2 > (SELECT AVG(column2) FROM your_table t2 WHERE t1.column1 = t2.column1);

Examples of Correlated Subqueries in SELECT Statement:

SELECT column1, 
       (SELECT COUNT(*) FROM your_table t2 WHERE t1.column1 = t2.column1) AS count_per_group
FROM your_table t1;

SQL Correlated Subquery in FROM Clause:

SELECT t1.column1, t2.avg_value
FROM your_table t1
JOIN (SELECT column1, AVG(column2) AS avg_value FROM your_table GROUP BY column1) t2
ON t1.column1 = t2.column1;

Correlated Subqueries with EXISTS Keyword:

SELECT column1
FROM your_table t1
WHERE EXISTS (SELECT 1 FROM another_table t2 WHERE t1.column1 = t2.column1);

Nested Correlated Subqueries in SQL:

SELECT column1, 
       (SELECT MAX(column2) FROM (SELECT * FROM your_table) t2 WHERE t1.column1 = t2.column1) AS max_value
FROM your_table t1;

Handling NULL Values in Correlated Subqueries:

Use COALESCE or IS NULL to handle NULL values in correlated subqueries.

SELECT column1, 
       (SELECT COALESCE(MAX(column2), 0) FROM your_table t2 WHERE t1.column1 = t2.column1) AS max_value
FROM your_table t1;

Correlated Subqueries with Aggregates (SUM, AVG, COUNT):

SELECT column1,
       (SELECT AVG(column2) FROM your_table t2 WHERE t1.column1 = t2.column1) AS avg_per_group
FROM your_table t1;