SQL Tutorial

SQL Clauses / Operators

SQL-Injection

SQL Functions

SQL Queries

PL/SQL

MySQL

SQL Server

Misc

SQL | GROUP BY

The GROUP BY clause in SQL is used to group rows based on common values in specified columns into summary rows, much like the "group by" feature in spreadsheet applications. It's often used with aggregate functions like COUNT(), SUM(), AVG(), MAX(), or MIN() to perform an operation on each group of rows.

Syntax:

SELECT column1, column2, aggregate_function(column3)
FROM table_name
WHERE condition
GROUP BY column1, column2;

Examples:

  • Grouping by a Single Column: Suppose you have a sales table, and you want to know the total amount of sales for each product:
SELECT product_id, SUM(amount)
FROM sales
GROUP BY product_id;
  • Grouping by Multiple Columns: If you want to know the total amount of sales for each product in each store:
SELECT store_id, product_id, SUM(amount)
FROM sales
GROUP BY store_id, product_id;
  • Using GROUP BY with WHERE: To know the total sales of a particular product in each store:
SELECT store_id, SUM(amount)
FROM sales
WHERE product_id = 'A123'
GROUP BY store_id;
  • Using GROUP BY with HAVING: The HAVING clause is like WHERE but operates on grouped data produced by a GROUP BY. If you want to filter groups based on aggregate results, you use HAVING. For example, to find products that have been sold more than 100 times:
SELECT product_id, COUNT(*)
FROM sales
GROUP BY product_id
HAVING COUNT(*) > 100;

Key Points:

  • Order Matters: The order of columns in the GROUP BY clause determines the hierarchy of grouping. In the second example, rows are first grouped by store_id and then by product_id within each store group.

  • Select Clause Consistency: All columns in the SELECT statement that aren't used in aggregate functions should be included in the GROUP BY clause. For instance, if you select store_id and product_id without using them inside an aggregate function, they both should appear in the GROUP BY clause.

  • HAVING vs WHERE: WHERE filters rows before they are grouped, while HAVING filters after rows are grouped. This distinction is crucial when you want to filter based on an aggregate measure.

Remember, GROUP BY is a powerful tool in SQL for summarizing and aggregating data, but it requires a clear understanding of the data and desired results to use effectively.

  1. GROUP BY in SQL:
    • Description: GROUP BY is used to group rows based on one or more columns, allowing for aggregation on each group.
    • Example:
      SELECT department_id, AVG(salary) AS avg_salary
      FROM employees
      GROUP BY department_id;
      

Aggregating Data with GROUP BY:

  1. Aggregating Data with GROUP BY:
    • Description: GROUP BY is commonly used with aggregate functions like SUM, AVG, COUNT, MIN, and MAX.
    • Example:
      SELECT department_id, COUNT(employee_id) AS num_employees
      FROM employees
      GROUP BY department_id;
      

GROUP BY with Aggregate Functions in SQL:

  1. GROUP BY with Aggregate Functions in SQL:
    • Description: Aggregate functions perform calculations on groups defined by the GROUP BY clause.
    • Example:
      SELECT department_id, SUM(salary) AS total_salary
      FROM employees
      GROUP BY department_id;
      

GROUP BY and HAVING Clause in SQL:

  1. GROUP BY and HAVING Clause in SQL:
    • Description: HAVING is used with GROUP BY to filter groups based on aggregate conditions.
    • Example:
      SELECT department_id, AVG(salary) AS avg_salary
      FROM employees
      GROUP BY department_id
      HAVING AVG(salary) > 50000;
      

Multi-column GROUP BY in SQL:

  1. Multi-column GROUP BY in SQL:
    • Description: GROUP BY can include multiple columns to create more granular groups.
    • Example:
      SELECT department_id, job_id, AVG(salary) AS avg_salary
      FROM employees
      GROUP BY department_id, job_id;
      

GROUP BY and ORDER BY in SQL:

  1. GROUP BY and ORDER BY in SQL:
    • Description: ORDER BY can be used with GROUP BY to sort the result set.
    • Example:
      SELECT department_id, AVG(salary) AS avg_salary
      FROM employees
      GROUP BY department_id
      ORDER BY department_id;
      

GROUP BY with COUNT, SUM, AVG, MIN, MAX:

  1. GROUP BY with COUNT, SUM, AVG, MIN, MAX:
    • Example:
      SELECT department_id,
             COUNT(employee_id) AS num_employees,
             SUM(salary) AS total_salary,
             AVG(salary) AS avg_salary,
             MIN(salary) AS min_salary,
             MAX(salary) AS max_salary
      FROM employees
      GROUP BY department_id;
      

GROUP BY with JOIN Operations in SQL:

  1. GROUP BY with JOIN Operations in SQL:
    • Description: GROUP BY can be used with JOIN to aggregate data from multiple tables.
    • Example:
      SELECT d.department_name, AVG(e.salary) AS avg_salary
      FROM departments d
      JOIN employees e ON d.department_id = e.department_id
      GROUP BY d.department_name;
      

Handling NULL Values with GROUP BY:

  1. Handling NULL Values with GROUP BY:
    • Description: NULL values can be included in groups when using GROUP BY.
    • Example:
      SELECT department_id, COUNT(*) AS num_employees
      FROM employees
      GROUP BY department_id;
      

GROUP BY and Subqueries in SQL:

  1. GROUP BY and Subqueries in SQL:
    • Description: Subqueries can be used within GROUP BY for more complex aggregations.
    • Example:
      SELECT department_id, AVG(salary) AS avg_salary
      FROM (
        SELECT department_id, salary
        FROM employees
        WHERE hire_date > '2022-01-01'
      ) AS filtered_employees
      GROUP BY department_id;
      

GROUP BY vs. DISTINCT in SQL:

  1. GROUP BY vs. DISTINCT in SQL:
    • Description: GROUP BY and DISTINCT both remove duplicate rows but have different use cases.
    • Example (GROUP BY):
      SELECT department_id
      FROM employees
      GROUP BY department_id;
      
    • Example (DISTINCT):
      SELECT DISTINCT department_id
      FROM employees;
      

GROUP BY and Window Functions in SQL:

  1. GROUP BY and Window Functions in SQL:
    • Description: Window functions can be used with GROUP BY to perform calculations on a subset of rows within each group.
    • Example:
      SELECT department_id, employee_id, salary,
             AVG(salary) OVER (PARTITION BY department_id) AS avg_salary_per_dept
      FROM employees;
      

GROUP BY and Calculated Columns:

  1. GROUP BY and Calculated Columns:
    • Description: Calculated columns can be used in GROUP BY for more complex groupings.
    • Example:
      SELECT SUBSTRING(employee_name, 1, 1) AS initial_letter,
             AVG(salary) AS avg_salary
      FROM employees
      GROUP BY SUBSTRING(employee_name, 1, 1);