SQL Tutorial
SQL Clauses / Operators
SQL-Injection
SQL Functions
SQL Queries
PL/SQL
MySQL
SQL Server
Misc
The GROUP BY
clause in SQL is used to group rows based on common values in specified columns into summary rows, much like the "group by" feature in spreadsheet applications. It's often used with aggregate functions like COUNT()
, SUM()
, AVG()
, MAX()
, or MIN()
to perform an operation on each group of rows.
SELECT column1, column2, aggregate_function(column3) FROM table_name WHERE condition GROUP BY column1, column2;
sales
table, and you want to know the total amount of sales for each product:SELECT product_id, SUM(amount) FROM sales GROUP BY product_id;
SELECT store_id, product_id, SUM(amount) FROM sales GROUP BY store_id, product_id;
SELECT store_id, SUM(amount) FROM sales WHERE product_id = 'A123' GROUP BY store_id;
HAVING
clause is like WHERE but operates on grouped data produced by a GROUP BY. If you want to filter groups based on aggregate results, you use HAVING. For example, to find products that have been sold more than 100 times:SELECT product_id, COUNT(*) FROM sales GROUP BY product_id HAVING COUNT(*) > 100;
Order Matters: The order of columns in the GROUP BY
clause determines the hierarchy of grouping. In the second example, rows are first grouped by store_id
and then by product_id
within each store group.
Select Clause Consistency: All columns in the SELECT
statement that aren't used in aggregate functions should be included in the GROUP BY
clause. For instance, if you select store_id
and product_id
without using them inside an aggregate function, they both should appear in the GROUP BY
clause.
HAVING vs WHERE: WHERE
filters rows before they are grouped, while HAVING
filters after rows are grouped. This distinction is crucial when you want to filter based on an aggregate measure.
Remember, GROUP BY
is a powerful tool in SQL for summarizing and aggregating data, but it requires a clear understanding of the data and desired results to use effectively.
SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id;
SELECT department_id, COUNT(employee_id) AS num_employees FROM employees GROUP BY department_id;
SELECT department_id, SUM(salary) AS total_salary FROM employees GROUP BY department_id;
SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id HAVING AVG(salary) > 50000;
SELECT department_id, job_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id, job_id;
SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id ORDER BY department_id;
SELECT department_id, COUNT(employee_id) AS num_employees, SUM(salary) AS total_salary, AVG(salary) AS avg_salary, MIN(salary) AS min_salary, MAX(salary) AS max_salary FROM employees GROUP BY department_id;
SELECT d.department_name, AVG(e.salary) AS avg_salary FROM departments d JOIN employees e ON d.department_id = e.department_id GROUP BY d.department_name;
SELECT department_id, COUNT(*) AS num_employees FROM employees GROUP BY department_id;
SELECT department_id, AVG(salary) AS avg_salary FROM ( SELECT department_id, salary FROM employees WHERE hire_date > '2022-01-01' ) AS filtered_employees GROUP BY department_id;
SELECT department_id FROM employees GROUP BY department_id;
SELECT DISTINCT department_id FROM employees;
SELECT department_id, employee_id, salary, AVG(salary) OVER (PARTITION BY department_id) AS avg_salary_per_dept FROM employees;
SELECT SUBSTRING(employee_name, 1, 1) AS initial_letter, AVG(salary) AS avg_salary FROM employees GROUP BY SUBSTRING(employee_name, 1, 1);