PostgreSQL Tutorial
Data Types
Querying & Filtering Data
Managing Tables
Modifying Data
Conditionals
Control Flow
Transactions & Constraints
Working with JOINS & Schemas
Roles & Permissions
Working with Sets
Subquery & CTEs
User-defined Functions
Important In-Built Functions
PostgreSQL PL/pgSQL
Variables & Constants
Stored Procedures
Working with Triggers
Working with Views & Indexes
Errors & Exception Handling
The GROUP BY
clause in PostgreSQL is used to group rows that have the same values in specified columns into summary rows, much like the way the GROUP BY
clause works in other SQL databases. It's often used with aggregate functions like COUNT()
, MAX()
, MIN()
, SUM()
, AVG()
, etc., to perform an operation on each group of rows.
SELECT column1, column2, aggregate_function(column3) FROM table_name WHERE condition GROUP BY column1, column2 ORDER BY column1, column2;
Grouping by a Single Column:
Assume we have a table sales
with columns product
and amount_sold
.
To find out the total amount sold for each product:
SELECT product, SUM(amount_sold) AS total_sold FROM sales GROUP BY product;
Grouping by Multiple Columns:
If the sales
table also has a date
column, and you want to know the total amount sold for each product on each date:
SELECT date, product, SUM(amount_sold) AS total_sold FROM sales GROUP BY date, product ORDER BY date, product;
Using GROUP BY
with HAVING
:
The HAVING
clause is used to filter the results after grouping. For instance, to find products that have sold more than 100 units in total:
SELECT product, SUM(amount_sold) AS total_sold FROM sales GROUP BY product HAVING SUM(amount_sold) > 100;
Note: The key difference between WHERE
and HAVING
is that WHERE
filters rows before they are grouped, while HAVING
filters after the grouping.
Grouping by Expressions:
You can also group by the result of an expression. For instance, if you want to group sales by the year:
SELECT EXTRACT(YEAR FROM date) AS sales_year, SUM(amount_sold) AS total_sold FROM sales GROUP BY EXTRACT(YEAR FROM date);
Every column in the SELECT
clause that isn't used in an aggregate function must be listed in the GROUP BY
clause. Otherwise, PostgreSQL will raise an error.
Be cautious when using GROUP BY
on columns that can have null values. All null values are considered equal, and they will be grouped together.
Using GROUP BY
can impact performance, especially on large tables. Always ensure your columns used in the GROUP BY
clause are indexed if they are frequently used in queries, and monitor the performance of your queries.
How to use GROUP BY in PostgreSQL:
SELECT department, COUNT(*) FROM employees GROUP BY department;
PostgreSQL GROUP BY example:
SELECT country, AVG(population) FROM cities GROUP BY country;
Aggregation functions with GROUP BY in PostgreSQL:
SELECT department, SUM(salary) FROM employees GROUP BY department;
Multiple columns in GROUP BY in PostgreSQL:
SELECT department, job_title, AVG(salary) FROM employees GROUP BY department, job_title;
HAVING clause with GROUP BY in PostgreSQL:
SELECT department, AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) > 50000;
GROUP BY vs. DISTINCT in PostgreSQL:
SELECT DISTINCT department FROM employees;
ORDER BY with GROUP BY in PostgreSQL:
SELECT department, AVG(salary) FROM employees GROUP BY department ORDER BY AVG(salary) DESC;
Nested GROUP BY in PostgreSQL:
SELECT country, city, AVG(population) FROM cities GROUP BY country, city;
GROUP BY and JOIN in PostgreSQL:
SELECT departments.name, AVG(employees.salary) FROM departments JOIN employees ON departments.id = employees.department_id GROUP BY departments.name;
Using GROUP BY for date and time aggregations in PostgreSQL:
SELECT date_trunc('month', order_date) AS month, SUM(total_amount) FROM orders GROUP BY date_trunc('month', order_date);
GROUP BY with ROLLUP and CUBE in PostgreSQL:
SELECT department, job_title, AVG(salary) FROM employees GROUP BY ROLLUP (department, job_title);
GROUP BY and window functions in PostgreSQL:
SELECT department, job_title, AVG(salary), RANK() OVER (PARTITION BY department ORDER BY AVG(salary) DESC) AS rank_within_department FROM employees GROUP BY department, job_title;
GROUP BY with CASE statement in PostgreSQL:
SELECT CASE WHEN age < 30 THEN 'Young' WHEN age BETWEEN 30 AND 50 THEN 'Middle-aged' ELSE 'Senior' END AS age_group, AVG(salary) FROM employees GROUP BY age_group;
Conditional aggregations with GROUP BY in PostgreSQL:
SELECT department, AVG(CASE WHEN job_title = 'Manager' THEN salary ELSE 0 END) AS avg_manager_salary, AVG(CASE WHEN job_title = 'Developer' THEN salary ELSE 0 END) AS avg_dev_salary FROM employees GROUP BY department;
GROUP BY and subqueries in PostgreSQL:
SELECT department, AVG(subquery.total_sales) FROM employees JOIN (SELECT employee_id, SUM(sale_amount) AS total_sales FROM sales GROUP BY employee_id) AS subquery ON employees.id = subquery.employee_id GROUP BY department;
Combining GROUP BY with other clauses in PostgreSQL:
SELECT department, AVG(salary) FROM employees WHERE hire_date > '2022-01-01' GROUP BY department;