Data grouping and aggregate functions
Operator GROUP BY
Sometimes you need to find out information not about the objects themselves, but about the specific groups that they form. To do this, use the GROUP BY operator and aggregate functions.
The General structure of the query with GROUP BY
SELECT [constants, aggregate_function, grouping_fields] FROM table_name GROUP BY grouping_fields;
GROUP BY example
To demonstrate the work of the GROUP BY operator, we derive the total amount of money spent, grouped by family members.
SELECT family_member, SUM(unit_price * amount) FROM Payments GROUP BY family_member;
Below is the data set in the Payments table:
When the request is executed, the grouping is done by the family_member field and the total amount spent on purchases by each family member is added up.
In order to better understand how grouping by field occurs, we transform the source data into a new, grouped view:
As you can see, groups of records have been formed, united by onefamily_member. After that, we can apply the sum formula inside each of these groups, which will multiply the quantity of the product by its value, and then add up all the resulting values:
It should be borne in mind that for GROUP BY all NULL values are treated as equal, i.e. when grouped by a field containing NULL values, all such rows will fall into one group.
An aggregate function performs a calculation on a set of values and returns a single value.
Aggregate functions are used for non-null values. The exception is the COUNT() function.
|SUM(table_field)||Returns the sum of the values|
|AVG(table_field)||Returns the average value|
|COUNT(table_field)||Returns the number of records|
|MIN(table_field)||Returns the minimum value|
|MAX(table_field)||Returns the maximum value|
To filter rows by values of aggregate functions, the HAVING operator is used.
General query structure with the HAVING operator
SELECT [constants, aggregate_function, grouping_fields] FROM table_name GROUP BY grouping_fields HAVING conditions_for_selection_restrictions_after_grouping
Difference between HAVING and WHERE
- WHERE — records are selected by condition first, and then can be grouped, sorted, etc.
- HAVING — records are grouped first, and then selected by condition, while, unlike WHERE, you can use the values of aggregate functions in it
Display the total amount spent on purchases for each family member, where the total purchase amount is less than 5000 rubles:
SELECT family_member, SUM(unit_price * amount) AS sum FROM Payments GROUP BY family_member HAVING sum < 5000;