Grouping, GROUP BY operator

Let's run a query:

SELECT id, home_type, has_tv, price FROM Rooms;
1Private room1149
2Entire home/apt0225
3Private room1150
4Entire home/apt189
5Entire home/apt080
6Entire home/apt0200
7Private room060
8Private room179
9Private room179
10Entire home/apt1150
11Entire home/apt1135
12Private room085
13Private room089
14Private room085
15Entire home/apt1120
40Shared room140

This gives us information about each rented room. But what if we want to get information not about each record separately, but about the groups they form?

For example, such groups can be records divided by the type of housing:

  • Shared room (renting a room for several people)
  • Private room (renting a whole room)
  • Entire home/apt (renting a whole apartment)

These groups include different records in the table and, accordingly, have different characteristics that can be very useful to us.

Useful information about groups can include:

  • the average rental cost of a room or entire living space
  • the number of rented living spaces of each type

To answer all these and many other questions, there is the GROUP BY operator.

The general structure of a query with GROUP BY

SELECT [literals, aggregate_functions, grouping_fields]
FROM table_name
GROUP BY grouping_fields;

To group records by the type of housing, we need to specify home_type after GROUP BY, i.e., the field by which grouping will occur.

SELECT home_type FROM Rooms
GROUP BY home_type
Private room
Entire home/apt
Shared room

It should be noted that for GROUP BY, all NULL values are treated as equal, i.e., when grouping by a field that contains NULL values, all such rows will be included in one group.

When using the GROUP BY operator, we have moved from working with individual records to working with formed groups. Because of this, we cannot simply output any field from a record (such as has_tv or price), as we could before. This is because there may be several records in each group, and each of them may have a different value in this field.

When using GROUP BY, we can only output:

  • literals, i.e., values that are explicitly fixed.

    We can output them because they are fixed values that do not depend on anything. For example,

    SELECT home_type, "literal" FROM Rooms
    GROUP BY home_type
    Private roomliteral
    Entire home/aptliteral
    Shared roomliteral
  • aggregate function results, i.e. computed values based on a set of values.

    We will cover more detailed information about aggregate functions in the next lesson. But for example, let's consider the aggregate function AVG. The AVG function takes as an argument the name of the field we want to calculate the average value for each group based on.

    SELECT home_type, AVG(price) as avg_price FROM Rooms
    GROUP BY home_type
    Private room89.4286
    Entire home/apt148.6667
    Shared room40

    This query first divides all records from the Rooms table into 3 groups based on the home_type field. Then, for each group, it adds up all the values taken from the price field of each record included in the current group, and then divides the resulting sum by the number of records in that group.

  • grouping fields.

    We can output them because within one group, the fields on which grouping was performed are the same.

Grouping by 2 or more fields

We have already looked at how records in a table are grouped by one field. For additional illustration, it looks something like this when the grouping field is home_type:

Grouping by 1 field

When grouping by 2 or more fields, the principle remains the same, only the resulting groups are additionally divided into smaller groups depending on the second grouping field.

Example of grouping by home_type and has_tv:

Grouping by 2 field

Let's test ourselves? When using the GROUP BY operator, what can display in SELECT statement?