Understanding the GROUP BY is one of the most important aspects of SQL! It's a powerful tool when you want to group data and then do some calculations like counting, averaging, or summing within those groups.
Imagine you have a table instagram_posts that records user posts on Instagram. The table looks something like this:
Sample instagram_posts table:
| post_id | user_id | post_date | likes |
|---|---|---|---|
| 1 | 100 | 2021-01-05 | 150 |
| 2 | 101 | 2021-02-10 | 200 |
| 3 | 102 | 2021-01-15 | 250 |
| 4 | 100 | 2022-01-20 | 300 |
| 5 | 103 | 2022-02-25 | 350 |
This table includes post IDs, user IDs (the one that made the post), the date each post was made, and the number of likes each post received.
You want to know how many posts were made each year. This is where GROUP BY comes into play.
SELECT
DATE_TRUNC('year', post_date) AS post_year,
COUNT(*) AS total_posts
FROM instagram_posts
GROUP BY post_year;
What This Does:
DATE_TRUNC('year', post_date) AS post_year: This truncates (i.e cuts) the year from the post_date to the beginning of the year and labels it as post_year. If you get a date equal to February 20, 2021, this will output Jan 1st, 2021. For any value, it will get the first date of the year.GROUP BY post_year: This groups all the posts by the year they were made.COUNT(*) AS total_posts: This counts the number of posts in each group (year).| post_year | total_posts |
|---|---|
| 2021-01-01 | 3 |
| 2022-01-01 | 2 |
By the way, if you use EXTRACT(YEAR FROM post_date) AS post_year, you will only get the year value. And your output will become:
| post_year | total_posts |
|---|---|
| 2021 | 3 |
| 2022 | 2 |
It's super simple to remember it as well. You are basically extracting the year from the post date. And the SQL is exactly like the English meaning! Feel free to use any of those 2!
If you want to get even more detailed and count the posts by both year and month:
SELECT
EXTRACT(YEAR FROM post_date) AS post_year,
EXTRACT(MONTH FROM post_date) AS post_month,
COUNT(*) AS total_posts
FROM instagram_posts
GROUP BY post_year, post_month;
What This Does:
If you are lazy like me, you probably don't need to call the whole column name in your "group by". It's too long to write: GROUP BY post_year, post_month
You can simply mention its position in the query instead. post_year is your first column and post_month is your 2nd column, so you can just replace it by 1 and 2. Just like that:
SELECT
EXTRACT(YEAR FROM post_date) AS post_year,
EXTRACT(MONTH FROM post_date) AS post_month,
COUNT(*) AS total_posts
FROM instagram_posts
GROUP BY 1,2
Tada!!! 🎉