In our previous practices, we used GROUP BY clause to group and summarize data from a dataset, like Uber trip data. Let's say, you have this table uber_trips and wanna know how many trips happen every day:
Sample of uber_trips table:
| ride_id | user_id | cost | trip_start_at |
|---|---|---|---|
| 101 | 1001 | $12.50 | 2023-01-15 8:30 AM |
| 102 | 1002 | $18.75 | 2023-01-15 9:15 AM |
| 103 | 1003 | $8.20 | 2023-01-15 10:00 AM |
| 104 | 1004 | $14.30 | 2023-01-15 10:45 AM |
| 105 | 1005 | $10.50 | 2023-01-15 11:30 AM |
To get the total number of rides for each day, you will use the trip_start_at column and convert that to a date and then count the number of trips:
SELECT
DATE(trip_start_at) AS date,
COUNT(*) AS total_trips
FROM uber_trips
GROUP BY 1
But what if you want to find specific days when something interesting happened, like days when the total number of Uber trips exceeded 1,000? This is where the HAVING clause becomes useful.
The HAVING clause is like a filter for grouped data. It allows you to filter groups based on some condition. Think of it as the "WHERE clause for aggregative values." What are aggregative values? These are values that you get when you use functions like COUNT, SUM, AVERAGE, and more to group and summarize your data.
So, in SQL, it might look like this:
SELECT
DATE(trip_start_at) AS date,
COUNT(*) AS total_trips
FROM uber_trips
GROUP BY 1
HAVING COUNT(*)>1000
In summary, the HAVING clause is a powerful tool that acts as a filter for groups of data, specifically when you're working with aggregative values. It helps you find specific groups that meet certain criteria, making it a handy tool when you need to analyze data that's been grouped together.
Cool, now let's practice! 🙇