Dataford - Ace your Data Interview

In our previous practices, we used GROUP BY clause to group and summarize data from a dataset, like Uber trip data. Let's say, you have this table uber_trips and wanna know how many trips happen every day:

Sample of uber_trips table:

ride_id	user_id	cost	trip_start_at
101	1001	$12.50	2023-01-15 8:30 AM
102	1002	$18.75	2023-01-15 9:15 AM
103	1003	$8.20	2023-01-15 10:00 AM
104	1004	$14.30	2023-01-15 10:45 AM
105	1005	$10.50	2023-01-15 11:30 AM

To get the total number of rides for each day, you will use the trip_start_at column and convert that to a date and then count the number of trips:

SELECT 
   DATE(trip_start_at) AS date,
   COUNT(*) AS total_trips
 FROM uber_trips
 GROUP BY 1

But what if you want to find specific days when something interesting happened, like days when the total number of Uber trips exceeded 1,000? This is where the HAVING clause becomes useful.

The HAVING clause is like a filter for grouped data. It allows you to filter groups based on some condition. Think of it as the "WHERE clause for aggregative values." What are aggregative values? These are values that you get when you use functions like COUNT, SUM, AVERAGE, and more to group and summarize your data.

So, in SQL, it might look like this:

SELECT 
   DATE(trip_start_at) AS date,
   COUNT(*) AS total_trips
 FROM uber_trips
 GROUP BY 1
 HAVING COUNT(*)>1000

In summary, the HAVING clause is a powerful tool that acts as a filter for groups of data, specifically when you're working with aggregative values. It helps you find specific groups that meet certain criteria, making it a handy tool when you need to analyze data that's been grouped together.

Cool, now let's practice! 🙇

Modules