Self joins are one of the most unmentioned concept in SQL but trust me, it's so powerful, and you will be using it a lot in your day to day job as a Data Analyst, and also during your next interviews. Even if you wanna flex in front of your friends!
Yes, self join means joining the table to itself. The best representation is this picture from Spiderman! 😀

We'll explore the use of self joins for calculating retention metrics in e-commerce using sample data from Etsy.
Imagine you work for Etsy, an e-commerce platform, and you want to calculate user retention metrics. Specifically, you want to know how many users from each cohort (group of users who joined in the same month) are still active in the next months.
Let's start with some sample data. We have a table called "user_activity" with columns "user_id," and "activity_date." We want to calculate the M2 to M1 retention :which means what percentage of users were active on a specific month and came back the next month.
Here's a simplified version of our data:
| user_id | activity_date |
|---|---|
| 1 | 2022-01-01 |
| 2 | 2022-01-02 |
| 1 | 2022-02-03 |
| 2 | 2022-03-02 |
| 4 | 2022-03-01 |
Here if you look at user_id = 1, they were active on January 1, 2022 and active again on the month after on February 3rd.
We will do a self join on the user_activity table on the same user_id and check if they were active the month after. Let's write it, it will make more sense:

Here is the query:
SELECT
COUNT(DISTINCT m1.user_id) AS m1_users,
COUNT(DISTINCT m2.user_id) AS m2_users,
FROM user_activity m1
LEFT JOIN user_activity m2 on m1.user_id = m2.user_id
AND DATE_TRUNC('month', m2.activity_date) = DATE_TRUNC('month', m1.activity_date) + INTERVAL '1 month'
We are joining the same table to itself on user_id, nothing crazy here. However, we want to check if that same user_id was active 1 month later, that's why we are adding an extra condition to our self join:
DATE_TRUNC('month', m2.activity_date) = DATE_TRUNC('month', m1.activity_date) + INTERVAL '1 month'
Then we are just counting the users from the first table that were activity on the M1 month. And those active on the next month which is M2.
Now if we want to get that ratio, we just need to do the ratio of m2_users divided by m1_users:
SELECT
COUNT(DISTINCT m2.user_id)/COUNT(DISTINCT m1.user_id) AS m2_m1_retention
FROM user_activity m1
LEFT JOIN user_activity m2 on m1.user_id = m2.user_id
AND DATE_TRUNC('month', m2.activity_date) = DATE_TRUNC('month', m1.activity_date) + INTERVAL '1 month'
And there you go! This self join is used a lot in many company to find the retention of users and is super powerful as we all want our users to retain on the platform!
Now, let's practice! 😍 If you were not able to fully get the self join, take some time to read it again. Trust me, it takes some time to digest!