Dataford - Ace your Data Interview

Here's an easy-to-understand table that illustrates the key differences between three SQL window functions: ROW_NUMBER, RANK, and DENSE_RANK:

ROW_NUMBER: This function assigns a unique sequential number (1, 2, 3, 4, ...) to each row, even when some rows have identical values.
RANK: This one's a bit fairer. Rows with equal values get the same rank. If 2 rows both have the value 200 and one smaller value comes before them, they'll each get a rank of 2. However, the next value, 300, will jump to rank 4: rank 3 is skipped, because RANK reflects the row's position in the overall sequence.
DENSE_RANK: Like RANK, DENSE_RANK gives tied rows the same rank, but it doesn't leave gaps in the ranking sequence after a tie. In the same example, 300 would get rank 3.

RANK and DENSE_RANK use the same OVER (ORDER BY ...) syntax as ROW_NUMBER, so the SQL is pretty straightforward:

SELECT
    number,
    ROW_NUMBER() OVER (ORDER BY number) AS row_number,
    RANK() OVER (ORDER BY number) AS rank,
    DENSE_RANK() OVER (ORDER BY number) AS dense_rank
FROM numbers
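
To see the three functions side by side, here is a minimal sketch that runs the query above against an in-memory SQLite database from Python. The sample values (100, 200, 200, 300) are an assumption chosen to match the tie described earlier, not data from the article, and window functions require SQLite 3.25 or newer.

```python
import sqlite3  # window functions need SQLite 3.25 or newer

# Hypothetical sample data: one value below the tie and one above it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (number INTEGER)")
conn.executemany(
    "INSERT INTO numbers VALUES (?)",
    [(100,), (200,), (200,), (300,)],
)

query = """
SELECT
    number,
    ROW_NUMBER() OVER (ORDER BY number) AS row_number,
    RANK() OVER (ORDER BY number) AS rank,
    DENSE_RANK() OVER (ORDER BY number) AS dense_rank
FROM numbers
"""

# Sort in Python so the printed order is deterministic.
rows = sorted(conn.execute(query).fetchall())
for row in rows:
    print(row)
# (100, 1, 1, 1)
# (200, 2, 2, 2)
# (200, 3, 2, 2)
# (300, 4, 4, 3)
```

Each tuple is (number, row_number, rank, dense_rank). The two 200 rows get different row numbers but share a rank, and only RANK skips ahead to 4 for the value 300.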

Now, let's go over the salary example, because we love salaries, right?

Suppose we want to add 2 new columns, one using RANK and the other using DENSE_RANK. Try it yourself!

Brendan and Karim, both earning a top salary of $190k, will share a rank of 1. The next highest salary (Sammy) will get a rank of 3, because the RANK function skips a number after a two-way tie. DENSE_RANK, however, would assign that salary a rank of 2, because it never skips numbers in the sequence.

Here is the SQL using the salaries table:

SELECT
    user_id,
    name,
    job_title,
    salary,
    RANK() OVER(ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER(ORDER BY salary DESC) AS dense_rank
FROM salaries

We use descending order here so that the highest salaries get the lowest rank numbers (rank 1 is the top salary).
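
As a rough check of the tie behavior described above, the sketch below runs the same ranking expressions on a tiny in-memory SQLite table. Only the $190k tie between Brendan and Karim comes from the article. Sammy's salary, the job titles, and the user_id values are hypothetical placeholders.

```python
import sqlite3  # window functions need SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salaries "
    "(user_id INTEGER, name TEXT, job_title TEXT, salary INTEGER)"
)
# Only the $190k tie is from the article; Sammy's salary, the job
# titles, and the user_id values are hypothetical.
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?, ?)",
    [
        (1, "Brendan", "Data Scientist", 190000),
        (2, "Karim", "Data Engineer", 190000),
        (3, "Sammy", "Data Scientist", 170000),
    ],
)

query = """
SELECT
    name,
    RANK() OVER (ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM salaries
"""

# Map each name to its (rank, dense_rank) pair.
ranks = {name: (rnk, dense) for name, rnk, dense in conn.execute(query)}
print(ranks["Brendan"], ranks["Karim"], ranks["Sammy"])
# (1, 1) (1, 1) (3, 2)
```

Brendan and Karim share rank 1 under both functions, while Sammy lands on 3 with RANK and 2 with DENSE_RANK.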

But what about finding the lowest salaries within each job title?

For data analysts, the lowest salary is $120k, shared by Lamine and Yamal. Both receive a rank and dense_rank of 1. The next salary steps up to rank 3 (for RANK) and 2 (for DENSE_RANK).

The SQL query just needs a PARTITION BY job_title clause, so that the ranking restarts for each job title:

SELECT
    user_id,
    name,
    job_title,
    salary,
    RANK() OVER(PARTITION BY job_title ORDER BY salary) AS rank,
    DENSE_RANK() OVER(PARTITION BY job_title ORDER BY salary) AS dense_rank
FROM salaries

And we are not using DESC, because the default ascending order ranks the lowest salaries first.
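
Here is a small sketch of ranking within each job title, partitioning by job_title as the text describes. Lamine and Yamal at $120k come from the article. The third analyst (Alex), the data scientist rows, and every user_id are hypothetical, added only so that each partition has something to rank.

```python
import sqlite3  # window functions need SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salaries "
    "(user_id INTEGER, name TEXT, job_title TEXT, salary INTEGER)"
)
# Lamine and Yamal at $120k are from the article; every other row
# and all user_id values are hypothetical.
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?, ?)",
    [
        (1, "Lamine", "Data Analyst", 120000),
        (2, "Yamal", "Data Analyst", 120000),
        (3, "Alex", "Data Analyst", 135000),
        (4, "Sammy", "Data Scientist", 170000),
        (5, "Brendan", "Data Scientist", 190000),
    ],
)

query = """
SELECT
    name,
    RANK() OVER (PARTITION BY job_title ORDER BY salary) AS rank,
    DENSE_RANK() OVER (PARTITION BY job_title ORDER BY salary) AS dense_rank
FROM salaries
"""

# Map each name to its (rank, dense_rank) pair within its job title.
ranks = {name: (rnk, dense) for name, rnk, dense in conn.execute(query)}
for name in ["Lamine", "Yamal", "Alex", "Sammy", "Brendan"]:
    print(name, ranks[name])
# Lamine (1, 1)
# Yamal (1, 1)
# Alex (3, 2)
# Sammy (1, 1)
# Brendan (2, 2)
```

The ranking restarts at 1 for each job title. Partitioning by salary instead would put every distinct salary in its own group, so every row would get rank 1.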

Now, you understand the difference between RANK, DENSE_RANK and ROW_NUMBER! 🎉🎉🎉🎉
