PostgreSQL Mastery Cheat Sheet: Complete SQL Guide for Data Professionals
From basic queries to advanced window functions—everything you need to master PostgreSQL SQL
Introduction
PostgreSQL is one of the most powerful and widely-used open-source relational database systems. Whether you're preparing for a data analyst interview, building analytics dashboards, or working with production databases, mastering PostgreSQL SQL is essential.
This comprehensive guide takes you from foundational SELECT statements to advanced window functions and recursive queries. Each section builds on the previous one, making it perfect for beginners who want to progress systematically, as well as intermediate users who need a quick reference.
We'll cover everything from basic filtering and sorting to complex joins, subqueries, and analytical functions. Every concept includes practical PostgreSQL examples you can run immediately. Let's dive in.
1. Basic SQL Queries
The foundation of SQL starts with SELECT statements. These queries let you retrieve and filter data from your tables. Master these basics before moving to more complex operations.
1.1 Select All Data from a Table
The simplest query retrieves every column and row from a table. Use this when you need to see all available data, but be cautious with large tables—it can return thousands of rows.
SELECT
*
FROM
employees;
Tip: In production, avoid SELECT * when you only need specific columns. It's less efficient and makes your intent unclear.
1.2 Select Specific Columns
Specify exactly which columns you need. This improves performance and makes your queries more readable. You can select multiple columns by separating them with commas.
SELECT
first_name,
last_name,
email
FROM
employees;
Use case: Perfect for reports where you only need specific fields, reducing data transfer and improving query speed.
1.3 Filter Records with WHERE
The WHERE clause filters rows based on conditions. You can use comparison operators (=, <, >, <=, >=, <>), logical operators (AND, OR, NOT), and pattern matching.
SELECT
*
FROM
employees
WHERE
department = 'Engineering'
AND salary > 75000;
Interview tip: WHERE filters rows before aggregation. Remember this distinction when working with GROUP BY and HAVING.
1.4 Sort Records using ORDER BY
ORDER BY sorts your results in ascending (ASC) or descending (DESC) order. You can sort by multiple columns, with each subsequent column used as a tiebreaker.
SELECT
first_name,
last_name,
salary
FROM
employees
ORDER BY
salary DESC,
last_name ASC;
Note: If you don't specify ASC or DESC, PostgreSQL defaults to ascending order.
1.5 Filter by Range using BETWEEN
BETWEEN is inclusive on both ends, making it perfect for filtering numeric ranges, dates, or any ordered data. It's equivalent to using >= AND <= but more readable.
SELECT
*
FROM
employees
WHERE
salary BETWEEN 50000 AND 100000;
Common mistake: BETWEEN works with dates too, but remember it includes both boundary dates. For date ranges, this is usually what you want.
1.6 Find NULL Values
NULL represents missing or unknown data. You can't use = or <> to check for NULL—you must use IS NULL or IS NOT NULL. This is a common interview question.
SELECT
*
FROM
employees
WHERE
manager_id IS NULL;
Why this matters: In SQL, NULL != NULL. Any comparison with NULL using = or <> returns NULL (unknown), not true or false.
1.7 Remove Duplicate Entries with DISTINCT
DISTINCT eliminates duplicate rows from your results. You can use it with a single column or multiple columns. When used with multiple columns, it returns unique combinations.
SELECT DISTINCT
department
FROM
employees;
Performance note: DISTINCT requires sorting, which can be slow on large tables. Consider if you really need it or if GROUP BY would be more efficient.
2. Aggregation Functions
Aggregation functions summarize data across multiple rows. They're essential for analytics, reporting, and understanding your data at a high level. These functions ignore NULL values by default.
2.1 COUNT Records
COUNT returns the number of rows. Use COUNT(*) to count all rows, or COUNT(column_name) to count non-NULL values in a specific column.
SELECT
COUNT(*) AS total_employees
FROM
employees;
SELECT
COUNT(manager_id) AS employees_with_manager
FROM
employees;
Key difference: COUNT(*) counts all rows, including those with NULLs. COUNT(column) only counts non-NULL values.
2.2 SUM Column Values
SUM adds up all values in a numeric column. It's perfect for calculating totals, revenue, quantities, or any additive metric.
SELECT
SUM(salary) AS total_payroll
FROM
employees;
Use case: Financial reporting, inventory totals, and any scenario where you need to aggregate numeric values.
2.3 MAX Values
MAX returns the highest value in a column. Works with numbers, dates, and strings (alphabetically). NULL values are ignored.
SELECT
MAX(salary) AS highest_salary
FROM
employees;
Tip: To get the row with the maximum value, use a subquery or window function—MAX alone only returns the value, not the entire row.
2.4 MIN Values
MIN returns the lowest value in a column. Like MAX, it works with various data types and ignores NULLs.
SELECT
MIN(hire_date) AS first_hire
FROM
employees;
Practical use: Finding earliest dates, minimum prices, or identifying outliers in your data.
2.5 AVG (Average)
AVG calculates the arithmetic mean of numeric values. It sums all values and divides by the count of non-NULL values.
SELECT
AVG(salary) AS average_salary
FROM
employees;
Important: AVG ignores NULLs. If you have 10 salaries and 2 are NULL, it divides the sum by 8, not 10.
3. Grouping Data
GROUP BY lets you aggregate data by categories. This is fundamental to analytics—you'll use it constantly to summarize data by department, date, region, or any categorical dimension.
3.1 GROUP BY with COUNT
Count records within each group. This is one of the most common analytical queries—answering questions like "How many employees in each department?"
SELECT
department,
COUNT(*) AS employee_count
FROM
employees
GROUP BY
department;
Rule: Every column in SELECT must either be in GROUP BY or be an aggregate function. This is a common interview question.
3.2 GROUP BY with SUM
Sum values within each group. Perfect for calculating totals by category—total sales by region, total salary by department, etc.
SELECT
department,
SUM(salary) AS total_payroll
FROM
employees
GROUP BY
department;
Analytics tip: Combine multiple aggregates in one query to get comprehensive summaries: COUNT, SUM, AVG, MIN, MAX all work together.
3.3 GROUP BY with HAVING Filters
HAVING filters groups after aggregation, while WHERE filters rows before aggregation. Use HAVING when you need to filter based on aggregate results.
SELECT
department,
COUNT(*) AS employee_count
FROM
employees
GROUP BY
department
HAVING
COUNT(*) > 10;
Key distinction: WHERE filters rows; HAVING filters groups. You can't use aggregate functions in WHERE, but you can in HAVING. This is a critical interview topic.
4. Joining Tables
Joins combine data from multiple tables using relationships. They're essential for working with normalized databases. Understanding the different join types is crucial for data analysis.
4.1 INNER JOIN
INNER JOIN returns only rows where there's a match in both tables. If a row in one table has no matching row in the other, it's excluded. This is the most common join type.
SELECT
e.first_name,
e.last_name,
d.department_name
FROM
employees e
INNER JOIN departments d ON e.department_id = d.id;
Concept: Think of INNER JOIN as the intersection of two sets—only matching records appear in results.
4.2 LEFT JOIN
LEFT JOIN returns all rows from the left table, plus matching rows from the right table. If there's no match, NULL values fill the right table columns. Essential for finding "orphaned" records.
SELECT
e.first_name,
e.last_name,
d.department_name
FROM
employees e
LEFT JOIN departments d ON e.department_id = d.id;
Use case: Find employees without departments, or orders without customers. The left table is preserved completely.
4.3 RIGHT JOIN
RIGHT JOIN returns all rows from the right table, plus matching rows from the left table. It's the mirror of LEFT JOIN. In practice, RIGHT JOIN is less common—most developers prefer LEFT JOIN and just swap table order.
SELECT
e.first_name,
e.last_name,
d.department_name
FROM
employees e
RIGHT JOIN departments d ON e.department_id = d.id;
Tip: RIGHT JOIN can be rewritten as LEFT JOIN by swapping tables. Use whichever makes your query more readable.
4.4 FULL OUTER JOIN
FULL OUTER JOIN returns all rows from both tables. Where there's a match, values are combined. Where there's no match, NULLs fill the missing side. It's the union of both tables.
SELECT
e.first_name,
e.last_name,
d.department_name
FROM
employees e
FULL OUTER JOIN departments d ON e.department_id = d.id;
When to use: Data reconciliation, finding mismatches between systems, or when you need a complete picture from both tables.
5. Subqueries
Subqueries (nested queries) are queries within queries. They can appear in WHERE, SELECT, or FROM clauses. While joins are often preferred for performance, subqueries can be more readable for complex logic.
5.1 Subquery in WHERE Clause
Use a subquery in WHERE to filter based on results from another query. Common with comparison operators like IN, EXISTS, or comparison operators.
SELECT
*
FROM
employees
WHERE
department_id IN (
SELECT
id
FROM
departments
WHERE
location = 'San Francisco'
);
Performance note: Subqueries in WHERE can sometimes be slower than joins. For large datasets, test both approaches. EXISTS is often faster than IN for large subqueries.
5.2 Subquery in SELECT Clause
A subquery in SELECT acts like a calculated column. It must return a single value (scalar subquery). Useful for adding related data without joining.
SELECT
first_name,
last_name,
salary,
(
SELECT
AVG(salary)
FROM
employees
) AS company_avg_salary
FROM
employees;
Warning: Scalar subqueries execute for every row, which can be slow. Use judiciously on large tables.
5.3 Subquery in FROM Clause
A subquery in FROM (derived table) lets you treat query results as a temporary table. Often used with aliases and combined with other tables.
SELECT
dept_stats.department,
dept_stats.avg_salary
FROM
(
SELECT
department,
AVG(salary) AS avg_salary
FROM
employees
GROUP BY
department
) AS dept_stats
WHERE
dept_stats.avg_salary > 80000;
Alternative: Common Table Expressions (CTEs) with WITH are often more readable than FROM subqueries for complex logic.
6. Window Functions
Window functions perform calculations across a set of rows related to the current row, without collapsing rows like GROUP BY does. They're powerful for analytics, rankings, and running calculations.
6.1 RANK() OVER (PARTITION BY)
RANK() assigns ranks to rows within partitions. Rows with equal values get the same rank, and the next rank is skipped. PARTITION BY creates separate ranking groups.
SELECT
first_name,
last_name,
department,
salary,
RANK() OVER (
PARTITION BY
department
ORDER BY
salary DESC
) AS dept_rank
FROM
employees;
Variants: DENSE_RANK() doesn't skip ranks, and ROW_NUMBER() assigns unique sequential numbers even for ties.
6.2 Running Totals using SUM() OVER
SUM() OVER calculates cumulative sums. Use ORDER BY in the OVER clause to create running totals. Perfect for time series analysis and cumulative metrics.
SELECT
order_date,
revenue,
SUM(revenue) OVER (
ORDER BY
order_date
) AS running_total
FROM
orders;
Frame specification: You can use ROWS BETWEEN to control the window frame—like "sum of last 7 days" using ROWS BETWEEN 6 PRECEDING AND CURRENT ROW.
6.3 LAG() Function
LAG() accesses data from a previous row without a self-join. It's perfect for calculating period-over-period changes, day-over-day growth, or comparing current values to previous ones.
SELECT
order_date,
revenue,
LAG(revenue, 1) OVER (
ORDER BY
order_date
) AS previous_revenue,
revenue - LAG(revenue, 1) OVER (
ORDER BY
order_date
) AS revenue_change
FROM
orders;
Related: LEAD() does the opposite—accesses data from the next row. Both are essential for time series analysis.
7. String Functions
PostgreSQL provides powerful string manipulation functions. These are essential for data cleaning, formatting, and text analysis.
7.1 CONCAT
CONCAT combines multiple strings into one. It handles NULL values gracefully by treating them as empty strings, unlike the || operator which returns NULL if any operand is NULL.
SELECT
CONCAT(first_name, ' ', last_name) AS full_name
FROM
employees;
Alternative: You can also use the || operator: first_name || ' ' || last_name, but CONCAT is more NULL-safe.
7.2 LENGTH
LENGTH returns the number of characters in a string. Useful for validation, filtering, or formatting purposes.
SELECT
email,
LENGTH(email) AS email_length
FROM
employees
WHERE
LENGTH(email) > 20;
Note: For byte length, use OCTET_LENGTH(). For multibyte characters, LENGTH counts characters, not bytes.
7.3 SUBSTRING
SUBSTRING extracts a portion of a string. Specify the start position (1-indexed) and optionally the length. Essential for parsing and data extraction.
SELECT
email,
SUBSTRING(
email
FROM
1 FOR POSITION('@' IN email) - 1
) AS username
FROM
employees;
PostgreSQL syntax: You can use SUBSTRING(str FROM start FOR length) or SUBSTRING(str, start, length). Both work.
7.4 UPPER / LOWER
UPPER converts text to uppercase, LOWER converts to lowercase. Essential for case-insensitive comparisons and data standardization.
SELECT
UPPER(first_name) AS first_name_upper,
LOWER(email) AS email_lower
FROM
employees;
Use case: Data cleaning, case-insensitive searches, and ensuring consistent formatting in reports.
8. Date and Time Functions
Working with dates and times is crucial for analytics. PostgreSQL provides robust date/time functions for extraction, arithmetic, and formatting.
8.1 CURRENT_DATE
CURRENT_DATE returns today's date. Useful for filtering recent records, calculating ages, or date-based comparisons.
SELECT
*
FROM
orders
WHERE
order_date = CURRENT_DATE;
Related: CURRENT_TIMESTAMP returns date and time, CURRENT_TIME returns just the time. All are timezone-aware.
8.2 EXTRACT (YEAR)
EXTRACT pulls specific parts from dates: year, month, day, hour, minute, etc. Essential for time-based grouping and analysis.
SELECT
EXTRACT(
YEAR
FROM
hire_date
) AS hire_year,
EXTRACT(
MONTH
FROM
hire_date
) AS hire_month
FROM
employees;
Common extractions: YEAR, MONTH, DAY, DOW (day of week), QUARTER, WEEK. Use for time-based aggregations.
8.3 Date Difference using age()
AGE() calculates the interval between two dates, returning years, months, and days. Perfect for calculating ages, tenures, or time between events.
SELECT
first_name,
hire_date,
AGE (CURRENT_DATE, hire_date) AS tenure
FROM
employees;
Tip: To get just the years, use EXTRACT(YEAR FROM AGE(...)). AGE returns an INTERVAL type.
8.4 Date Arithmetic with INTERVAL
Add or subtract intervals from dates. INTERVAL handles years, months, days, hours, minutes, and seconds. Essential for date calculations.
SELECT
order_date,
order_date + INTERVAL '30 days' AS delivery_date,
order_date - INTERVAL '1 year' AS one_year_ago
FROM
orders;
Common intervals: '1 day', '1 month', '1 year', '2 weeks', '3 hours'. PostgreSQL handles month/year arithmetic intelligently.
9. Advanced SQL
These advanced features handle complex logic, data transformation, and specialized use cases. Master these to become a SQL expert.
9.1 CASE Statements
CASE is SQL's if-then-else logic. Use it for conditional logic, data categorization, and creating derived columns based on conditions.
SELECT
first_name,
salary,
CASE
WHEN salary > 100000 THEN 'High'
WHEN salary > 50000 THEN 'Medium'
ELSE 'Low'
END AS salary_category
FROM
employees;
Two forms: Simple CASE (CASE column WHEN value) and searched CASE (CASE WHEN condition). Searched CASE is more flexible.
9.2 COALESCE
COALESCE returns the first non-NULL value from a list. Perfect for providing default values, handling missing data, and data cleaning.
SELECT
first_name,
COALESCE(manager_id, 0) AS manager_id_or_zero,
COALESCE(phone, email, 'No contact') AS contact_info
FROM
employees;
Related: NULLIF(value1, value2) returns NULL if values match, otherwise value1. Useful for data normalization.
9.3 CAST / Type Conversion
CAST converts data from one type to another. Essential when you need to change data types for calculations, comparisons, or formatting.
SELECT
CAST('123' AS INTEGER) AS string_to_int,
CAST(salary AS TEXT) AS salary_text,
'2024-01-15'::DATE AS string_to_date
FROM
employees;
PostgreSQL shorthand: Use :: for casting: '123'::INTEGER. Both CAST() and :: work identically.
9.4 Recursive CTEs
Recursive Common Table Expressions solve hierarchical problems like organizational charts, category trees, or graph traversals. They reference themselves to build results iteratively.
WITH RECURSIVE
employee_hierarchy AS (
SELECT
id,
first_name,
manager_id,
1 AS level
FROM
employees
WHERE
manager_id IS NULL
UNION ALL
SELECT
e.id,
e.first_name,
e.manager_id,
eh.level + 1
FROM
employees e
INNER JOIN employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT
*
FROM
employee_hierarchy;
Warning: Recursive CTEs can be slow and memory-intensive. Always include a termination condition to prevent infinite loops.
9.5 CROSS JOIN
CROSS JOIN creates a Cartesian product—every row from the first table paired with every row from the second. Use with caution: it can generate massive result sets.
SELECT
d.department_name,
m.month_name
FROM
departments d
CROSS JOIN months m;
Use case: Generating combinations, creating complete time series (all dates Ă— all categories), or building test data.
9.6 CROSS JOIN with Filters
Combine CROSS JOIN with WHERE to create controlled combinations. This is useful for generating specific combinations or filling gaps in data.
SELECT
d.department_name,
m.month_name
FROM
departments d
CROSS JOIN months m
WHERE
m.month_name IN ('January', 'February', 'March');
Performance tip: Always filter CROSS JOIN results when possible. Unfiltered CROSS JOINs can create millions of rows from small tables.
Conclusion
You've now covered PostgreSQL SQL from basic SELECT statements to advanced recursive CTEs and window functions. This cheat sheet provides a solid foundation, but mastery comes through practice.
Key takeaways:
- Start with basic queries (SELECT, WHERE, ORDER BY) before moving to joins and aggregations
- Understand the difference between WHERE (row filtering) and HAVING (group filtering)
- Master joins—they're fundamental to working with relational data
- Window functions are powerful for analytics without collapsing rows
- Practice with real datasets to internalize these concepts
Ready to put your PostgreSQL skills to the test? Practice with real interview questions or explore our SQL modules to deepen your understanding. Whether you're preparing for a data analyst interview or building production analytics, these SQL fundamentals will serve you well.