PostgreSQL Mastery Cheat Sheet: Complete SQL Guide for Data Professionals - Dataford

Introduction

PostgreSQL is one of the most powerful and widely-used open-source relational database systems. Whether you're preparing for a data analyst interview, building analytics dashboards, or working with production databases, mastering PostgreSQL SQL is essential.

This comprehensive guide takes you from foundational SELECT statements to advanced window functions and recursive queries. Each section builds on the previous one, making it perfect for beginners who want to progress systematically, as well as intermediate users who need a quick reference.

We'll cover everything from basic filtering and sorting to complex joins, subqueries, and analytical functions. Every concept includes practical PostgreSQL examples you can run immediately. Let's dive in.

1. Basic SQL Queries

The foundation of SQL starts with SELECT statements. These queries let you retrieve and filter data from your tables. Master these basics before moving to more complex operations.

1.1 Select All Data from a Table

The simplest query retrieves every column and row from a table. Use this when you need to see all available data, but be cautious with large tables—it can return thousands of rows.

SELECT
  *
FROM
  employees;

Tip: In production, avoid SELECT * when you only need specific columns. It's less efficient and makes your intent unclear.

1.2 Select Specific Columns

Specify exactly which columns you need. This improves performance and makes your queries more readable. You can select multiple columns by separating them with commas.

SELECT
  first_name,
  last_name,
  email
FROM
  employees;

Use case: Perfect for reports where you only need specific fields, reducing data transfer and improving query speed.

1.3 Filter Records with WHERE

The WHERE clause filters rows based on conditions. You can use comparison operators (=, <, >, <=, >=, <>), logical operators (AND, OR, NOT), and pattern matching.

SELECT
  *
FROM
  employees
WHERE
  department = 'Engineering'
  AND salary > 75000;

Interview tip: WHERE filters rows before aggregation. Remember this distinction when working with GROUP BY and HAVING.

1.4 Sort Records using ORDER BY

ORDER BY sorts your results in ascending (ASC) or descending (DESC) order. You can sort by multiple columns, with each subsequent column used as a tiebreaker.

SELECT
  first_name,
  last_name,
  salary
FROM
  employees
ORDER BY
  salary DESC,
  last_name ASC;

Note: If you don't specify ASC or DESC, PostgreSQL defaults to ascending order.

1.5 Filter by Range using BETWEEN

BETWEEN is inclusive on both ends, making it perfect for filtering numeric ranges, dates, or any ordered data. It's equivalent to using >= AND <= but more readable.

SELECT
  *
FROM
  employees
WHERE
  salary BETWEEN 50000 AND 100000;

Common mistake: BETWEEN works with dates too, but remember it includes both boundary dates. For date ranges, this is usually what you want.

1.6 Find NULL Values

NULL represents missing or unknown data. You can't use = or <> to check for NULL—you must use IS NULL or IS NOT NULL. This is a common interview question.

SELECT
  *
FROM
  employees
WHERE
  manager_id IS NULL;

Why this matters: In SQL, NULL != NULL. Any comparison with NULL using = or <> returns NULL (unknown), not true or false.

1.7 Remove Duplicate Entries with DISTINCT

DISTINCT eliminates duplicate rows from your results. You can use it with a single column or multiple columns. When used with multiple columns, it returns unique combinations.

SELECT DISTINCT
  department
FROM
  employees;

Performance note: DISTINCT requires sorting, which can be slow on large tables. Consider if you really need it or if GROUP BY would be more efficient.

2. Aggregation Functions

Aggregation functions summarize data across multiple rows. They're essential for analytics, reporting, and understanding your data at a high level. These functions ignore NULL values by default.

2.1 COUNT Records

COUNT returns the number of rows. Use COUNT(*) to count all rows, or COUNT(column_name) to count non-NULL values in a specific column.

SELECT
  COUNT(*) AS total_employees
FROM
  employees;

SELECT
  COUNT(manager_id) AS employees_with_manager
FROM
  employees;

Key difference: COUNT(*) counts all rows, including those with NULLs. COUNT(column) only counts non-NULL values.

2.2 SUM Column Values

SUM adds up all values in a numeric column. It's perfect for calculating totals, revenue, quantities, or any additive metric.

SELECT
  SUM(salary) AS total_payroll
FROM
  employees;

Use case: Financial reporting, inventory totals, and any scenario where you need to aggregate numeric values.

2.3 MAX Values

MAX returns the highest value in a column. Works with numbers, dates, and strings (alphabetically). NULL values are ignored.

SELECT
  MAX(salary) AS highest_salary
FROM
  employees;

Tip: To get the row with the maximum value, use a subquery or window function—MAX alone only returns the value, not the entire row.

2.4 MIN Values

MIN returns the lowest value in a column. Like MAX, it works with various data types and ignores NULLs.

SELECT
  MIN(hire_date) AS first_hire
FROM
  employees;

Practical use: Finding earliest dates, minimum prices, or identifying outliers in your data.

2.5 AVG (Average)

AVG calculates the arithmetic mean of numeric values. It sums all values and divides by the count of non-NULL values.

SELECT
  AVG(salary) AS average_salary
FROM
  employees;

Important: AVG ignores NULLs. If you have 10 salaries and 2 are NULL, it divides the sum by 8, not 10.

3. Grouping Data

GROUP BY lets you aggregate data by categories. This is fundamental to analytics—you'll use it constantly to summarize data by department, date, region, or any categorical dimension.

3.1 GROUP BY with COUNT

Count records within each group. This is one of the most common analytical queries—answering questions like "How many employees in each department?"

SELECT
  department,
  COUNT(*) AS employee_count
FROM
  employees
GROUP BY
  department;

Rule: Every column in SELECT must either be in GROUP BY or be an aggregate function. This is a common interview question.

3.2 GROUP BY with SUM

Sum values within each group. Perfect for calculating totals by category—total sales by region, total salary by department, etc.

SELECT
  department,
  SUM(salary) AS total_payroll
FROM
  employees
GROUP BY
  department;

Analytics tip: Combine multiple aggregates in one query to get comprehensive summaries: COUNT, SUM, AVG, MIN, MAX all work together.

3.3 GROUP BY with HAVING Filters

HAVING filters groups after aggregation, while WHERE filters rows before aggregation. Use HAVING when you need to filter based on aggregate results.

SELECT
  department,
  COUNT(*) AS employee_count
FROM
  employees
GROUP BY
  department
HAVING
  COUNT(*) > 10;

Key distinction: WHERE filters rows; HAVING filters groups. You can't use aggregate functions in WHERE, but you can in HAVING. This is a critical interview topic.

4. Joining Tables

Joins combine data from multiple tables using relationships. They're essential for working with normalized databases. Understanding the different join types is crucial for data analysis.

4.1 INNER JOIN

INNER JOIN returns only rows where there's a match in both tables. If a row in one table has no matching row in the other, it's excluded. This is the most common join type.

SELECT
  e.first_name,
  e.last_name,
  d.department_name
FROM
  employees e
  INNER JOIN departments d ON e.department_id = d.id;

Concept: Think of INNER JOIN as the intersection of two sets—only matching records appear in results.

4.2 LEFT JOIN

LEFT JOIN returns all rows from the left table, plus matching rows from the right table. If there's no match, NULL values fill the right table columns. Essential for finding "orphaned" records.

SELECT
  e.first_name,
  e.last_name,
  d.department_name
FROM
  employees e
  LEFT JOIN departments d ON e.department_id = d.id;

Use case: Find employees without departments, or orders without customers. The left table is preserved completely.

4.3 RIGHT JOIN

RIGHT JOIN returns all rows from the right table, plus matching rows from the left table. It's the mirror of LEFT JOIN. In practice, RIGHT JOIN is less common—most developers prefer LEFT JOIN and just swap table order.

SELECT
  e.first_name,
  e.last_name,
  d.department_name
FROM
  employees e
  RIGHT JOIN departments d ON e.department_id = d.id;

Tip: RIGHT JOIN can be rewritten as LEFT JOIN by swapping tables. Use whichever makes your query more readable.

4.4 FULL OUTER JOIN

FULL OUTER JOIN returns all rows from both tables. Where there's a match, values are combined. Where there's no match, NULLs fill the missing side. It's the union of both tables.

SELECT
  e.first_name,
  e.last_name,
  d.department_name
FROM
  employees e
  FULL OUTER JOIN departments d ON e.department_id = d.id;

When to use: Data reconciliation, finding mismatches between systems, or when you need a complete picture from both tables.

5. Subqueries

Subqueries (nested queries) are queries within queries. They can appear in WHERE, SELECT, or FROM clauses. While joins are often preferred for performance, subqueries can be more readable for complex logic.

5.1 Subquery in WHERE Clause

Use a subquery in WHERE to filter based on results from another query. Common with comparison operators like IN, EXISTS, or comparison operators.

SELECT
  *
FROM
  employees
WHERE
  department_id IN (
    SELECT
      id
    FROM
      departments
    WHERE
      location = 'San Francisco'
  );

Performance note: Subqueries in WHERE can sometimes be slower than joins. For large datasets, test both approaches. EXISTS is often faster than IN for large subqueries.

5.2 Subquery in SELECT Clause

A subquery in SELECT acts like a calculated column. It must return a single value (scalar subquery). Useful for adding related data without joining.

SELECT
  first_name,
  last_name,
  salary,
  (
    SELECT
      AVG(salary)
    FROM
      employees
  ) AS company_avg_salary
FROM
  employees;

Warning: Scalar subqueries execute for every row, which can be slow. Use judiciously on large tables.

5.3 Subquery in FROM Clause

A subquery in FROM (derived table) lets you treat query results as a temporary table. Often used with aliases and combined with other tables.

SELECT
  dept_stats.department,
  dept_stats.avg_salary
FROM
  (
    SELECT
      department,
      AVG(salary) AS avg_salary
    FROM
      employees
    GROUP BY
      department
  ) AS dept_stats
WHERE
  dept_stats.avg_salary > 80000;

Alternative: Common Table Expressions (CTEs) with WITH are often more readable than FROM subqueries for complex logic.

6. Window Functions

Window functions perform calculations across a set of rows related to the current row, without collapsing rows like GROUP BY does. They're powerful for analytics, rankings, and running calculations.

6.1 RANK() OVER (PARTITION BY)

RANK() assigns ranks to rows within partitions. Rows with equal values get the same rank, and the next rank is skipped. PARTITION BY creates separate ranking groups.

SELECT
  first_name,
  last_name,
  department,
  salary,
  RANK() OVER (
    PARTITION BY
      department
    ORDER BY
      salary DESC
  ) AS dept_rank
FROM
  employees;

Variants: DENSE_RANK() doesn't skip ranks, and ROW_NUMBER() assigns unique sequential numbers even for ties.

6.2 Running Totals using SUM() OVER

SUM() OVER calculates cumulative sums. Use ORDER BY in the OVER clause to create running totals. Perfect for time series analysis and cumulative metrics.

SELECT
  order_date,
  revenue,
  SUM(revenue) OVER (
    ORDER BY
      order_date
  ) AS running_total
FROM
  orders;

Frame specification: You can use ROWS BETWEEN to control the window frame—like "sum of last 7 days" using ROWS BETWEEN 6 PRECEDING AND CURRENT ROW.

6.3 LAG() Function

LAG() accesses data from a previous row without a self-join. It's perfect for calculating period-over-period changes, day-over-day growth, or comparing current values to previous ones.

SELECT
  order_date,
  revenue,
  LAG(revenue, 1) OVER (
    ORDER BY
      order_date
  ) AS previous_revenue,
  revenue - LAG(revenue, 1) OVER (
    ORDER BY
      order_date
  ) AS revenue_change
FROM
  orders;

Related: LEAD() does the opposite—accesses data from the next row. Both are essential for time series analysis.

7. String Functions

PostgreSQL provides powerful string manipulation functions. These are essential for data cleaning, formatting, and text analysis.

7.1 CONCAT

CONCAT combines multiple strings into one. It handles NULL values gracefully by treating them as empty strings, unlike the || operator which returns NULL if any operand is NULL.

SELECT
  CONCAT(first_name, ' ', last_name) AS full_name
FROM
  employees;

Alternative: You can also use the || operator: first_name || ' ' || last_name, but CONCAT is more NULL-safe.

7.2 LENGTH

LENGTH returns the number of characters in a string. Useful for validation, filtering, or formatting purposes.

SELECT
  email,
  LENGTH(email) AS email_length
FROM
  employees
WHERE
  LENGTH(email) > 20;

Note: For byte length, use OCTET_LENGTH(). For multibyte characters, LENGTH counts characters, not bytes.

7.3 SUBSTRING

SUBSTRING extracts a portion of a string. Specify the start position (1-indexed) and optionally the length. Essential for parsing and data extraction.

SELECT
  email,
  SUBSTRING(
    email
    FROM
      1 FOR POSITION('@' IN email) - 1
  ) AS username
FROM
  employees;

PostgreSQL syntax: You can use SUBSTRING(str FROM start FOR length) or SUBSTRING(str, start, length). Both work.

7.4 UPPER / LOWER

UPPER converts text to uppercase, LOWER converts to lowercase. Essential for case-insensitive comparisons and data standardization.

SELECT
  UPPER(first_name) AS first_name_upper,
  LOWER(email) AS email_lower
FROM
  employees;

Use case: Data cleaning, case-insensitive searches, and ensuring consistent formatting in reports.

8. Date and Time Functions

Working with dates and times is crucial for analytics. PostgreSQL provides robust date/time functions for extraction, arithmetic, and formatting.

8.1 CURRENT_DATE

CURRENT_DATE returns today's date. Useful for filtering recent records, calculating ages, or date-based comparisons.

SELECT
  *
FROM
  orders
WHERE
  order_date = CURRENT_DATE;

Related: CURRENT_TIMESTAMP returns date and time, CURRENT_TIME returns just the time. All are timezone-aware.

8.2 EXTRACT (YEAR)

EXTRACT pulls specific parts from dates: year, month, day, hour, minute, etc. Essential for time-based grouping and analysis.

SELECT
  EXTRACT(
    YEAR
    FROM
      hire_date
  ) AS hire_year,
  EXTRACT(
    MONTH
    FROM
      hire_date
  ) AS hire_month
FROM
  employees;

Common extractions: YEAR, MONTH, DAY, DOW (day of week), QUARTER, WEEK. Use for time-based aggregations.

8.3 Date Difference using age()

AGE() calculates the interval between two dates, returning years, months, and days. Perfect for calculating ages, tenures, or time between events.

SELECT
  first_name,
  hire_date,
  AGE (CURRENT_DATE, hire_date) AS tenure
FROM
  employees;

Tip: To get just the years, use EXTRACT(YEAR FROM AGE(...)). AGE returns an INTERVAL type.

8.4 Date Arithmetic with INTERVAL

Add or subtract intervals from dates. INTERVAL handles years, months, days, hours, minutes, and seconds. Essential for date calculations.

SELECT
  order_date,
  order_date + INTERVAL '30 days' AS delivery_date,
  order_date - INTERVAL '1 year' AS one_year_ago
FROM
  orders;

Common intervals: '1 day', '1 month', '1 year', '2 weeks', '3 hours'. PostgreSQL handles month/year arithmetic intelligently.

9. Advanced SQL

These advanced features handle complex logic, data transformation, and specialized use cases. Master these to become a SQL expert.

9.1 CASE Statements

CASE is SQL's if-then-else logic. Use it for conditional logic, data categorization, and creating derived columns based on conditions.

SELECT
  first_name,
  salary,
  CASE
    WHEN salary > 100000 THEN 'High'
    WHEN salary > 50000 THEN 'Medium'
    ELSE 'Low'
  END AS salary_category
FROM
  employees;

Two forms: Simple CASE (CASE column WHEN value) and searched CASE (CASE WHEN condition). Searched CASE is more flexible.

9.2 COALESCE

COALESCE returns the first non-NULL value from a list. Perfect for providing default values, handling missing data, and data cleaning.

SELECT
  first_name,
  COALESCE(manager_id, 0) AS manager_id_or_zero,
  COALESCE(phone, email, 'No contact') AS contact_info
FROM
  employees;

Related: NULLIF(value1, value2) returns NULL if values match, otherwise value1. Useful for data normalization.

9.3 CAST / Type Conversion

CAST converts data from one type to another. Essential when you need to change data types for calculations, comparisons, or formatting.

SELECT
  CAST('123' AS INTEGER) AS string_to_int,
  CAST(salary AS TEXT) AS salary_text,
  '2024-01-15'::DATE AS string_to_date
FROM
  employees;

PostgreSQL shorthand: Use :: for casting: '123'::INTEGER. Both CAST() and :: work identically.

9.4 Recursive CTEs

Recursive Common Table Expressions solve hierarchical problems like organizational charts, category trees, or graph traversals. They reference themselves to build results iteratively.

WITH RECURSIVE
  employee_hierarchy AS (
    SELECT
      id,
      first_name,
      manager_id,
      1 AS level
    FROM
      employees
    WHERE
      manager_id IS NULL
    UNION ALL
    SELECT
      e.id,
      e.first_name,
      e.manager_id,
      eh.level + 1
    FROM
      employees e
      INNER JOIN employee_hierarchy eh ON e.manager_id = eh.id
  )
SELECT
  *
FROM
  employee_hierarchy;

Warning: Recursive CTEs can be slow and memory-intensive. Always include a termination condition to prevent infinite loops.

9.5 CROSS JOIN

CROSS JOIN creates a Cartesian product—every row from the first table paired with every row from the second. Use with caution: it can generate massive result sets.

SELECT
  d.department_name,
  m.month_name
FROM
  departments d
  CROSS JOIN months m;

Use case: Generating combinations, creating complete time series (all dates × all categories), or building test data.

9.6 CROSS JOIN with Filters

Combine CROSS JOIN with WHERE to create controlled combinations. This is useful for generating specific combinations or filling gaps in data.

SELECT
  d.department_name,
  m.month_name
FROM
  departments d
  CROSS JOIN months m
WHERE
  m.month_name IN ('January', 'February', 'March');

Performance tip: Always filter CROSS JOIN results when possible. Unfiltered CROSS JOINs can create millions of rows from small tables.

Conclusion

You've now covered PostgreSQL SQL from basic SELECT statements to advanced recursive CTEs and window functions. This cheat sheet provides a solid foundation, but mastery comes through practice.

Key takeaways:

Start with basic queries (SELECT, WHERE, ORDER BY) before moving to joins and aggregations
Understand the difference between WHERE (row filtering) and HAVING (group filtering)
Master joins—they're fundamental to working with relational data
Window functions are powerful for analytics without collapsing rows
Practice with real datasets to internalize these concepts

Ready to put your PostgreSQL skills to the test? Practice with real interview questions or explore our SQL modules to deepen your understanding. Whether you're preparing for a data analyst interview or building production analytics, these SQL fundamentals will serve you well.

📥 Download the Complete Cheat Sheet