If I could teach an analytics engineer exactly one thing beyond the basics, it would be window functions. They're the feature that turns SQL from "fetch and aggregate" into "compute over context" — rankings, period-over-period changes, running totals, moving averages — all without leaving the database or collapsing your rows. And yet most people use ROW_NUMBER() to dedup and stop there, never touching the frame clause, which is where half the power lives and most of the bugs hide.
Here's the practical whole of it. The window functions explainer below is the best way to build intuition — pick a function, then hover any result row to see exactly which input rows its window frame covers and why. I'll refer back to it.
Loading artifact: sql-window-functions...
The one idea: aggregate without collapsing#
GROUP BY computes an aggregate and throws your rows away — five orders become one row with a SUM. A window function computes across a set of related rows (the window) while keeping every original row. You get the total and the detail, side by side.
The whole thing hangs off the OVER clause, which has three parts:
SUM(amount) OVER (
PARTITION BY customer_id -- reset the calculation per group
ORDER BY order_date -- order rows within the partition
ROWS BETWEEN ... -- the frame: which rows are in scope
)PARTITION BY is "group, but don't collapse." ORDER BY sequences rows inside each partition. The frame narrows it to a sliding window of rows. Master those three and everything below is just picking a function to put in front.
Ranking: pick the right one of three#
Three functions rank rows, and the difference between them is exactly how they treat ties.
SELECT
name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num, -- 1,2,3,4 — always unique
RANK() OVER (ORDER BY score DESC) AS rank, -- 1,2,2,4 — ties share, gaps after
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank -- 1,2,2,3 — ties share, no gaps
FROM players;ROW_NUMBER— arbitrary unique number per row. The dedup workhorse: partition by a key, order by recency, keeprow_num = 1.RANK— ties get the same rank, then the next rank skips (two 2nds, no 3rd). This is leaderboard ranking.DENSE_RANK— ties share, no gaps. Use when you want "the top 3 distinct scores" rather than the top 3 rows.
The dedup pattern is worth memorising because you'll write it constantly:
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY updated_at DESC) AS rn
FROM contacts
) t
WHERE rn = 1; -- the latest row per email, duplicates goneOffset: LAG and LEAD for period-over-period#
LAG reaches back to a previous row; LEAD reaches forward. This is how you compute change over time without a self-join — month-over-month growth is two lines:
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month,
revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change
FROM monthly_revenue;Anytime you catch yourself joining a table to itself on date = date - 1, stop — it's a LAG.
The frame clause, and the gotcha that bites everyone#
Now the part people skip. When you use a window aggregate with ORDER BY, you've implicitly created a frame — and the default is not what most people assume.
A running total looks like this:
SUM(amount) OVER (ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)That accumulates from the first row through the current one. Good. But here's the trap: if you omit the frame, SQL defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, not ROWS. With RANGE, all rows that share the same ORDER BY value are treated as one peer group and collapsed into the same frame — so if two orders share a date, your "running total" jumps by both at once on each of those rows instead of stepping through them.
A running total that's mysteriously wrong on dates with duplicate values is almost always the RANGE-vs-ROWS default. When you want a true row-by-row accumulation, write ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW explicitly. Don't rely on the default — it's RANGE, and it will surprise you.
A moving average is the same machinery with a bounded frame — a trailing 7-row window:
AVG(value) OVER (
ORDER BY day
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7d_avgHover the rows in the explainer with a moving average selected and you'll see the frame slide — that's the whole concept made literal, and it's worth thirty seconds.
Why this is the highest-leverage SQL you can learn#
Window functions are where a lot of pipeline logic that people push into Python or a BI tool actually belongs — in the query, close to the data, expressed once. Dedup at load time. Sessionization. Cohort retention. Rolling metrics. The same engines I lean on make this cheap: once dlt has landed clean relational tables in DuckDB, this is the analytical layer that sits on top, and DuckDB runs window functions fast.
Learn the three-part OVER clause, memorise the dedup and LAG patterns, and — above all — write your frames explicitly. The default will not do what you think, and a silently-wrong running total is the kind of bug that ships to a dashboard and lives there for months.