Advanced SQL for Trade Analysis: Window Functions, CTEs, and Time-Series Queries
SQL is Not Just for IT: A Quant's Most Versatile Tool
In the age of Python and R, it's easy to dismiss SQL as a simple data retrieval language. This is a serious mistake. Modern SQL, with its support for advanced features like window functions, Common Table Expressions (CTEs), and time-series-specific functions, is an incredibly effective and expressive language for financial data analysis. For many common analytical tasks—calculating moving averages, finding session highs and lows, computing VWAP—a well-written SQL query can be significantly more efficient and concise than the equivalent code in a procedural language like Python.
This is because SQL is a declarative language. You describe what you want, and the database's query optimizer is responsible for finding the most efficient way to get it. For large datasets, a database can often outperform a single-threaded Python script by leveraging parallelism, indexes, and intelligent data access patterns. This article will demonstrate how to use these advanced SQL features to perform complex trade analysis directly within the database.
Window Functions: The Powerhouse of Analytical SQL
Window functions are the single most important SQL feature for time-series analysis. A window function performs a calculation across a set of table rows that are related to the current row. This is similar to an aggregate function, but unlike a GROUP BY aggregate it does not collapse the result set into a single output row.
Calculating a Moving Average:
A classic example is calculating a 5-period moving average of a stock's closing price.
SELECT
trade_date,
symbol,
close_price,
AVG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS moving_avg_5d
FROM
daily_prices;
Let's break down the OVER clause:
PARTITION BY symbol: This divides the data into partitions, one for each symbol. The window function is applied independently to each partition.
ORDER BY trade_date: This orders the rows within each partition by date.
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW: This defines the "window" or frame for the aggregation. In this case, it includes the current row and the four preceding rows.
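As a sanity check, the moving-average query can be run end-to-end on an in-memory SQLite database (version 3.25+ supports window functions); the sample prices here are illustrative:

```python
import sqlite3

# Illustrative demo: run the 5-day moving-average query on in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_prices (trade_date TEXT, symbol TEXT, close_price REAL)"
)
conn.executemany(
    "INSERT INTO daily_prices VALUES (?, ?, ?)",
    [(f"2024-01-0{d}", "ACME", float(p))
     for d, p in zip(range(1, 7), [10, 11, 12, 13, 14, 15])],
)

rows = conn.execute("""
    SELECT trade_date, close_price,
           AVG(close_price) OVER (
               PARTITION BY symbol
               ORDER BY trade_date
               ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
           ) AS moving_avg_5d
    FROM daily_prices
    ORDER BY trade_date
""").fetchall()

for trade_date, close_price, moving_avg_5d in rows:
    print(trade_date, close_price, moving_avg_5d)
# Early rows average over fewer than 5 points because the frame is
# truncated at the start of the partition.
```

Note that the first four rows of each partition average over fewer than five prices; if you want those rows blank instead, gate the average on COUNT(*) over the same frame.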
Finding the Session High:
Window functions can also be used to find the highest price for a symbol within a trading session.
SELECT
event_timestamp,
symbol,
price,
MAX(price) OVER (PARTITION BY symbol, DATE(event_timestamp)) AS session_high
FROM
trades;
Common Table Expressions (CTEs): Organizing Complex Queries
CTEs, defined using the WITH clause, are used to break down complex queries into simple, logical, and readable steps. They act as temporary, named result sets that exist only for the duration of the query. This is invaluable for multi-stage financial calculations.
Calculating Daily Realized PnL with FIFO Logic:
Calculating realized PnL with first-in, first-out (FIFO) lot matching is a multi-step process. CTEs make the logic much clearer. Note that the matching below is deliberately simplified: it pairs the Nth sell with the Nth buy, which is only correct when every trade is for the same quantity; real FIFO accounting must split lots when quantities differ.
WITH buys AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY trade_ts) AS buy_seq
FROM transactions WHERE side = 'B'
), sells AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY trade_ts) AS sell_seq
FROM transactions WHERE side = 'S'
)
SELECT
s.symbol,
s.trade_ts AS sell_date,
s.quantity * (s.price - b.price) AS realized_pnl
FROM
sells s
JOIN
buys b ON s.symbol = b.symbol AND s.sell_seq = b.buy_seq; -- Simplified FIFO matching
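Run on SQLite, the simplified matching produces per-lot realized PnL as long as all lot quantities are equal (the table layout here is assumed):

```python
import sqlite3

# Demo of the simplified FIFO CTE on SQLite. The Nth sell is paired with
# the Nth buy, which is only valid when all lot quantities are equal;
# real FIFO accounting must split lots when quantities differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(symbol TEXT, trade_ts TEXT, side TEXT, quantity INTEGER, price REAL)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", [
    ("ACME", "2024-01-01", "B", 100, 10.0),
    ("ACME", "2024-01-02", "B", 100, 11.0),
    ("ACME", "2024-01-03", "S", 100, 12.0),  # pairs with the 10.0 buy
    ("ACME", "2024-01-04", "S", 100, 11.5),  # pairs with the 11.0 buy
])

pnl = conn.execute("""
    WITH buys AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY trade_ts) AS buy_seq
        FROM transactions WHERE side = 'B'
    ), sells AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY trade_ts) AS sell_seq
        FROM transactions WHERE side = 'S'
    )
    SELECT s.symbol, s.trade_ts, s.quantity * (s.price - b.price) AS realized_pnl
    FROM sells s
    JOIN buys b ON s.symbol = b.symbol AND s.sell_seq = b.buy_seq
    ORDER BY s.trade_ts
""").fetchall()
print(pnl)
```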
Time-Series Specific Functions and Joins
Modern time-series databases and extensions (like TimescaleDB for PostgreSQL) provide specialized functions and join types for time-series analysis.
time_bucket() for Aggregation:
The time_bucket() function is used to group time-series data into fixed time intervals.
-- Calculate 5-minute OHLCV bars from tick data
SELECT
time_bucket('5 minutes', event_timestamp) AS bar_time,
symbol,
FIRST(price, event_timestamp) AS open,
MAX(price) AS high,
MIN(price) AS low,
LAST(price, event_timestamp) AS close,
SUM(size) AS volume
FROM
trades
GROUP BY 1, 2;
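time_bucket() and the FIRST()/LAST() aggregates are TimescaleDB extensions. On engines without them, the same bars can be built from standard window functions by integer-dividing epoch seconds into buckets. A portable sketch, run here on SQLite with an assumed schema:

```python
import sqlite3

# Portable 5-minute OHLCV bars without time_bucket()/FIRST()/LAST():
# bucket by integer-dividing epoch seconds, take open/close with
# FIRST_VALUE/LAST_VALUE, then aggregate per bucket.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (event_timestamp TEXT, symbol TEXT, price REAL, size INTEGER)"
)
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", [
    ("2024-01-01 09:30:10", "ACME", 10.0, 100),
    ("2024-01-01 09:31:00", "ACME", 10.5, 200),
    ("2024-01-01 09:34:59", "ACME", 10.2, 100),
    ("2024-01-01 09:35:01", "ACME", 10.8, 300),  # falls into the next bar
])

bars = conn.execute("""
    WITH ticks AS (
        SELECT symbol, price, size, event_timestamp,
               (CAST(strftime('%s', event_timestamp) AS INTEGER) / 300) * 300 AS bar_epoch
        FROM trades
    ), marked AS (
        SELECT *,
               FIRST_VALUE(price) OVER (PARTITION BY symbol, bar_epoch
                                        ORDER BY event_timestamp) AS open,
               LAST_VALUE(price)  OVER (PARTITION BY symbol, bar_epoch
                                        ORDER BY event_timestamp
                                        ROWS BETWEEN UNBOUNDED PRECEDING
                                                 AND UNBOUNDED FOLLOWING) AS close
        FROM ticks
    )
    SELECT datetime(bar_epoch, 'unixepoch') AS bar_time, symbol,
           MIN(open) AS open, MAX(price) AS high, MIN(price) AS low,
           MAX(close) AS close, SUM(size) AS volume
    FROM marked
    GROUP BY bar_epoch, symbol
    ORDER BY bar_epoch
""").fetchall()
print(bars)
```

On TimescaleDB itself, prefer the native time_bucket() form above: it is shorter and the planner can exploit the hypertable's time partitioning.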
ASOF Joins for Point-in-Time Lookups:
An ASOF join is a special type of join that matches rows from one table to the last row in another table with a timestamp less than or equal to the current row's timestamp. This is extremely useful for joining trade data with quote data to find the prevailing quote at the time of a trade.
-- ClickHouse syntax
SELECT
t.event_timestamp,
t.symbol,
t.price,
q.bid_price,
q.ask_price
FROM
trades t
ASOF LEFT JOIN
quotes q ON t.symbol = q.symbol AND t.event_timestamp >= q.event_timestamp;
This query finds, for each trade, the most recent quote that occurred before or at the same time as the trade. This is an effective and efficient way to perform point-in-time analysis without complex subqueries.
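On engines without a native ASOF join, the same point-in-time lookup can be emulated with a correlated subquery per quote column. A sketch on SQLite, with assumed schemas; workable for modest result sets, though a native ASOF join is far more efficient at scale:

```python
import sqlite3

# Emulate an ASOF lookup with correlated subqueries: for each trade, find
# the latest quote at or before the trade's timestamp.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trades (event_timestamp TEXT, symbol TEXT, price REAL);
    CREATE TABLE quotes (event_timestamp TEXT, symbol TEXT, bid_price REAL, ask_price REAL);
""")
conn.executemany("INSERT INTO quotes VALUES (?, ?, ?, ?)", [
    ("09:30:00", "ACME", 9.9, 10.1),
    ("09:30:05", "ACME", 10.0, 10.2),   # prevailing quote for the trade below
    ("09:30:09", "ACME", 10.1, 10.3),   # after the trade, must be ignored
])
conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", [
    ("09:30:07", "ACME", 10.15),
])

rows = conn.execute("""
    SELECT t.event_timestamp, t.symbol, t.price,
           (SELECT q.bid_price FROM quotes q
             WHERE q.symbol = t.symbol
               AND q.event_timestamp <= t.event_timestamp
             ORDER BY q.event_timestamp DESC LIMIT 1) AS bid_price,
           (SELECT q.ask_price FROM quotes q
             WHERE q.symbol = t.symbol
               AND q.event_timestamp <= t.event_timestamp
             ORDER BY q.event_timestamp DESC LIMIT 1) AS ask_price
    FROM trades t
""").fetchall()
print(rows)
```

An index on quotes(symbol, event_timestamp) is essential here, since each subquery performs one lookup per trade row.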
By mastering these advanced SQL features, a quantitative analyst can perform a wide range of complex analyses directly and efficiently within the database. This reduces the need to move large amounts of data into external analysis tools, saving time, reducing complexity, and leveraging the power of the database engine. For any serious quant, deep SQL knowledge is not optional; it is a core competency.
