Now that my ColdFusion blog is running on MySQL 8.0.28, I get to leverage a number of new features related to derived tables. Yesterday, I looked as using
LATERAL derived tables to gather row-specific data. This morning, I want to look at using something called Common Table Expressions (CTE) in order to create derived tables that are factored-out of the main SQL query and placed within a named, temporary result set. This doesn’t change the derived table, functionally speaking; but, I believe it does make the SQL query easier to read.
To explore this, let’s once again consider the relationship between members and comments on this blog. This is a one-to-many relationship in that one member can leave many comments. If I wanted to query the MySQL database for a specific member and get some aggregate information about the comments that the member has left, I could create a derived table with the aggregate information and then
INNER JOIN that derived table to the
SET @userID = 1; SELECT m.id, m.name, -- Gather the aggregate data from the derived stats table. stats.commentCount, stats.minCreatedAt, stats.maxCreatedAt FROM member m INNER JOIN -- Generate a DERIVED TABLE with multiple aggregations relating to the -- commenting for the given user. This allows us to run the related query -- "once for the member" instead of "once for each aggregate". ( SELECT c.member_id, COUNT( * ) AS commentCount, MIN( c.date_created ) AS minCreatedAt, MAX( c.date_created ) AS maxCreatedAt FROM blog_comment c -- NOTE: Since the derived query CANNOT reference the outer query - it -- has to be a constant evaluation - I have to repeat the filtering in -- the derived query in order to avoid a FULL TABLE SCAN. WHERE c.member_id = @userID GROUP BY c.member_id ) AS stats ON ( m.id = @userID -- Filter OUTER QUERY to given member. AND stats.member_id = m.id ) ;
As you can see, we’re using the
(SELECT) AS stats construct to create a temporary result set on-the-fly that contains all of the aggregate information regarding the comments made by the target user. Derived tables have to be “constant” during the execution of the query, which is why we have to filter by
@userID in both the inner, derived query as well as in the outer query.
ASIDE: The whole point of the
LATERALderived table is so that the derived table query doesn’t have to be “constant”; and, can refer to the relevant row within the outer query.
When the derived table is defined inline like this, it makes the SQL query significantly harder to read (when compared to queries that don’t have derived tables). As of MySQL 8, we can greatly increase the readability by moving the derived table up and into a
WITH block has a slightly different syntax, defining the result set alias before the derived table query; but, I think this makes it easier to build-up a mental model of what the query is doing when you read it in a top-down manner:
SET @userID = 1; -- The WITH block allows us to create named, temporary result sets that we can -- then use later on within the same query. In this case, we're going to move -- the derived table out of the INNER JOIN block and into the WITH block. This -- does not change the derived table from a functional standpoint; but, it does -- make the code a little bit cleaner (through a slightly better separation of -- concerns). WITH stats AS ( -- Generate a DERIVED TABLE with multiple aggregations relating to the -- commenting for the given user. This allows us to run the related -- query "once for the member" instead of "once for each aggregate". SELECT c.member_id, COUNT( * ) AS commentCount, MIN( c.date_created ) AS minCreatedAt, MAX( c.date_created ) AS maxCreatedAt FROM blog_comment c WHERE c.member_id = @userID -- Filter down to specific user. GROUP BY c.member_id ) SELECT m.id, m.name, -- Gather the aggregate data from the derived stats table. s.commentCount, s.minCreatedAt, s.maxCreatedAt FROM member m INNER JOIN stats s ON ( m.id = @userID -- Filter down to specific user. AND s.member_id = m.id ) ;
First off, when we move the derived table up into the
WITH block, it definitely clarifies why a derived table has to be constant during the execution of the query. In this case, it is executing before the main query; so, there’s certainly no “outer query” for the derived table to refer to.
But, more than anything, doesn’t this just read much more clearly? It creates a clean separation of concerns, moving the derived table responsibilities to the top, leaving us with a simple
INNER JOIN in the main SQL query.
WITH block allows for comma-delimited statements. Meaning, you can define more than one common table expression (CTE) per query. And, one CTE can refer to another CTE that was defined higher-up in the same query. Though, I’m almost certain that including multiple, derived tables in a single query is going to make the mental model a bit overwhelming.
I’m pretty excited about this feature in MySQL 8! This is a relatively simple feature; but, it’s going to deliver a lot of value in terms of readability and maintainability, especially for User Interface (UI) related queries which often have to aggregate information in unique ways.
According to the MySQL documentation, common table expressions can be recursive, allowing for hierarchical data to be traversed directly within a SQL statement. That’s fun and all; but, I’m going to take a hard pass on using recursion in my SQL queries.
Recursion is already challenging enough for many developers when they are working in “the code”. I can only imagine that adding recursion to “the queries” will increase the level of difficulty. And, make the application harder to maintain over time.
I assume that this kind of functionality is awesome for Data Scientists and other Business Intelligence (BI) analysts. But, as an “application developer”, I’d much rather keep my recursion in ColdFusion where I can easily debug it; and, possibly even get rid of recursion altogether.
Your mileage may vary.