The other day, when aggregating some SQL data for a Customer Success report, I went to look up the semantics for the GROUP_CONCAT() function in MySQL. I rarely use the GROUP_CONCAT() function since I’ve had several problems with it in the past, from failing silently to it bringing back incorrect data (which I believe was a bug in the version of MySQL I was using at the time). As such, I haven’t committed the function signature to memory. That said, when I got to the documentation, I noticed two other methods documented right next to it: JSON_ARRAYAGG() and JSON_OBJECTAGG(). I had never seen these before; but, oh chickens, these MySQL functions seem awesome for group aggregation. I’ll definitely be using these instead of GROUP_CONCAT() going forward in my MySQL 5.7.32 reporting.

A while back, I mentioned that we were upgrading to MySQL 5.7.32 at work. And, that I was excited to see that such an upgrade would introduce the JSON (JavaScript Object Notation) column-type and relevant manipulation functions. But somehow, when I was learning about the JSON support in MySQL, I missed the fact two new aggregation functions were added as well:

  • JSON_ARRAYAGG( column )
  • JSON_OBJECTAGG( key_column, value_column )

Just like MIN(), MAX(), and GROUP_CONCAT(), the JSON aggregation functions allow us to extract information from a set of rows that have been collapsed through row-grouping (ex, using GROUP BY). Only, they allow us to extract that information into higher-level data structures, not just delimited strings.

The JSON_ARRAYAGG() function gathers all the values in a given column and returns then in a single, aggregated JSON Array.

The JSON_OBJECTAGG() function allows us to gather data from two columns and returns them in a single, aggregated JSON Object. Within this JSON Object, one column defines the key and one column defines the value. Each row within the grouping generates a key-value pair to be added to the aggregate Object.

With SQL like this, the easiest way to understand it is to see an example. So, let’s consider the database schema for my blog. I have “blog entries” and I have “tags” (ex, “ColdFusion”, “SQL”, “JavaScript”). There exists a many-to-many relationship between blog entries and tags that get codified using a “join table”:

  • blog_entry
  • tag
  • blog_entry_tag_jn – imagine a two-column table that does nothing but hold the primary keys from the other two tables in the many-to-many relationship.

Given this high-level schema, let’s look at how I might query for blog entries and – for each blog entry record – pull back information about the tags associated with that entry. I’m going both the JSON_ARRAYAGG() and JSON_OBJECTAGG() functions in the same query so that we can compare the two results:


	-- Pull back all the tag names a single array. Each tag will be an item
	-- within the array.
	JSON_ARRAYAGG( ) AS tagNames,

	-- Create an object with the schema { : }. Each tag ID will
	-- be a key within the object.
	blog_entry e
	blog_entry_tag_jn jn -- Our many-to-many join table.
	jn.blog_entry_id =
	tag t
ON = jn.tag_id

-- Since we're GROUPING on the blog entry records, all of the `INNER JOIN` tag
-- information is going to be collapsed. However, we can extract aggregation
-- information about the tags using our JSON functions above!
	COUNT( * ) > 1 -- To make the grouping more exciting!

Since we’re using a GROUP BY on the blog entries, all of the many-to-many tag information is getting collapsed down into a grouping. However, we can then extract that grouping information on a per-entry-row basis by using the JSON aggregation functions. And, when we run this SQL code, we get the following results:

As you can see, the JSON_ARRAYAGG() function gathers each tag name and pushes it onto a single array. And, the JSON_OBJECTAGG() function gathers each tag id and name and appends them to a single object (using the id column as the key and the name column as the value). How sweet is this!

Now that we have JSON (JavaScript Object Notation) aggregation functions in MySQL 5.7 (added in 5.7.22), I can’t think of a reason that I’d want to use the GROUP_CONCAT() function anymore. It seems that having structured data is always better than having a delimited list. But, I did just stumble upon these functions, so maybe there are use-cases I haven’t considered yet.

Source link

Leave a reply

Please enter your comment!
Please enter your name here