So you want to run a PIVOT query in MySQL
|Here’s a stored procedure which generates PIVOT
(or crosstab) queries in MySQL, sort of like what you used to be able to do in Microsoft Access, before everything moved to The Cloud.
prcPivot
It’s called prcPivot, because that has the word PIVOT in it, and it’s a stored procedure .
Arguments
prcPivot has a single JSON_OBJECT argument, because I think that’s going to be a bit easier to use than multiple NULLable parameters, and it makes it look a bit more like an actual TRANSFORM statement.
The JSON_OBJECT provides the following key/values:
Key | Description |
---|---|
TRANSFORM | The cells inside the table. This must be an aggregate function (e.g. SUM, COUNT, MAX etc).
The aggregate is applied to both the row and column criteria. It’s probably easiest to look at the examples than explain that here. |
SELECT | The columns on the left hand side of the table. Is a comma-separated list of expressions.
Don’t include any ‘AS’ aliases on your columns, because that won’t work. If you do need to alias columns, you can do that in the ‘FROM’ option; see the examples below. |
FROM | The source for the query |
WHERE | (Optional) a WHERE clause. You could put this in the FROM if you want. |
ORDER BY | (Optional) an ORDER BY clause. If omitted, will order by the columns in the SELECT option. |
PIVOT | An expression to generate the column headers |
PIVOT IN | (Optional) A comma separated list of column headers to restrict to
Because the headers are driven off the data, you could also add WHERE clauses to restrict which columns are included. |
PIVOT ORDER BY | (Optional) A way to order the column headers |
FORMAT | (Optional) Either the string “RESULT” (the default) or “SQL” ( to show the SQL used to generate the result ) |
Examples
To give a few random examples, here’s some CALL
statements based on the sakila sample database, which depicts the internal systems of a video rental store, which is a type of store that used to exist.
So this one shows the number of rentals per month for each customer (pivotting on the month):
CALL prcPivot(JSON_OBJECT( 'TRANSFORM', 'COUNT ( R.rental_id ) ', 'SELECT', 'C.email, R.customer_id', 'FROM', 'rental R INNER JOIN customer C ON R.customer_id = C.customer_id', 'WHERE', 'C.email < "ALF"', 'PIVOT', 'DATE_FORMAT(R.rental_date, "%Y-%m")' ));
You can use the FORMAT
parameter to show the SQL used to generate that:
CALL prcPivot(JSON_OBJECT( 'TRANSFORM', 'COUNT ( R.rental_id ) ', 'SELECT', 'C.email, R.customer_id', 'FROM', 'rental R INNER JOIN customer C ON R.customer_id = C.customer_id', 'WHERE', 'C.email < "ALF"', 'PIVOT', 'DATE_FORMAT(R.rental_date, "%Y-%m")', 'FORMAT', 'SQL' ));
... which produces:
SELECT C.email, R.customer_id, COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2005-05", R.rental_id , NULL)) AS "2005-05", COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2005-06", R.rental_id , NULL)) AS "2005-06", COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2005-07", R.rental_id , NULL)) AS "2005-07", COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2005-08", R.rental_id , NULL)) AS "2005-08", COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2006-02", R.rental_id , NULL)) AS "2006-02" FROM rental R INNER JOIN customer C ON R.customer_id = C.customer_id WHERE C.email < "ALF" GROUP BY C.email, R.customer_id ORDER BY C.email, R.customer_id
which can be useful if the generated SQL has a syntax error, fails at runtime for some reason, you want to modify it manually, or feed it into the explainerator.
More examples
Same sort of thing, pivotted on the other axis, by swapping the SELECT
and PIVOT
arguments.
Note we can only have a single expression in the PIVOT
so I've removed the customer_id
.
CALL prcPivot(JSON_OBJECT( 'TRANSFORM', 'COUNT ( R.rental_id ) ', 'SELECT', 'DATE_FORMAT(R.rental_date, "%Y-%m")', 'FROM', 'rental R INNER JOIN customer C ON R.customer_id = C.customer_id', 'WHERE', 'C.email < "ALF"', 'PIVOT', 'C.email' ));
Which looks a bit ugly because that DATE_FORMAT
is appearing in the column header, so you could move that into the FROM
instead and give it an alias; e.g.
CALL prcPivot(JSON_OBJECT( 'TRANSFORM', 'COUNT ( T.rental_id ) ', 'SELECT', 'rental_month', 'FROM', '( SELECT R.rental_id, C.email, DATE_FORMAT(R.rental_date, "%Y-%m") AS rental_month FROM rental R INNER JOIN customer C ON R.customer_id = C.customer_id) AS T', 'WHERE', 'T.email < "ALF"', 'PIVOT', 'T.email' ));
Prior art
The guts of this was cribbed off this post, but I've added a few things that were missing:
- user-defined aggregate function (not just 'SUM')
- arguably slightly nicer syntax
- better error handling
- better handling of pivot columns containing quotes and newlines
- ordering of the pivot columns ( PIVOT ORDER BY )
- subsetting the pivot columns ( PIVOT IN )
- ability to return the SQL instead of the result set ( FORMAT )
and removed the rollup columns and rows, which I wasn't a huge fan of.
Source me
Here you go:
DELIMITER $$ DROP PROCEDURE IF EXISTS prcPivot $$ CREATE PROCEDURE prcPivot( IN jsonArguments JSON ) DETERMINISTIC SQL SECURITY INVOKER BEGIN -- The usual claptrap /* BSD 2-Clause License Copyright (c) 2023, randomnoun All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ -- inputs are supplied as entries in the jsonArguments object: DECLARE strTransform VARCHAR(10000); -- TRANSFORM: aggregate values returned in table rows DECLARE strSelectColsCsv VARCHAR(10000); -- SELECT: column(s) on the left, separated by commas DECLARE strFrom VARCHAR(10000); -- FROM: table name, db.tbl or query DECLARE strWhere VARCHAR(10000); -- WHERE: empty string or where clause DECLARE strOrderBy VARCHAR(10000); -- ORDER BY: empty string or order by clause; defaults to strSelectColsCsv DECLARE strPivotCol VARCHAR(10000); -- PIVOT: name of column to put across the top DECLARE strPivotIn VARCHAR(10000); -- PIVOT IN: subset of columns DECLARE strPivotOrderBy VARCHAR(10000); -- PIVOT ORDER BY: order of columns (defaults to strPivotCol ) DECLARE strFormat VARCHAR(100); -- FORMAT: either 'SQL' or 'RESULT' ( default = RESULT ) -- TODO "PIVOT IN", "PIVOT ORDER BY" -- temp variables DECLARE strTransformRegex VARCHAR(100); DECLARE strTransformFunc VARCHAR(10000); DECLARE strTransformDistinct VARCHAR(10000); DECLARE strTransformVal VARCHAR(10000); DECLARE strErrorMessage VARCHAR(128); DECLARE strSqlSubQuery VARCHAR(10000); DECLARE strSqlQuoteVal VARCHAR(10000); DECLARE strSqlConcat VARCHAR(10000); DECLARE strSqlSelect1 MEDIUMTEXT; DECLARE strSqlSelect2 MEDIUMTEXT; -- bit more breathing room constructing the SQL SET SESSION group_concat_max_len = 100000; SET strTransform = JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.TRANSFORM')); SET strSelectColsCsv = JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.SELECT')); SET strFrom = JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.FROM')); SET strWhere = COALESCE(JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.WHERE')), ''); SET strOrderBy = COALESCE( JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$."ORDER BY"')), JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.ORDERBY')), CONCAT(strSelectColsCsv)); SET strPivotCol = JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.PIVOT')); SET strPivotIn = COALESCE( JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$."PIVOT IN"')), -- JSON array ? maybe not JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.PIVOTIN')), ''); SET strPivotOrderBy = COALESCE( JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$."PIVOT ORDER BY"')), JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.PIVOTORDERBY')), CONCAT(strPivotCol)); SET strFormat = COALESCE(JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.FORMAT')), 'RESULT'); -- for debugging, uncomment the SELECTs of various @variables below -- check mandatory parameters exist IF strTransform IS NULL THEN -- etc SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Missing TRANSFORM'; -- has error code 1644, not 45000 END IF; IF strSelectColsCsv IS NULL THEN -- etc SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Missing SELECT'; END IF; IF strFrom IS NULL THEN -- etc SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Missing FROM'; END IF; IF strPivotCol IS NULL THEN -- etc SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Missing PIVOT'; END IF; -- validate parameters IF NOT (strTransform REGEXP '^\\s*([a-zA-Z]+)\\s*\\((.*)\\)\\s*$') THEN -- etc SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'TRANSFORM must be in an aggregate function; e.g. SUM(1)'; END IF; -- SUM(1) -> SUM, null, 1 -- COUNT(DISTINCT n) -> COUNT, DISTINCT, n SET strTransformRegex = '^\\s*([a-zA-Z]+)\\s*\\((\\s*DISTINCT\\s*)?(.*)\\)\\s*$'; SET strTransformFunc = REGEXP_REPLACE(strTransform, strTransformRegex , '$1'); SET strTransformDistinct = IF (REGEXP_REPLACE(strTransform, strTransformRegex, '$2') = "", "", "DISTINCT "); SET strTransformVal = REGEXP_REPLACE(strTransform, strTransformRegex, '$3'); -- select @strTransformFunc, @strTransformDistinct, @strTransformVal; -- all the aggregates from https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html IF NOT (strTransformFunc REGEXP '^(AVG|BIT_AND|BIT_OR|BIT_XOR|COUNT|' 'GROUP_CONCAT|JSON_ARRAYAGG|JSON_OBJECTAGG|' 'MAX|MIN|STD|STDDEV|STDDEV_POP|STDDEV_SAMP|' 'SUM|VAR_POP|VAR_SAMP|VARIANCE)$') THEN -- MESSAGE_TEXT cannot exceed 128 chars -- see https://dev.mysql.com/doc/refman/8.0/en/signal.html#signal-condition-information-items SET strErrorMessage = CONCAT('Unknown TRANSFORM aggregate "', strTransformFunc, '"; expected aggregate e.g. SUM, MIN, MAX, COUNT, GROUP_CONCAT'); SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = strErrorMessage; END IF; IF NOT (strFormat REGEXP '^(SQL|RESULT)$') THEN SET strErrorMessage = CONCAT('Unknown FORMAT "', strFormat, '"; expected SQL or RESULT'); SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = strErrorMessage; END IF; -- Find the column headings SET strSqlSubQuery = CONCAT( 'SELECT DISTINCT ', strPivotCol, ' AS val ', ' FROM ', strFrom, ' ', IF(strWhere = '', '', CONCAT('\n WHERE ', strWhere)), ' ORDER BY ', strPivotOrderBy); -- select @subq; -- escape quotes, backslashes inside val SET strSqlQuoteVal = "'\"', REGEXP_REPLACE(val, '([\"\\\\\\\\])', '\\\\\\\\$1'), '\"'"; SET strSqlConcat = CONCAT ( "CONCAT('", strTransformFunc, "(", strTransformDistinct, "IF(", strPivotCol, " = ', ", strSqlQuoteVal, ", ', ", strTransformVal, ", NULL)) AS ', ", strSqlQuoteVal, ")"); -- select strSqlQuoteVal, strSqlConcat; SET @strSqlSelect1 = CONCAT( 'SELECT GROUP_CONCAT(', strSqlConcat, ' SEPARATOR ",\n") INTO @strPivotSelectColsCsv', ' FROM ( ',strSqlSubQuery, ' ) AS top', IF(strPivotIn = '', '', CONCAT('\n WHERE val IN (', strPivotIn, ')')) ); -- select @strSqlSelect1; PREPARE _sql FROM @strSqlSelect1; -- seems to require a session var here EXECUTE _sql; -- intermediate step: build SQL for columns DEALLOCATE PREPARE _sql; -- construct the query and perform it SET @strSqlSelect2 = CONCAT( 'SELECT ', strSelectColsCsv, ',\n', @strPivotSelectColsCsv, '\n FROM ', strFrom, ' ', IF(strWhere = '', '', CONCAT('\n WHERE ', strWhere)), '\n GROUP BY ', strSelectColsCsv, -- @TODO remove any 'AS' aliases IF(strOrderBy = '', '', CONCAT('\n ORDER BY ', strOrderBy)), '\n' ); IF strFormat = 'SQL' THEN SELECT @strSqlSelect2 AS `sql`; -- the statement that generates the result ELSE PREPARE _sql FROM @strSqlSelect2; EXECUTE _sql; -- the resulting pivot table ouput DEALLOCATE PREPARE _sql; END IF; END; $$