So you want to run a PIVOT query in MySQL
Here’s a stored procedure which generates PIVOT (or crosstab) queries in MySQL, sort of like what you used to be able to do in Microsoft Access, before everything moved to The Cloud.
prcPivot
It’s called prcPivot, because that has the word PIVOT in it, and it’s a stored procedure .
Arguments
prcPivot has a single JSON_OBJECT argument, because I think that’s going to be a bit easier to use than multiple NULLable parameters, and it makes it look a bit more like an actual TRANSFORM statement.
The JSON_OBJECT provides the following key/values:
| Key | Description |
|---|---|
| TRANSFORM | The cells inside the table. This must be an aggregate function (e.g. SUM, COUNT, MAX etc).
The aggregate is applied to both the row and column criteria. It’s probably easiest to look at the examples than explain that here. |
| SELECT | The columns on the left hand side of the table. Is a comma-separated list of expressions.
Don’t include any ‘AS’ aliases on your columns, because that won’t work. If you do need to alias columns, you can do that in the ‘FROM’ option; see the examples below. |
| FROM | The source for the query |
| WHERE | (Optional) a WHERE clause. You could put this in the FROM if you want. |
| ORDER BY | (Optional) an ORDER BY clause. If omitted, will order by the columns in the SELECT option. |
| PIVOT | An expression to generate the column headers |
| PIVOT IN | (Optional) A comma separated list of column headers to restrict to
Because the headers are driven off the data, you could also add WHERE clauses to restrict which columns are included. |
| PIVOT ORDER BY | (Optional) A way to order the column headers |
| FORMAT | (Optional) Either the string “RESULT” (the default) or “SQL” ( to show the SQL used to generate the result ) |
Examples
To give a few random examples, here’s some CALL statements based on the sakila sample database, which depicts the internal systems of a video rental store, which is a type of store that used to exist.
So this one shows the number of rentals per month for each customer (pivotting on the month):
CALL prcPivot(JSON_OBJECT( 'TRANSFORM', 'COUNT ( R.rental_id ) ', 'SELECT', 'C.email, R.customer_id', 'FROM', 'rental R INNER JOIN customer C ON R.customer_id = C.customer_id', 'WHERE', 'C.email < "ALF"', 'PIVOT', 'DATE_FORMAT(R.rental_date, "%Y-%m")' ));
You can use the FORMAT parameter to show the SQL used to generate that:
CALL prcPivot(JSON_OBJECT( 'TRANSFORM', 'COUNT ( R.rental_id ) ', 'SELECT', 'C.email, R.customer_id', 'FROM', 'rental R INNER JOIN customer C ON R.customer_id = C.customer_id', 'WHERE', 'C.email < "ALF"', 'PIVOT', 'DATE_FORMAT(R.rental_date, "%Y-%m")', 'FORMAT', 'SQL' ));
... which produces:
SELECT C.email, R.customer_id, COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2005-05", R.rental_id , NULL)) AS "2005-05", COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2005-06", R.rental_id , NULL)) AS "2005-06", COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2005-07", R.rental_id , NULL)) AS "2005-07", COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2005-08", R.rental_id , NULL)) AS "2005-08", COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2006-02", R.rental_id , NULL)) AS "2006-02" FROM rental R INNER JOIN customer C ON R.customer_id = C.customer_id WHERE C.email < "ALF" GROUP BY C.email, R.customer_id ORDER BY C.email, R.customer_id
which can be useful if the generated SQL has a syntax error, fails at runtime for some reason, you want to modify it manually, or feed it into the explainerator.
More examples
Same sort of thing, pivotted on the other axis, by swapping the SELECT and PIVOT arguments.
Note we can only have a single expression in the PIVOT so I've removed the customer_id.
CALL prcPivot(JSON_OBJECT( 'TRANSFORM', 'COUNT ( R.rental_id ) ', 'SELECT', 'DATE_FORMAT(R.rental_date, "%Y-%m")', 'FROM', 'rental R INNER JOIN customer C ON R.customer_id = C.customer_id', 'WHERE', 'C.email < "ALF"', 'PIVOT', 'C.email' ));
Which looks a bit ugly because that DATE_FORMAT is appearing in the column header, so you could move that into the FROM instead and give it an alias; e.g.
CALL prcPivot(JSON_OBJECT( 'TRANSFORM', 'COUNT ( T.rental_id ) ', 'SELECT', 'rental_month', 'FROM', '( SELECT R.rental_id, C.email, DATE_FORMAT(R.rental_date, "%Y-%m") AS rental_month FROM rental R INNER JOIN customer C ON R.customer_id = C.customer_id) AS T', 'WHERE', 'T.email < "ALF"', 'PIVOT', 'T.email' ));
Prior art
The guts of this was cribbed off this post, but I've added a few things that were missing:
- user-defined aggregate function (not just 'SUM')
- arguably slightly nicer syntax
- better error handling
- better handling of pivot columns containing quotes and newlines
- ordering of the pivot columns ( PIVOT ORDER BY )
- subsetting the pivot columns ( PIVOT IN )
- ability to return the SQL instead of the result set ( FORMAT )
and removed the rollup columns and rows, which I wasn't a huge fan of.
Source me
Here you go:
DELIMITER $$
DROP PROCEDURE IF EXISTS prcPivot $$
CREATE PROCEDURE prcPivot(
IN jsonArguments JSON
)
DETERMINISTIC
SQL SECURITY INVOKER
BEGIN
-- The usual claptrap
/* BSD 2-Clause License
Copyright (c) 2023, randomnoun
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */
-- inputs are supplied as entries in the jsonArguments object:
DECLARE strTransform VARCHAR(10000); -- TRANSFORM: aggregate values returned in table rows
DECLARE strSelectColsCsv VARCHAR(10000); -- SELECT: column(s) on the left, separated by commas
DECLARE strFrom VARCHAR(10000); -- FROM: table name, db.tbl or query
DECLARE strWhere VARCHAR(10000); -- WHERE: empty string or where clause
DECLARE strOrderBy VARCHAR(10000); -- ORDER BY: empty string or order by clause; defaults to strSelectColsCsv
DECLARE strPivotCol VARCHAR(10000); -- PIVOT: name of column to put across the top
DECLARE strPivotIn VARCHAR(10000); -- PIVOT IN: subset of columns
DECLARE strPivotOrderBy VARCHAR(10000); -- PIVOT ORDER BY: order of columns (defaults to strPivotCol )
DECLARE strFormat VARCHAR(100); -- FORMAT: either 'SQL' or 'RESULT' ( default = RESULT )
-- TODO "PIVOT IN", "PIVOT ORDER BY"
-- temp variables
DECLARE strTransformRegex VARCHAR(100);
DECLARE strTransformFunc VARCHAR(10000);
DECLARE strTransformDistinct VARCHAR(10000);
DECLARE strTransformVal VARCHAR(10000);
DECLARE strErrorMessage VARCHAR(128);
DECLARE strSqlSubQuery VARCHAR(10000);
DECLARE strSqlQuoteVal VARCHAR(10000);
DECLARE strSqlConcat VARCHAR(10000);
DECLARE strSqlSelect1 MEDIUMTEXT;
DECLARE strSqlSelect2 MEDIUMTEXT;
-- bit more breathing room constructing the SQL
SET SESSION group_concat_max_len = 100000;
SET strTransform = JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.TRANSFORM'));
SET strSelectColsCsv = JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.SELECT'));
SET strFrom = JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.FROM'));
SET strWhere = COALESCE(JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.WHERE')), '');
SET strOrderBy = COALESCE(
JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$."ORDER BY"')),
JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.ORDERBY')),
CONCAT(strSelectColsCsv));
SET strPivotCol = JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.PIVOT'));
SET strPivotIn = COALESCE(
JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$."PIVOT IN"')), -- JSON array ? maybe not
JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.PIVOTIN')),
'');
SET strPivotOrderBy = COALESCE(
JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$."PIVOT ORDER BY"')),
JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.PIVOTORDERBY')),
CONCAT(strPivotCol));
SET strFormat = COALESCE(JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.FORMAT')), 'RESULT');
-- for debugging, uncomment the SELECTs of various @variables below
-- check mandatory parameters exist
IF strTransform IS NULL THEN -- etc
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Missing TRANSFORM'; -- has error code 1644, not 45000
END IF;
IF strSelectColsCsv IS NULL THEN -- etc
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Missing SELECT';
END IF;
IF strFrom IS NULL THEN -- etc
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Missing FROM';
END IF;
IF strPivotCol IS NULL THEN -- etc
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Missing PIVOT';
END IF;
-- validate parameters
IF NOT (strTransform REGEXP '^\\s*([a-zA-Z]+)\\s*\\((.*)\\)\\s*$') THEN -- etc
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'TRANSFORM must be in an aggregate function; e.g. SUM(1)';
END IF;
-- SUM(1) -> SUM, null, 1
-- COUNT(DISTINCT n) -> COUNT, DISTINCT, n
SET strTransformRegex = '^\\s*([a-zA-Z]+)\\s*\\((\\s*DISTINCT\\s*)?(.*)\\)\\s*$';
SET strTransformFunc = REGEXP_REPLACE(strTransform, strTransformRegex , '$1');
SET strTransformDistinct = IF (REGEXP_REPLACE(strTransform, strTransformRegex, '$2') = "", "", "DISTINCT ");
SET strTransformVal = REGEXP_REPLACE(strTransform, strTransformRegex, '$3');
-- select @strTransformFunc, @strTransformDistinct, @strTransformVal;
-- all the aggregates from https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html
IF NOT (strTransformFunc REGEXP '^(AVG|BIT_AND|BIT_OR|BIT_XOR|COUNT|'
'GROUP_CONCAT|JSON_ARRAYAGG|JSON_OBJECTAGG|'
'MAX|MIN|STD|STDDEV|STDDEV_POP|STDDEV_SAMP|'
'SUM|VAR_POP|VAR_SAMP|VARIANCE)$') THEN
-- MESSAGE_TEXT cannot exceed 128 chars
-- see https://dev.mysql.com/doc/refman/8.0/en/signal.html#signal-condition-information-items
SET strErrorMessage = CONCAT('Unknown TRANSFORM aggregate "', strTransformFunc,
'"; expected aggregate e.g. SUM, MIN, MAX, COUNT, GROUP_CONCAT');
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = strErrorMessage;
END IF;
IF NOT (strFormat REGEXP '^(SQL|RESULT)$') THEN
SET strErrorMessage = CONCAT('Unknown FORMAT "', strFormat, '"; expected SQL or RESULT');
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = strErrorMessage;
END IF;
-- Find the column headings
SET strSqlSubQuery = CONCAT(
'SELECT DISTINCT ', strPivotCol, ' AS val ',
' FROM ', strFrom, ' ',
IF(strWhere = '', '', CONCAT('\n WHERE ', strWhere)),
' ORDER BY ', strPivotOrderBy);
-- select @subq;
-- escape quotes, backslashes inside val
SET strSqlQuoteVal = "'\"', REGEXP_REPLACE(val, '([\"\\\\\\\\])', '\\\\\\\\$1'), '\"'";
SET strSqlConcat = CONCAT (
"CONCAT('",
strTransformFunc, "(", strTransformDistinct,
"IF(", strPivotCol, " = ', ", strSqlQuoteVal, ", ', ", strTransformVal, ", NULL)) AS ', ", strSqlQuoteVal,
")");
-- select strSqlQuoteVal, strSqlConcat;
SET @strSqlSelect1 = CONCAT(
'SELECT GROUP_CONCAT(', strSqlConcat, ' SEPARATOR ",\n") INTO @strPivotSelectColsCsv',
' FROM ( ',strSqlSubQuery, ' ) AS top',
IF(strPivotIn = '', '', CONCAT('\n WHERE val IN (', strPivotIn, ')'))
);
-- select @strSqlSelect1;
PREPARE _sql FROM @strSqlSelect1; -- seems to require a session var here
EXECUTE _sql; -- intermediate step: build SQL for columns
DEALLOCATE PREPARE _sql;
-- construct the query and perform it
SET @strSqlSelect2 = CONCAT(
'SELECT ',
strSelectColsCsv, ',\n',
@strPivotSelectColsCsv,
'\n FROM ', strFrom, ' ',
IF(strWhere = '', '', CONCAT('\n WHERE ', strWhere)),
'\n GROUP BY ', strSelectColsCsv, -- @TODO remove any 'AS' aliases
IF(strOrderBy = '', '', CONCAT('\n ORDER BY ', strOrderBy)),
'\n'
);
IF strFormat = 'SQL' THEN
SELECT @strSqlSelect2 AS `sql`; -- the statement that generates the result
ELSE
PREPARE _sql FROM @strSqlSelect2;
EXECUTE _sql; -- the resulting pivot table ouput
DEALLOCATE PREPARE _sql;
END IF;
END;
$$
