Blog Archives

Concatenating multiple columns across rows


Today I ran into an interesting question on the forums at SQLServerCentral and I decided to share the solution I provided, because it was fun to code and, hopefully, useful for some of you.

Many experienced T-SQL coders make use of FOR XML PATH(‘’) to build concatenated strings from multiple rows. It’s a nice technique and pretty simple to use.
For instance, if you want to create a list of databases in a single concatenated string, you can run this statement:

SELECT CAST((
    SELECT name + ',' AS [text()]
    FROM sys.databases
    ORDER BY name
    FOR XML PATH('')
) AS varchar(max))

The SELECT statement produces this result:

allDBs
------------------------------------------------------------------
BROKEN,LightHouse,master,model,msdb,tempdb,TEST,test80,TOOLS,WORK,

Great! But, what if you had to concatenate multiple columns at the same time? It’s an unusual requirement, but not an impossible one.
Let’s consider this example:


-- =================================
-- Create a sentences table
-- =================================
DECLARE @Sentences TABLE (
    sentence_id int PRIMARY KEY CLUSTERED,
    sentence_description varchar(50)
)

-- =================================
-- Sentences are broken into rows
-- =================================
DECLARE @Rows TABLE (
    sentence_id int,
    row_id      int,
    Latin       varchar(500),
    English     varchar(500),
    Italian     varchar(500)
)

-- =================================
-- Create three sentences
-- =================================
INSERT INTO @Sentences VALUES(1,'First sentence.')
INSERT INTO @Sentences VALUES(2,'Second Sentence')
INSERT INTO @Sentences VALUES(3,'Third sentence')

-- =================================
-- Create sentences rows from 
-- "De Finibus bonorum et malorum" 
-- by Cicero, AKA "Lorem Ipsum"
-- =================================
INSERT INTO @Rows VALUES(1, 1, 
    'Neque porro quisquam est,',
    'Nor again is there anyone who',
    'Viceversa non vi è nessuno che ama,')
INSERT INTO @Rows VALUES(1, 2, 
    'qui dolorem ipsum quia dolor sit amet,',
    'loves or pursues or desires to obtain pain',
    'insegue, vuol raggiungere il dolore in sé')
INSERT INTO @Rows VALUES(1, 3, 
    'consectetur, adipisci velit, sed quia non numquam',
    'of itself, because it is pain, but because occasionally',
    'perché è dolore ma perché talvolta')
INSERT INTO @Rows VALUES(1, 3, 
    'eius modi tempora incidunt',
    'circumstances occur in which',
    'capitano circostanze tali per cui')
INSERT INTO @Rows VALUES(1, 3, 
    'ut labore et dolore magnam aliquam quaerat voluptatem.',
    'toil and pain can procure him some great pleasure.',
    'con il travaglio e il dolore si cerca qualche grande piacere.') 
INSERT INTO @Rows VALUES(2, 1, 
    'Ut enim ad minima veniam,',
    'To take a trivial example,',
    'Per venire a casi di minima importanza,')
INSERT INTO @Rows VALUES(2, 2, 
    'quis nostrum exercitationem ullam corporis suscipit laboriosam,',
    'which of us ever undertakes laborious physical exercise,',
    'chi di noi intraprende un esercizio fisico faticoso')
INSERT INTO @Rows VALUES(2, 3, 
    'nisi ut aliquid ex ea commodi consequatur?',
    'except to obtain some advantage from it?',
    'se non per ottenere da esso qualche vantaggio?') 
INSERT INTO @Rows VALUES(3, 1, 
    'Quis autem vel eum iure reprehenderit qui in ea voluptate',
    'But who has any right to find fault with a man who chooses to enjoy a pleasure',
    'O chi può biasimare colui che decide di provare un piacere')
INSERT INTO @Rows VALUES(3, 2, 
    'velit esse quam nihil molestiae consequatur,',
    'that has no annoying consequences,',
    'che non porta conseguenze negative,')
INSERT INTO @Rows VALUES(3, 3, 
    'vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?',
    'or one who avoids a pain that produces no resultant pleasure?',
    'o che fugge quel dolore che non produce nessun piacere?')

The setup code creates two tables: Sentences and Rows. The first one is the master table, that contains the sentence_id and a description. The second one contains the actual sentences, broken into rows and organized with languages in columns.

For the purposes of this test, I inserted in the Rows table an excerpt of Cicero’s “De Finibus bonorum et malorum”, also known as “Lorem Ipsum”, the printing and typesetting industry’s standard dummy text since the 1500s.

Here’s how the input data looks like:

What we want to do is concatenate all the rows for each sentence, keeping the languages separated. It could be accomplished very easily concatenating each column separately in a subquery, but what if the input data comes from a rather expensive query? You don’t want to run the statement for each language, do you?

Let’s see how this can be done in a single scan:

SELECT sentence_id, sentence_description, Latin, English, Italian
FROM (
    SELECT Sentences.sentence_id, sentence_description, language_name, string 
    FROM   @Sentences AS Sentences
    OUTER APPLY (
        SELECT *
        FROM (
			-- =================================
			-- Create a Languages inline query
			-- =================================
                      SELECT 'Latin'
            UNION ALL SELECT 'English'
            UNION ALL SELECT 'Italian'
        ) Languages (language_name)
        CROSS APPLY (
			-- =================================
			-- Concatenate all the rows for 
			-- the current sentence and language
			-- from an UNPIVOTed version of the
			-- original rows table
			-- =================================
            SELECT sentence_id, string = (
                SELECT string + ' ' AS [data()] 
                FROM @Rows AS src
                UNPIVOT ( string FOR language_name IN (Latin, English, Italian) ) AS u
                WHERE sentence_id = Sentences.sentence_id
                    AND language_name = Languages.language_name
                ORDER BY row_id
                FOR XML PATH('')
            )
        ) AS ca
    ) AS oa
) AS src
-- =================================
-- Re-transform rows to columns
-- =================================
PIVOT ( MIN(string) FOR language_name IN ([Latin],[English],[Italian])) AS p

If you don’t like PIVOT and UNPIVOT, you can always use CASE expressions to create a crosstab.
Here’s the final result:

With a little of PIVOT, UNPIVOT and FOR XML you can achieve really surprising results, you just need to unleash your creativity.

Oracle BUG: UNPIVOT returns wrong data for non-unpivoted columns


Some bugs in Oracle’s code are really surprising. Whenever I run into this kind of issue, I can’t help but wonder how nobody else noticed it before.

Some days ago I was querying AWR data from DBA_HIST_SYSMETRIC_SUMMARY and I wanted to turn the columns AVERAGE, MAXVAL and MINVAL into rows, in order to fit this result set into a performance graphing application that expects input data formatted as {TimeStamp, SeriesName, Value}.

Columns to rows? A good match for UNPIVOT.

Oracle 11g introduced PIVOT and UNPIVOT operators to allow rows-to-columns and columns-to-rows transformations. Prior to 11g, this kind of transformation had to be coded with bulky CASE expressions (for PIVOT) or pesky UNION queries (for UNPIVOT). PIVOT and UNPIVOT allow developers to write more concise and readable statements, but I guess that not so many people have been using these features since their release, or they would have found this bug very soon.

Here is the statement I was trying to run:

WITH Metrics AS (
    SELECT to_date(to_char(BEGIN_TIME,'YYYY-MM-DD HH24'),'YYYY-MM-DD HH24') AS TS,
        AVG(AVERAGE) AS AVERAGE,
        MAX(MAXVAL) AS MAXVAL,
        MIN(MINVAL) AS MINVAL
    FROM DBA_HIST_SYSMETRIC_SUMMARY
    WHERE METRIC_NAME = 'Host CPU Utilization (%)'
    GROUP BY to_date(to_char(BEGIN_TIME,'YYYY-MM-DD HH24'),'YYYY-MM-DD HH24')
)
SELECT TS, aggregate, value
FROM Metrics
UNPIVOT (value FOR aggregate IN (AVERAGE, MAXVAL, MINVAL))
ORDER BY 1

The idea behind was to convert the date column into a string without the minute part, in order to convert it back to date and group by hour.

Surprisingly enough, this was the result:

Wrong results with UNPIVOT

The date column was returned with wrong data. Why?

The issue seems to be related to the date datatype, because converting back to date after the UNPIVOT works just fine:

WITH Metrics AS (
    SELECT to_char(BEGIN_TIME,'YYYY-MM-DD HH24') AS TS,
        AVG(AVERAGE) AS AVERAGE,
        MAX(MAXVAL) AS MAXVAL,
        MIN(MINVAL) AS MINVAL
    FROM DBA_HIST_SYSMETRIC_SUMMARY
    WHERE METRIC_NAME = 'Host CPU Utilization (%)'
    GROUP BY to_char(BEGIN_TIME,'YYYY-MM-DD HH24')
)
SELECT to_date(TS,'YYYY-MM-DD HH24') AS TS, aggregate, value
FROM Metrics
UNPIVOT (value FOR aggregate IN (AVERAGE, MAXVAL, MINVAL))
ORDER BY 1

This query, instead, produces the expected results.

Correct data with char column

I raised this issue with Oracle Support who filed it under bug ID 9900850.8. Both 11.2.0.1 and 11.2.0.2 seem to be be affected by this problem, but it’s quite unlikely to see it fixed before 12.1.

Time will tell.