a0002: rowset string concatenation - UDF vs. temp table - Which method is best?

Sunday, January 13, 2008

rowset string concatenation - UDF vs. temp table - Which method is best? | TSQL

Originally posted: Rowset string concatenation: Which method is best?

Originally posted here.

We'll use the Authors table in Pubs. I want a comma-delimited list, per state, of the last name of each author who lives there.
First, the scalar UDF:

USE pubs
GO

CREATE FUNCTION dbo.ConcatAuthors(@State CHAR(2))
RETURNS VARCHAR(8000)
AS
BEGIN
 DECLARE @Output VARCHAR(8000)
 SET @Output = ''

 SELECT @Output = CASE @Output
    WHEN '' THEN au_lname
    ELSE @Output + ', ' + au_lname
    END
 FROM Authors
 WHERE State = @State
 ORDER BY au_lname

 RETURN @Output
END
GO

To find the list I want:

SELECT DISTINCT State, dbo.ConcatAuthors(State)
FROM Authors
ORDER BY State

... And the adaptation of Rob's temp table method... I did change
two things due to problems I discovered during testing. One, I've
altered the au_lname column to VARCHAR(8000); the column in the Authors
table is VARCHAR(40), not large enough for all of the California
authors. What if we were dealing with a much larger dataset? Second, I
added an IDENTITY column, and I'm clustering on that instead of the
actual data to get the ordering. I'm doing so because of the
VARCHAR(8000). Index rows can be a maximum of 900 bytes, so if we had
enough data to exceed that length, this method would fail.

CREATE TABLE #AuthorConcat
(
 State CHAR(2) NOT NULL,
 au_lname VARCHAR(8000) NOT NULL,
 Ident INT IDENTITY(1,1) NOT NULL PRIMARY KEY
)

INSERT #AuthorConcat
(
 State,
 au_lname
)
SELECT
 State,
 au_lname
FROM Authors
ORDER BY
 State,
 au_lname

DECLARE @Authors VARCHAR(8000)
SET @Authors = ''
DECLARE @State CHAR(2)
SET @State = ''

UPDATE #AuthorConcat
SET @Authors = au_lname = CASE
    WHEN @State = State THEN @Authors + ', ' + au_lname
    ELSE au_lname END,
 @State = State

SELECT State, MAX(au_lname)
FROM #AuthorConcat
GROUP BY State

Clever, but more complex and harder to read than the scalar UDF
version. Output is identical, but that's not why we're here. Which one
is more efficient?

Drumroll, please...

Results were tabulated using STATISTICS IO, STATISTICS TIME, and
Query Analyzer's Show Execution Plan. DBCC DROPCLEANBUFFERS and DBCC
FREEPROCCACHE were run before each test.

Scalar UDF Method
Total cost: 0.0492
Total Scan count: 1
Total Logical reads: 2
Total Physical reads: 2
Total time: 25 ms

Temp Table Method
Total cost: 0.2131
Total Scan count: 4
Total Logical reads: 9
Total Physical reads: 2
Total time: 88 ms

So in conclusion, neither method is incredibly taxing with the tiny Pubs dataset, but I think I have proven that the UDF is far more efficient.

a0002

Sunday, January 13, 2008

rowset string concatenation - UDF vs. temp table - Which method is best? | TSQL

No comments:

x443