Alternative to using DISTINCT or FOR XML PATH

Question

Here is the set up, question is below.

CREATE TABLE Person (
   PersonId int Identity(1,1) not null,
   PersonName varchar(20) not null
PRIMARY KEY CLUSTERED (PersonId)
)
GO

CREATE TABLE Size (
   SizeId int IDENTITY (1,1) not null,
   Size varchar(20) null,
   PersonId int NOT NULL
PRIMARY KEY CLUSTERED (SizeId)
)

ALTER TABLE dbo.Size
ADD CONSTRAINT FK_Size_Person FOREIGN KEY (PersonId) REFERENCES dbo.Person (PersonId)
 GO

 INSERT INTO Person (PersonName)
VALUES ('Bob'), ('Bill'), ('Dave'), ('Nathan'), ('Luke')
GO
INSERT INTO Size (Size, PersonId)
VALUES ('32', 1), ('2XL', 1), ('Large', 1), ('Small', 2), ('Medium', 3)

CREATE FUNCTION [dbo].[fn_Sizes]
(    
    @PersonId int
)
RETURNS varchar(150)
AS
BEGIN
    DECLARE @sizes varchar(150)

    SET @sizes = (SELECT s.Size + ', '
            FROM dbo.Size s              
            WHERE s.PersonId = @PersonId
            FOR XML PATH(''))  

    -- Return the sizes
    RETURN ISNULL(LEFT(@sizes, LEN(@sizes)-1),'')
END

Fixing a view and one of the issues is a function that combines multiple rows of values into one column. I'm not sure if the problem is the DISTINCT keyword, or the function, or the use of FOR XML PATH, or all of it. I am optimizing a view that takes 15 mins to run, and the execution plan says 90% of the work is in the DISTINCT sort. So I am looking for a way to get rid of it.

The actual view query is really big, but an example of this section is:

SELECT DISTINCT p.PersonName, dbo.fn_Sizes(p.PersonId) as Size
FROM dbo.Person p
JOIN dbo.Size s ON p.PersonId = s.PersonId

The DISTINCT is in there to get rid of the extra Bob rows, because Bob has many sizes. I need to keep the multiple sizes on Bob's row, but I don't want multiple rows of Bob. Is there a more efficient way to do that other than using DISTINCT?

Second, word on the street is that functions are best avoided if you are going for speed. I don't see a way to use "FOR XML PATH" outside the function, because I can't get it to work on just one column. So I am looking for a way to produce the multiple values on one row.

take out `DISTINCT` and Add `GROUP BY p.PersonName ,dbo.fn_Sizes(p.PersonId)` — M.Ali, Jul 20 '15 at 18:49
See the accepted answer in the dupe. It shows how to do this without functions. BTW you don't need to join onto `dbo.Size` at all in the outer query anyway though. The function already has a select to get the required information from there. So absolutely pointless to select out multiple rows calculate the function multiple times and remove the dupes. — Martin Smith, Jul 20 '15 at 18:50
@MartinSmith Not joining it with `dbo.Size` will result in extra rows. Anyway there is no need for this function and it will hurt the performance, I think you should scarp the idea of using this function and use CTE , derived table or anything but not this scalar function — M.Ali, Jul 20 '15 at 18:55
@M.Ali - Well if not every person has a size this can be done by `EXISTS` without blowing up the number of rows. Or wrapping the whole thing in a CTE and excluding those where the function returns null. — Martin Smith, Jul 20 '15 at 18:58

Alternative to using DISTINCT or FOR XML PATH

0 Answers0