Entity Framework: Never call GroupBy().ToDictionary()

Entity Framework generally works pretty well, but every once in a while it’ll throw something strange at you. Here’s one I experienced a while ago that caused an unexpected performance hit.

ToDictionary¬†And GroupBy Don’t Mix

Let’s say I have a table containing a bunch of products, and I want to find out how many products of each size there are. This is trivial in SQL – it’s a group by, selecting the count and the name. Now let’s say I want to stick the results into a Dictionary<string, int> for further processing. The code might look something like this:

 var productSizeCounts = db.Products.GroupBy(p => p.Size)
                         .ToDictionary(p => p.Key, p => p.Count());

Not too bad, right? Here’s the generated SQL:

SELECT 
 [Project2].[C1] AS [C1], 
 [Project2].[Size] AS [Size], 
 [Project2].[C2] AS [C2], 
 [Project2].[Id] AS [Id], 
 [Project2].[Name] AS [Name], 
 [Project2].[Size1] AS [Size1]
 FROM ( SELECT 
 [Distinct1].[Size] AS [Size], 
 1 AS [C1], 
 [Extent2].[Id] AS [Id], 
 [Extent2].[Name] AS [Name], 
 [Extent2].[Size] AS [Size1], 
 CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
 FROM (SELECT DISTINCT 
 [Extent1].[Size] AS [Size]
 FROM [dbo].[Products] AS [Extent1] ) AS [Distinct1]
 LEFT OUTER JOIN [dbo].[Products] AS [Extent2] ON ([Distinct1].[Size] = [Extent2].[Size]) OR (([Distinct1].[Size] IS NULL) AND ([Extent2].[Size] IS NULL))
 ) AS [Project2]
 ORDER BY [Project2].[Size] ASC, [Project2].[C2] ASC;

Huh? What the heck is that? Where’s our GROUP BY? Where’s the COUNT? Calling ToDictionary definitely did NOT do what we expected it to.

Let’s try again. This time we’ll project the values we want onto an anonymous type, and then call ToDictionary on that, instead of on our grouping:

 var productSizeCounts = db.Products.GroupBy(p => p.Size)
                     .Select(g => new { g.Key, Count = g.Count()})
                     .ToDictionary(p => p.Key, p => p.Count);

That gives us the following SQL:

SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [Size], 
    [GroupBy1].[A1] AS [C2]
    FROM ( SELECT 
        [Extent1].[Size] AS [K1], 
        COUNT(1) AS [A1]
        FROM [dbo].[Products] AS [Extent1]
        GROUP BY [Extent1].[Size]
    )  AS [GroupBy1];

Now that’s more like it!

But why?

ToDictionary() is an extension method on the Enumerable class in System.Linq. It doesn’t know anything about Entity Framework – all it can do is enumerate an IEnumerable. So when you call ToDictionary on your IQueryable<IGrouping<,>> that the GroupBy method returned, it simply materializes the query.

But why does Entity Framework generate such a strange query in the first place?

If you call GroupBy(…) and then enumerate the GroupBy(), Entity Framework has no way of knowing¬†what you are going to do with the results. For all it knows, you are going to end up extracting the entities themselves from the GroupBy.

// I'm so sneaky... I'm gonna trick EF...or am i?
var productSizeCounts = db.Products.GroupBy(p => p.Size).ToList();
var firstProductWithEachSize = productSizeCounts.
                                  Select(p => p.FirstOrDefault());

If it had tried to be smart, and turned the query into the GROUP BY you were expecting, it wouldn’t have the necessary information available to materialize the entities you are asking for!

Explicitly selecting the fields you want lets EF generate the GROUP BY query you would expect, because it knows that’s all the information you’ll need.