Entity Framework Performance: How Your Seeks Can Turn Into Scans

Entity Framework is pretty handy, but as we’ve seen before, you can sometimes end up with unexpected performance hits. This one took me by surprise a while back.

Let’s start with a simple Producs table in our database.

CREATE TABLE [dbo].[Products] (
    [Id] [int] NOT NULL IDENTITY,
    [Name] [nvarchar](max),
    [Size] [nvarchar](max),
    [Type] [int] NOT NULL,
    CONSTRAINT [PK_dbo.Products] PRIMARY KEY ([Id])
)

Suppose we have a method to return the number of products:

 public int GetProductCount()
 {
     var count = db.Products.Count();
     return count;
 }

No problem, right? Now, let’s say we want to be able to optionally filter the query to only include products of a certain type. We want this to be as fast as possible, so we’ll add an index to the Products table:

CREATE INDEX IDX_Products__Type 
  ON Products(Type) 
  INCLUDE (Id) 

With the index in place, we’ll modify our GetProductCounts() method to support filtering on product type. We can do this easily by passing in a nullable int containing the product type to filter on, or null if we don’t want to do any filtering.

The simplest way to write this is to check if the type is null directly in the filter expression:

public int GetProductCount(int? type)
{
    var count = db.Products.Count(p => type == null
                                    || p.Type == type.Value);
    return count;
}

Still pretty straightforward, and it does exactly what you would expect.

Unfortunately, it turns out that calls to GetProductCount() are far, far slower than they should be. Why? To figure that out, we’ll have to take a look at the SQL captured by a SQL profiler after calling GetProductCount(2):

DECLARE @p__linq__0 int = 2,
        @p__linq__1 int = 2;

SELECT 
    [GroupBy1].[A1] AS [C1]
    FROM ( SELECT 
        COUNT(1) AS [A1]
        FROM [dbo].[Products] AS [Extent1]
        WHERE (@p__linq__0 IS NULL) OR ([Extent1].[Type] = @p__linq__1) OR (([Extent1].[Type] IS NULL) AND (@p__linq__1 IS NULL))
    )  AS [GroupBy1];

There are four separate conditions in the WHERE clause. This is due to the EF context’s UseDatabaseNullSemantics flag being set to false by default. Setting it to true reduces the WHERE clause to the two conditions you might expect, but it doesn’t actually solve the performance problem. Let’s have a look at the query execution plan to see what’s actually happening:

index-scan

Yikes! It turns out we’re doing an index scan, not a seek. What effect does this have? A big one:

index-scan-2
Fortunately, it’s easy to modify our code to work around this and eliminate the scan. We’ll remove the null check from the expression in the Count() method, and instead, check for a null type outside of the query. If the type isn’t null, we’ll add the where clause to the query.

public int GetProductCount(int? type)
{
    var query = db.Products.AsQueryable();
    if(type != null)
        query = query.Where(p => p.Type == type.Value);

    var count = query.Count();
    return count;
}

The generated SQL looks pretty good:

DECLARE @p__linq__0 int = 2;

SELECT 
    [GroupBy1].[A1] AS [C1]
    FROM ( SELECT 
        COUNT(1) AS [A1]
        FROM [dbo].[Products] AS [Extent1]
        WHERE [Extent1].[Type] = @p__linq__0
    )  AS [GroupBy1];

And the query execution plan looks even better:

Index seek in the execution plan

You can see that compared to the Index Scan above, the Index Seek is literally an order of magnitude cheaper:

Index seek details

Now, this simple SELECT COUNT query is fast enough in and of itself that the difference isn’t very noticeable. But if you have a bigger, more complicated query, it can have a huge impact – one query I dealt with went from ~1 second to ~30 seconds due to this issue.

In conclusion, be careful when using expressions with parameters in your queries! They might have unintended performance consequences.

Entity Framework: Never call GroupBy().ToDictionary()

Entity Framework generally works pretty well, but every once in a while it’ll throw something strange at you. Here’s one I experienced a while ago that caused an unexpected performance hit.

ToDictionary And GroupBy Don’t Mix

Let’s say I have a table containing a bunch of products, and I want to find out how many products of each size there are. This is trivial in SQL – it’s a group by, selecting the count and the name. Now let’s say I want to stick the results into a Dictionary<string, int> for further processing. The code might look something like this:

 var productSizeCounts = db.Products.GroupBy(p => p.Size)
                         .ToDictionary(p => p.Key, p => p.Count());

Not too bad, right? Here’s the generated SQL:

SELECT 
 [Project2].[C1] AS [C1], 
 [Project2].[Size] AS [Size], 
 [Project2].[C2] AS [C2], 
 [Project2].[Id] AS [Id], 
 [Project2].[Name] AS [Name], 
 [Project2].[Size1] AS [Size1]
 FROM ( SELECT 
 [Distinct1].[Size] AS [Size], 
 1 AS [C1], 
 [Extent2].[Id] AS [Id], 
 [Extent2].[Name] AS [Name], 
 [Extent2].[Size] AS [Size1], 
 CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
 FROM (SELECT DISTINCT 
 [Extent1].[Size] AS [Size]
 FROM [dbo].[Products] AS [Extent1] ) AS [Distinct1]
 LEFT OUTER JOIN [dbo].[Products] AS [Extent2] ON ([Distinct1].[Size] = [Extent2].[Size]) OR (([Distinct1].[Size] IS NULL) AND ([Extent2].[Size] IS NULL))
 ) AS [Project2]
 ORDER BY [Project2].[Size] ASC, [Project2].[C2] ASC;

Huh? What the heck is that? Where’s our GROUP BY? Where’s the COUNT? Calling ToDictionary definitely did NOT do what we expected it to.

Let’s try again. This time we’ll project the values we want onto an anonymous type, and then call ToDictionary on that, instead of on our grouping:

 var productSizeCounts = db.Products.GroupBy(p => p.Size)
                     .Select(g => new { g.Key, Count = g.Count()})
                     .ToDictionary(p => p.Key, p => p.Count);

That gives us the following SQL:

SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [Size], 
    [GroupBy1].[A1] AS [C2]
    FROM ( SELECT 
        [Extent1].[Size] AS [K1], 
        COUNT(1) AS [A1]
        FROM [dbo].[Products] AS [Extent1]
        GROUP BY [Extent1].[Size]
    )  AS [GroupBy1];

Now that’s more like it!

But why?

ToDictionary() is an extension method on the Enumerable class in System.Linq. It doesn’t know anything about Entity Framework – all it can do is enumerate an IEnumerable. So when you call ToDictionary on your IQueryable<IGrouping<,>> that the GroupBy method returned, it simply materializes the query.

But why does Entity Framework generate such a strange query in the first place?

If you call GroupBy(…) and then enumerate the GroupBy(), Entity Framework has no way of knowing what you are going to do with the results. For all it knows, you are going to end up extracting the entities themselves from the GroupBy.

// I'm so sneaky... I'm gonna trick EF...or am i?
var productSizeCounts = db.Products.GroupBy(p => p.Size).ToList();
var firstProductWithEachSize = productSizeCounts.
                                  Select(p => p.FirstOrDefault());

If it had tried to be smart, and turned the query into the GROUP BY you were expecting, it wouldn’t have the necessary information available to materialize the entities you are asking for!

Explicitly selecting the fields you want lets EF generate the GROUP BY query you would expect, because it knows that’s all the information you’ll need.