Entity Framework: Never call GroupBy().ToDictionary()

Entity Framework generally works pretty well, but every once in a while it’ll throw something strange at you. Here’s one I experienced a while ago that caused an unexpected performance hit.

ToDictionary And GroupBy Don’t Mix

Let’s say I have a table containing a bunch of products, and I want to find out how many products of each size there are. This is trivial in SQL – it’s a group by, selecting the count and the name. Now let’s say I want to stick the results into a Dictionary<string, int> for further processing. The code might look something like this:

 var productSizeCounts = db.Products.GroupBy(p => p.Size)
                         .ToDictionary(p => p.Key, p => p.Count());

Not too bad, right? Here’s the generated SQL:

SELECT 
 [Project2].[C1] AS [C1], 
 [Project2].[Size] AS [Size], 
 [Project2].[C2] AS [C2], 
 [Project2].[Id] AS [Id], 
 [Project2].[Name] AS [Name], 
 [Project2].[Size1] AS [Size1]
 FROM ( SELECT 
 [Distinct1].[Size] AS [Size], 
 1 AS [C1], 
 [Extent2].[Id] AS [Id], 
 [Extent2].[Name] AS [Name], 
 [Extent2].[Size] AS [Size1], 
 CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
 FROM (SELECT DISTINCT 
 [Extent1].[Size] AS [Size]
 FROM [dbo].[Products] AS [Extent1] ) AS [Distinct1]
 LEFT OUTER JOIN [dbo].[Products] AS [Extent2] ON ([Distinct1].[Size] = [Extent2].[Size]) OR (([Distinct1].[Size] IS NULL) AND ([Extent2].[Size] IS NULL))
 ) AS [Project2]
 ORDER BY [Project2].[Size] ASC, [Project2].[C2] ASC;

Huh? What the heck is that? Where’s our GROUP BY? Where’s the COUNT? Calling ToDictionary definitely did NOT do what we expected it to.

Let’s try again. This time we’ll project the values we want onto an anonymous type, and then call ToDictionary on that, instead of on our grouping:

 var productSizeCounts = db.Products.GroupBy(p => p.Size)
                     .Select(g => new { g.Key, Count = g.Count()})
                     .ToDictionary(p => p.Key, p => p.Count);

That gives us the following SQL:

SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [Size], 
    [GroupBy1].[A1] AS [C2]
    FROM ( SELECT 
        [Extent1].[Size] AS [K1], 
        COUNT(1) AS [A1]
        FROM [dbo].[Products] AS [Extent1]
        GROUP BY [Extent1].[Size]
    )  AS [GroupBy1];

Now that’s more like it!

But why?

ToDictionary() is an extension method on the Enumerable class in System.Linq. It doesn’t know anything about Entity Framework – all it can do is enumerate an IEnumerable. So when you call ToDictionary on your IQueryable<IGrouping<,>> that the GroupBy method returned, it simply materializes the query.

But why does Entity Framework generate such a strange query in the first place?

If you call GroupBy(…) and then enumerate the GroupBy(), Entity Framework has no way of knowing what you are going to do with the results. For all it knows, you are going to end up extracting the entities themselves from the GroupBy.

// I'm so sneaky... I'm gonna trick EF...or am i?
var productSizeCounts = db.Products.GroupBy(p => p.Size).ToList();
var firstProductWithEachSize = productSizeCounts.
                                  Select(p => p.FirstOrDefault());

If it had tried to be smart, and turned the query into the GROUP BY you were expecting, it wouldn’t have the necessary information available to materialize the entities you are asking for!

Explicitly selecting the fields you want lets EF generate the GROUP BY query you would expect, because it knows that’s all the information you’ll need.

Announcing ReminderHero.com – Automatic Email and SMS Reminders

You may have noticed that the volume of posts – and in particular, iOS-related posts – has slowed down over the past month or so. I’ve been busy working on a side project which is now in the public beta phase, and I’d love your feedback!

Reminder Hero is a web app that lets you create intelligent reminders simply by sending an email or a text message. Just describe what you want to be reminded of and when – in plain english. Reminder Hero will email or text you back a reminder when it’s time!

Here are some examples of the kind of things you might ask Reminder Hero to remind you about:

Call Bob in 2h

Pay my credit card bill Tuesday night

Take out the trash every Friday at 8am

So, head on over to http://reminderhero.com, check out the site, and subscribe to the Beta mailing list! Don’t forget to send us your feedback!

“Could not load file or assembly FSharp.Core, Version=4.0.0.0” In An ASP.NET Application

If you’ve ever tried to use an F# class library from an ASP.NET application, you may find that although things work perfectly when you are running locally, your web app won’t load at all in production.

Could not load file or assembly ‘FSharp.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a’ or one of its dependencies. The system cannot find the file specified.

Fortunately, the fix is easy. All you need to do is add a reference to FSharp.Core to your web project, and set its Copy Local property to True. Not familiar with how to do that? Here’s a quick overview.

First, go to the Solution Explorer and right-click on your web project, or on your web project’s References folder, and select Add Reference…

add-reference

In the Reference Manager window, select Assemblies, and the find the FSharp.Core, version 4.3 reference. Check it, and click OK.

add-fsharp-dot-core-reference

 

Finally, in the Solution Explorer, find the FSharp.Core reference in your project’s References folder. Select it, then open the Properties window (or tab, or pane, depending on how you have Visual Studio configured), and set Copy Local to True.

copy-local

That’s it! Deploy your web app to your server, and the error will be gone.

What do YOU want to see in ios-queryable?

ios-queryable is a library that simplifies writing queries in Core Data. I released it around 9 months ago, and it has become reasonably popular since then, with a hundred-something stars and a few forks on GitHub.

At this point, it’s basically a complete, working, well-tested library, and it does pretty much everything I originally set out to accomplish. However, I don’t want to simply stop working on it; I want it to grow and evolve and become even more powerful. So I’m asking you – the community – for your input. What do you want to see in it? Where do you want to see it go?

Here are some possibilities:

  • Support for blocks (i.e. passing in a block to the where: method)
  • More IEnumerable methods – GroupBy, Join, ToDictionary, Aggregate, etc.
  • Direct Sqlite support (that is, no dependency on Core Data. Sounds like a lot of work…)
  • OS X support (I’ve never done OS X-specific development, so I’m not sure what’s involved here; maybe it already works!)
  • Improved error handling (since, you know, there isn’t really any at the moment…)

Do you have any other ideas, or any thoughts on these ideas? Leave a comment and let me know!

Like Programming? Like Fitness? Why Not Combine Them!

If you’re interested in programming and/or fitness, you should head over to http://fitnessprogrammer.com, where my wife has started her blogging journey.

She’ll be blogging about her experiences with learning to program, starting with Codecademy.com and JavaScript. She’ll also be blogging about fitness and health, and her love for all things Beach Body.

Her goal is to eventually release her own iPhone apps, so head over there now and start reading about her experiences! If you’re interested in learning how to program, it’s your chance to see somebody become a programmer from the ground up.

Don’t forget to leave her a comment – she’ll appreciate it!

 

trakt.tv’s API – The Good And The Bad

My most popular app, Which Episode?, is powered by trakt.tv‘s API. It uses the API to search for shows when you enter a name, and to download show posters.

which-episode

It takes exactly one API call to grab all the info I need. If the user enters in ‘Supernatural’, the app simply makes an HTTP request to http://api.trakt.tv/search/shows.json/[my API key]/supernatural. The resulting JSON includes everything about shows matching the search terms, including URLs for the show’s posters.

This is fantastic, and it’s exactly what I would expect from the API.

Unfortunately, the cracks in trakt’s API begin to show once you start looking at more advanced scenarios.
The app I’m currently working on requires the retrieval of information on recently released movies, or movies that are currently playing in the theater. Figuring out how to retrieve this info from trakt is an exercise in futility.

You could try to retrieve http://trakt.tv/api-docs/movies-trending to get a list of trending movies… but just because a movie is trending doesn’t mean it’s actually new, so you’ll have to filter the results based on whether or not they are actually new. More importantly, just because a movie is new, doesn’t mean it’ll be trending, so you’re not actually going to get all the movies you want to. It’s also a nearly 200K chunk of JSON – not something you want to be grabbing on a regular basis from a mobile device.

You could also try to keep a local copy of all the (recent) movies in the database using the http://trakt.tv/api-docs/movies-updated endpoint. This call gives you the timestamps of when the record for each movie was updated. Using this, along with a boatload of API calls (you’ll need one for each movie), you should be able to keep track of new movies and their release dates.

Both of these methods are, to put it mildly, inadequate. And they reveal the biggest problem with trakt’s API:

It doesn’t support server-side sorting, pagination, or result limiting!

Pagination was requested on trakt’s Google Group a few months ago, but the thread has no responses. I asked a question about retrieving movies by release date a week ago, and my thread has had no responses.

If I could retrieve the top n movies, sorted by release date, life would be great. It’s not even a breaking change from an API design point of view – all it takes is support for a few extra parameters in the query string. I could envision using it like this:

http://api.trakt.tv/something/somethingelse.json/[my API key]?sort=release_date&count=20

The lack of support for such things is why I’ll be switching away from trakt for my upcoming app. It’s too bad, because aside from these limitations, trakt has been fantastic.

One of the alternatives I’m investigating is the Rotten Tomatoes API. I haven’t dug too deeply, but it appears that it has far more functionality than trakt – for example, individual endpoints for Box Office Movies, Upcoming Movies, In Theater Movies, and Opening Movies. Pretty slick!

Here’s hoping the rest of it works as well as trakt does.