Sep 30, 2009

Speeding up LINQ .Contains Queries with an Extension Method

I've been working on some reporting forms for the company I work for.  I've been using LINQ to filter the data, and ran into some, well, actually a lot of slowdown when trying to filter on a large set of Int32 IDs.  The query typically looked something like this:
var someHugeListOfIds = myData.Pages
    .Select(p => p.ParentSessionId);

var someSessions = myData.Sessions
    .Where(s => someHugeListOfIds.Contains(s.Id));

foreach (var session in someSessions)
    DoSomething(session);



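I noticed that when the count of someHugeListOfIds got above one hundred thousand (maybe even less, but I didn't test), the execution of the foreach statement was incredibly slow.  Assuming this is plain LINQ to Objects, that makes sense: Contains on an ordinary sequence is a linear scan, and it runs once for every session, so the total work grows with the product of the two counts.  The solution?  Convert that huge list of IDs into a HashSet<T>, whose Contains is a hash lookup instead of a scan.  You can do this by hand, or you can write an extension method like the one below: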
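// Extension methods must be declared in a static class; this class name is arbitrary.
public static class EnumerableExtensions
{
    public static HashSet<T> ToHashSet<T>(this IEnumerable<T> enumerable)
    {
        HashSet<T> hashSet = new HashSet<T>();

        // Add returns false and leaves the set unchanged when the item is already present,
        // so no Contains check is needed first.
        foreach (var en in enumerable)
            hashSet.Add(en);

        return hashSet;
    }
}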
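
Here is a small sketch of how HashSet<T> treats duplicates, which is why a ToHashSet helper does not have to guard against them itself. The class name, method name and sample values are made up for illustration. It relies only on HashSet<T>.Add returning false for a value that is already present, and on the HashSet<T>(IEnumerable<T>) constructor doing the same de-duplication in one step.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not part of the original post.
public static class HashSetDedupDemo
{
    // Builds a set with an explicit loop and returns how many values
    // were rejected by Add because they were already in the set.
    public static int CountDuplicates(IEnumerable<int> ids, out HashSet<int> set)
    {
        set = new HashSet<int>();
        int duplicates = 0;
        foreach (var id in ids)
        {
            if (!set.Add(id))
                duplicates++;
        }
        return duplicates;
    }

    public static void Main()
    {
        int[] ids = { 7, 7, 42, 7, 99 };

        HashSet<int> viaLoop;
        int duplicates = CountDuplicates(ids, out viaLoop);

        // The constructor performs the same de-duplication in a single call.
        var viaConstructor = new HashSet<int>(ids);

        Console.WriteLine(duplicates);                         // 2
        Console.WriteLine(viaLoop.SetEquals(viaConstructor));  // True
    }
}
```

If the loop-based helper feels like too much ceremony, returning new HashSet<T>(enumerable) from the extension method should behave the same way.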

Now I can rewrite the query as shown below, and instead of tens of seconds for the query to complete, it's done in a few milliseconds!
var someHugeListOfIds = myData.Pages
    .Select(p => p.ParentSessionId)
    .ToHashSet();

var someSessions = myData.Sessions
    .Where(s => someHugeListOfIds.Contains(s.Id));

foreach (var session in someSessions)
    DoSomething(session);
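
To see the difference on your own machine, a rough timing sketch along these lines should do. Everything in it is an assumption for illustration: the class and method names, the sizes, and the generated integers standing in for the page and session IDs. It assumes plain in-memory sequences (LINQ to Objects) and makes no claim about what the timings will be on any particular hardware.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical demo class, not part of the original post.
public static class ContainsTimingDemo
{
    // Counts how many candidates are found in ids. Enumerable.Contains
    // delegates to the collection's own Contains when ids is an ICollection<T>
    // (such as HashSet<T>) and otherwise walks the sequence item by item.
    public static int CountMatches(IEnumerable<int> ids, IEnumerable<int> candidates)
    {
        return candidates.Count(c => ids.Contains(c));
    }

    public static void Main()
    {
        // Deferred query: the Select is re-run on every Contains call.
        IEnumerable<int> lazyIds = Enumerable.Range(0, 10000).Select(i => i * 2);

        // Same values, materialized once into a hash set.
        HashSet<int> idSet = new HashSet<int>(lazyIds);

        int[] sessionIds = Enumerable.Range(0, 10000).ToArray();

        var sw = Stopwatch.StartNew();
        int slow = CountMatches(lazyIds, sessionIds);
        Console.WriteLine("deferred sequence: {0} matches in {1} ms", slow, sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        int fast = CountMatches(idSet, sessionIds);
        Console.WriteLine("HashSet:           {0} matches in {1} ms", fast, sw.ElapsedMilliseconds);
    }
}
```

Both runs report the same number of matches; only the elapsed time should differ, and the gap should widen quickly as the two sizes grow.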
