I've been working for the first time with the Entity Framework in .NET, and have been writing LINQ queries in order to get information from my model. I would like to build good habits from the beginning, so I've been researching the best way to write these queries and get their results. Unfortunately, in browsing Stack Exchange, I seem to have come across two conflicting explanations of how deferred/immediate execution works with LINQ:

  • A foreach causes the query to be executed in each iteration of the loop:

Demonstrated in the question "Slow foreach() on a LINQ query - ToList() boosts performance immensely - why is this?", the implication is that "ToList()" needs to be called in order to evaluate the query immediately, as the foreach is evaluating the query on the data source repeatedly, slowing down the operation considerably.

Another example is the question "Foreaching through grouped linq results is incredibly slow, any tips?", where the accepted answer also implies that calling "ToList()" on the query will improve performance.

  • A foreach causes a query to be executed once, and is safe to use with LINQ

Demonstrated in the question "Does foreach execute the query only once?", the implication is that the foreach causes one enumeration to be established, and will not query the data source each time.

Continued browsing of the site has turned up many questions where "repeated execution during a foreach loop" is the culprit of the performance concern, and plenty of other answers stating that a foreach will appropriately grab a single query from a datasource, which means that both explanations seem to have validity. If the "ToList()" hypothesis is incorrect (as most of the current answers as of 2013-06-05 1:51 PM EST seem to imply), where does this misconception come from? Is there one of these explanations that is accurate and one that isn't, or are there different circumstances that could cause a LINQ query to evaluate differently?

Edit: In addition to the accepted answer below, I've turned up the following question over on Programmers that very much helped my understanding of query execution, particularly the pitfalls that could result in multiple data source hits during a loop, which I think will be helpful for others interested in this question: https://softwareengineering.stackexchange.com/questions/178218/for-vs-foreach-vs-linq

Community
Mejwell
  • I suppose it would depend on what the query in the foreach is actually doing. – asawyer Jun 05 '13 at 17:29
  • `foreach` is usually the culprit when used with an `IEnumerable`/`IQueryable` for performance issues. – Aron Jun 05 '13 at 17:32
  • We'd really need a very specific example to be able to reason about it properly. `foreach` will only call `GetEnumerator` once... but if you execute that whole `foreach` loop multiple times, it will call `GetEnumerator` multiple times... – Jon Skeet Jun 05 '13 at 17:34
  • The linked question is dubious and I don't believe the accepted answer over there. – H H Jun 05 '13 at 17:34
  • The 1st linked question has nothing to do with LINQ to Entities and databases and is not relevant to your question. – Dennis Jun 05 '13 at 17:36
  • In response to the edited question: this has *nothing* to do with foreach, and *everything* to do with lazy execution of LINQ. You'd run into the exact same issue if you repeatedly used `.Contains()`, `.First()`, `.Single()` or anything else which causes the LINQ to execute. With objects it doesn't matter, but with database queries it does. Calling `ToList()` converts the query (if it is one) to objects, so later uses don't need to hit the database. – Bobson Jun 05 '13 at 18:05

8 Answers

22

In general, LINQ uses deferred execution. If you use methods like First() and FirstOrDefault(), the query is executed immediately. When you do something like:

foreach(string s in MyObjects.Select(x => x.AStringProp))

The results are retrieved in a streaming manner, meaning one by one. Each time the loop calls MoveNext on the enumerator, the projection is applied to the next object. If you were to have a Where, it would first apply the filter, then the projection.
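
A minimal sketch of that streaming behaviour, assuming a small in-memory array rather than a database. The class name StreamingDemo and the log list are illustrative and not part of the original answer. The log records when the filter, the projection and the loop body run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // Returns a log of the order in which the filter, the projection
    // and the loop body run.
    public static List<string> Run()
    {
        var log = new List<string>();
        var numbers = new[] { 1, 2, 3, 4 };

        // Nothing executes here: Where and Select only describe the query.
        IEnumerable<string> query = numbers
            .Where(n => { log.Add($"filter {n}"); return n % 2 == 0; })
            .Select(n => { log.Add($"project {n}"); return $"item {n}"; });

        log.Add("query defined");

        // Each MoveNext pulls one element through the filter and then
        // the projection before the loop body sees it.
        foreach (string s in query)
        {
            log.Add($"body {s}");
        }

        return log;
    }

    public static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

The log starts with "query defined", and the filter, projection and body entries are interleaved per element instead of running as three separate passes.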

If you do something like:

List<string> names = People.Select(x => x.Name).ToList();
foreach (string name in names)

Then I believe this is a wasteful operation. ToList() will force the query to be executed, enumerating the People list and applying the x => x.Name projection. Afterwards you will enumerate the resulting list again. So unless you have a good reason to have the data in a list (rather than an IEnumerable), you're just wasting CPU cycles.

Generally speaking using a LINQ query on the collection you're enumerating with a foreach will not have worse performance than any other similar and practical options.

Also it's worth noting that people implementing LINQ providers are encouraged to make the common methods work as they do in the Microsoft provided providers but they're not required to. If I were to go write a LINQ to HTML or LINQ to My Proprietary Data Format provider there would be no guarantee that it behaves in this manner. Perhaps the nature of the data would make immediate execution the only practical option.

Also, final edit: if you're interested in this, Jon Skeet's C# In Depth is very informative and a great read. My answer summarizes a few pages of the book (hopefully with reasonable accuracy), but if you want more details on how LINQ works under the covers, it's a good place to look.

Servy
evanmcdonnal
  • Thanks for the book recommendation. This seems to confirm what my own fiddling around and the general consensus of the articles I'm turning up seems to be. – Mejwell Jun 05 '13 at 18:07
  • Note that the projection is *not* applied when you call `Current` on the enumerator, it's applied when you call `MoveNext`. `Current` is just fetching the value that was generated on the call to `MoveNext`. This is good, it means you don't need to worry about using `Current` more than once between calls to `MoveNext`. – Servy Jun 05 '13 at 18:17
  • @Servy thank you for the correction. Feel free to edit the post if you'd like. – evanmcdonnal Jun 05 '13 at 19:37
  • What if the LINQ statement uses OrderBy or similar, which enumerates the whole set? E.g. foreach (var thing in things.OrderBy(r => r.Order).ToArray()): does that execute once, or once per iteration of the loop? – Steve Nov 17 '16 at 12:23
  • I believe you are wrong about the "wasteful operation". If you look at my answer to the question, you can see that the enumeration happens twice either way. – Boregore Jan 29 '18 at 13:05
  • @Boregore look at your answer a bit closer... In the first example the print statements are interweaved. Why do you think that is? Somehow it's executing a line of code from the body of your `Where` followed by one in the `ForEach` then one in the `Where` again. You're misunderstanding your test. – evanmcdonnal Jan 29 '18 at 16:42
  • @evanmcdonnal sorry for the late answer - lost my ability to comment when you downvoted my solution :P Would you care to elaborate? If you look at the numbers printed in my solution, you can see that the `Where` is executed the same number of times in both scenarios - the collection is just enumerated by the LINQ query interweaved with the `foreach` of the projected values in the scenario where I DON'T do `ToList()`; in the scenario where I DO invoke `ToList()`, the collection is enumerated instantly by the LINQ query and afterwards the projection is enumerated by the `foreach`. – Boregore Mar 07 '18 at 12:16
9

try this on LinqPad

void Main()
{
    var testList = Enumerable.Range(1,10);
    var query = testList.Where(x => 
    {
        Console.WriteLine(string.Format("Doing where on {0}", x));
        return x % 2 == 0;
    });
    Console.WriteLine("First foreach starting");
    foreach(var i in query)
    {
        Console.WriteLine(string.Format("Foreached where on {0}", i));
    }

    Console.WriteLine("First foreach ending");
    Console.WriteLine("Second foreach starting");
    foreach(var i in query)
    {
        Console.WriteLine(string.Format("Foreached where on {0} for the second time.", i));
    }
    Console.WriteLine("Second foreach ending");
}

Each time the where delegate is being run we shall see a console output, hence we can see the Linq query being run each time. Now by looking at the console output we see the second foreach loop still causes the "Doing where on" to print, thus showing that the second usage of foreach does in fact cause the where clause to run again...potentially causing a slow down.

First foreach starting
Doing where on 1
Doing where on 2
Foreached where on 2
Doing where on 3
Doing where on 4
Foreached where on 4
Doing where on 5
Doing where on 6
Foreached where on 6
Doing where on 7
Doing where on 8
Foreached where on 8
Doing where on 9
Doing where on 10
Foreached where on 10
First foreach ending
Second foreach starting
Doing where on 1
Doing where on 2
Foreached where on 2 for the second time.
Doing where on 3
Doing where on 4
Foreached where on 4 for the second time.
Doing where on 5
Doing where on 6
Foreached where on 6 for the second time.
Doing where on 7
Doing where on 8
Foreached where on 8 for the second time.
Doing where on 9
Doing where on 10
Foreached where on 10 for the second time.
Second foreach ending
ofthelit
Aron
  • Please describe what this is supposed to demonstrate in your answer. – Bobson Jun 05 '13 at 17:40
  • Each time the where delegate is being run we shall see a console output, hence we can see the Linq query being run each time. Now by looking at the console output we see the second foreach loop still causes the "Doing where on" to print, thus showing that the second usage of foreach does in fact cause the where clause to run again...potentially causing a slow down. – Aron Jun 05 '13 at 17:45
  • You should include that *in the answer*, rather than just saying "Try this". – Bobson Jun 05 '13 at 17:46
  • I just thought it was somewhat self explanatory what was going on... I do try my best to describe what I am doing IN the code rather than with comments... – Aron Jun 05 '13 at 17:48
  • It doesn't need to be described in comments in the code. It just needed a sentence or two saying *why* you're suggesting running this, so people know if it's worth trying it. Thanks for adding it - Downvote removed. – Bobson Jun 05 '13 at 17:51
  • "hence we can see the Linq query being run each time." Each time of the 2 => yes, but apparently not at each iteration of one loop. Your example shows nicely that the entire collection is iterated only once per loop by the `Where` query. – Mong Zhu Jul 02 '19 at 07:51
6

It depends on how the Linq query is being used.

var q = {some linq query here}

while (true)
{
    foreach(var item in q)
    {
    ...
    }
}

The code above will execute the Linq query multiple times. Not because of the foreach, but because the foreach is inside another loop, so the foreach itself is being executed multiple times.

If all consumers of a linq query use it "carefully" and avoid dumb mistakes such as the nested loops above, then a linq query should not be executed multiple times needlessly.
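
A rough sketch of the nested-loop case, assuming an in-memory iterator as a stand-in for the data source. The names RepeatedLoopDemo, CountExecutions and Source are hypothetical. The counter goes up each time the source is enumerated from the start, which is once per pass of the outer loop, not once per item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RepeatedLoopDemo
{
    // Returns how many times the underlying source was enumerated
    // when the same foreach runs outerPasses times.
    public static int CountExecutions(int outerPasses)
    {
        int executions = 0;

        // Hypothetical data source: the body restarts on every new enumeration.
        IEnumerable<int> Source()
        {
            executions++;
            yield return 1;
            yield return 2;
            yield return 3;
        }

        IEnumerable<int> q = Source().Where(n => n > 1);

        int total = 0;
        for (int pass = 0; pass < outerPasses; pass++)
        {
            // One foreach means one enumeration of q, and so of Source().
            foreach (int item in q)
            {
                total += item;
            }
        }

        Console.WriteLine($"total = {total}");
        return executions;
    }

    public static void Main()
    {
        Console.WriteLine($"executions = {CountExecutions(3)}");
    }
}
```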

There are occasions when reducing a LINQ query to an in-memory result set using ToList() is warranted, but in my opinion ToList() is used far, far too often. ToList() almost always becomes a poison pill whenever large data is involved, because it forces the entire result set (potentially millions of rows) to be pulled into memory and cached, even if the outermost consumer/enumerator only needs 10 rows. Avoid ToList() unless you have a very specific justification and you know your data will never be large.
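
To illustrate the "only needs 10 rows" point, here is a small sketch that uses an in-memory iterator as a stand-in for a large result set (it is not a real database query, and the names TakeDemo and Rows are made up for this example). It counts how many rows the source actually produces with and without ToList().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeDemo
{
    // Hypothetical stand-in for a large result set; onProduced fires
    // every time a row is actually produced.
    private static IEnumerable<int> Rows(int total, Action onProduced)
    {
        for (int i = 0; i < total; i++)
        {
            onProduced();
            yield return i;
        }
    }

    // The consumer takes 10 rows straight from the lazy sequence.
    public static int RowsProducedWhenStreaming()
    {
        int produced = 0;
        foreach (int row in Rows(1000, () => produced++).Take(10)) { }
        return produced;
    }

    // ToList() materializes all 1000 rows before Take(10) sees any of them.
    public static int RowsProducedWithToList()
    {
        int produced = 0;
        foreach (int row in Rows(1000, () => produced++).ToList().Take(10)) { }
        return produced;
    }

    public static void Main()
    {
        Console.WriteLine($"streaming: {RowsProducedWhenStreaming()} rows produced");
        Console.WriteLine($"ToList():  {RowsProducedWithToList()} rows produced");
    }
}
```

The streaming version stops pulling rows once Take has what it asked for, while the ToList() version produces every row up front.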

dthorpe
4

Sometimes it might be a good idea to "cache" a LINQ query using ToList() or ToArray(), if the query is being accessed multiple times in your code.

But keep in mind that "caching" it with ToList() or ToArray() still enumerates the whole query once, just as a foreach would.

So the basic rule for me is:

  • if a query is simply used in one foreach (and that's it) - then I don't cache the query
  • if a query is used in a foreach and in some other places in the code - then I cache it in a var using ToList/ToArray
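
The rule above can be sketched as follows, assuming an in-memory source and a counter inside the filter. The names CachingDemo and CountPredicateCalls are illustrative only. The query is used twice, first directly and then through a cached list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CachingDemo
{
    // Returns how often the filter ran when the query is used twice
    // directly, and when it is cached with ToList() first.
    public static (int Uncached, int Cached) CountPredicateCalls()
    {
        int calls = 0;
        IEnumerable<int> query = Enumerable.Range(1, 5)
            .Where(n => { calls++; return n % 2 == 1; });

        // Two uses of the raw query: each one re-runs the filter on all 5 items.
        int sum = query.Sum();
        int count = 0;
        foreach (int n in query) { count++; }
        int uncached = calls;

        // Cache once, then use the list twice: the filter runs only for ToList().
        calls = 0;
        List<int> cachedList = query.ToList();
        int cachedSum = cachedList.Sum();
        int cachedCount = 0;
        foreach (int n in cachedList) { cachedCount++; }
        int cached = calls;

        Console.WriteLine($"sum {sum}/{cachedSum}, count {count}/{cachedCount}");
        return (uncached, cached);
    }

    public static void Main()
    {
        var result = CountPredicateCalls();
        Console.WriteLine($"uncached: {result.Uncached} filter calls, cached: {result.Cached}");
    }
}
```

With five source items, the uncached path runs the filter ten times and the cached path five times.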
jazzcat
3

foreach, by itself, only runs through its data once. In fact, it specifically runs through it once. You can't look ahead or back, or alter the index the way you can with a for loop.

However, if you have multiple foreach loops in your code, all operating on the same LINQ query, you may get the query executed multiple times. This is entirely dependent on the data, though. If you're iterating over a LINQ-based IEnumerable/IQueryable that represents a database query, it will run that query each time. If you're iterating over a List or other collection of objects, it will run through the list each time, but won't hit your database repeatedly.

In other words, this is a property of LINQ, not a property of foreach.
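
A short sketch of that point, with no foreach in sight. It assumes an in-memory iterator in place of a database query, and the names NoForeachDemo and Source are hypothetical. Each LINQ operator that needs a result enumerates the query again.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NoForeachDemo
{
    // Returns how many times the source was enumerated by three
    // result-producing operators.
    public static int CountExecutions()
    {
        int executions = 0;

        // Hypothetical data source: the body restarts on every new enumeration.
        IEnumerable<int> Source()
        {
            executions++;
            yield return 10;
            yield return 20;
            yield return 30;
        }

        IEnumerable<int> query = Source().Select(n => n + 1);

        int first = query.First();          // enumeration 1
        int count = query.Count();          // enumeration 2
        bool has21 = query.Contains(21);    // enumeration 3

        Console.WriteLine($"first={first}, count={count}, has21={has21}");
        return executions;
    }

    public static void Main()
    {
        Console.WriteLine($"executions = {CountExecutions()}");
    }
}
```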

Bobson
1

The difference is in the underlying type. As LINQ is built on top of IEnumerable (or IQueryable) the same LINQ operator may have completely different performance characteristics.

A List will always be quick to respond, but it takes an upfront effort to build a list.

An iterator is also IEnumerable and may employ any algorithm every time it fetches the "next" item. This will be faster if you don't actually need to go through the complete set of items.

You can turn any IEnumerable into a list by calling ToList() on it and storing the resulting list in a local variable. This is advisable if

  • You don't depend on deferred execution.
  • You have to access more items in total than the set contains (for example, because you enumerate it more than once).
  • You can pay the upfront cost of retrieving and storing all items.
Tormod
0

Even when you use LINQ without entities, deferred execution is in effect. The actual LINQ expression is only evaluated when an iteration is forced. In that sense, each time you use the LINQ expression it is going to be evaluated.

Now with entities this is still the same, but there is more functionality at work. When the Entity Framework sees the expression for the first time, it checks whether it has already executed this query. If not, it goes to the database, fetches the data, sets up its internal memory model and returns the data to you. If the Entity Framework sees that it has already fetched the data, it does not go to the database and instead uses the memory model it set up earlier to return data to you.

This can make your life easier, but it can also be a pain. For instance, if you request all records from a table by using a LINQ expression, the Entity Framework will load all data from the table. If you later evaluate the same LINQ expression, you will get the same result, even if records were deleted or added in the meantime.

The entity framework is a complicated thing. There are of course ways to make it reexecute the query, taking into account the changes it has in its own memory model and the like.

I suggest reading "Programming Entity Framework" by Julia Lerman. It addresses lots of issues like the one you are having right now.

Philip Stuyck
-1

It will execute the LINQ statement the same number of times no matter if you do .ToList() or not. I have an example here with colored output to the console:

What happens in the code (see code at the bottom):

  • Create a list of 100 ints (0-99).
  • Create a LINQ statement that prints every int from the list followed by two * to the console in red color, and then return the int if it's an even number.
  • Do a foreach on the query, printing out every even number in green color.
  • Do a foreach on the query.ToList(), printing out every even number in green color.

As you can see in the output below, the number of ints written to the console is the same, meaning the LINQ statement is executed the same number of times.

The difference is in when the statement is executed. As you can see, when you do a foreach on the query (that you have not invoked .ToList() on), the list and the IEnumerable object, returned from the LINQ statement, are enumerated at the same time.

When you cache the list first, they are enumerated separately, but still the same number of times.

The difference is very important to understand, because if the list is modified after you have defined your LINQ statement, the LINQ statement will operate on the modified list when it is executed (e.g. by .ToList()). BUT if you force execution of the LINQ statement (.ToList()) and then modify the list afterwards, the LINQ statement will NOT work on the modified list.
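
A minimal sketch of that difference, assuming a plain List<int> as the source. The names SnapshotDemo and Run are illustrative and separate from the test program in this answer. The deferred query sees an item added after the query was defined, while the ToList() snapshot does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Returns the number of even items seen by the deferred query and
    // by the ToList() snapshot after the source list was modified.
    public static (int Deferred, int Snapshot) Run()
    {
        var list = new List<int> { 1, 2, 3, 4 };

        IEnumerable<int> evens = list.Where(n => n % 2 == 0); // not executed yet
        List<int> snapshot = evens.ToList();                  // executed now: 2 and 4

        list.Add(6); // modify the source after the query was defined

        // evens is re-evaluated here and sees the new item; snapshot is unchanged.
        return (evens.Count(), snapshot.Count);
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine($"deferred: {result.Deferred}, snapshot: {result.Snapshot}");
        // prints "deferred: 3, snapshot: 2"
    }
}
```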

Here's the output: (image: LINQ Deferred Execution output)

Here's my code:

// Main method:
static void Main(string[] args)
{
    IEnumerable<int> ints = Enumerable.Range(0, 100);

    var query = ints.Where(x =>
    {
        Console.ForegroundColor = ConsoleColor.Red;
        Console.Write($"{x}**, ");
        return x % 2 == 0;
    });

    DoForeach(query, "query");
    DoForeach(query, "query.ToList()");

    Console.ForegroundColor = ConsoleColor.White;
}

// DoForeach method:
private static void DoForeach(IEnumerable<int> collection, string collectionName)
{
    Console.ForegroundColor = ConsoleColor.Yellow;
    Console.WriteLine("\n--- {0} FOREACH BEGIN: ---", collectionName);

    if (collectionName.Contains("query.ToList()"))
        collection = collection.ToList();

    foreach (var item in collection)
    {
        Console.ForegroundColor = ConsoleColor.Green;
        Console.Write($"{item}, ");
    }

    Console.ForegroundColor = ConsoleColor.Yellow;
    Console.WriteLine("\n--- {0} FOREACH END ---", collectionName);
}

Note about execution time: I did a few timing tests (not enough to post it here though) and I didn't find any consistency in either method being faster than the other (including the execution of .ToList() in the timing). On larger collections, caching the collection first and then iterating it seemed a bit faster, but there was no definitive conclusion from my test.

Boregore