13

Is VB.NET's Aggregate query fatally flawed when used as the first (outer) clause of a Linq expression with multiple Into clauses because each Into clause is executed separately?

The "obvious" answer to SELECT MIN(ZoneMin), MAX(ZoneMin) FROM Plant in LINQ to SQL is

Dim limits = Aggregate p In Plants Select p.ZoneMin Into Min(), Max()

However, this answer actually retrieves each of Min and Max (and if you include other aggregate functions like Count and Average) in separate SQL queries. This can be easily seen in LINQPad.

Is there a transaction (or something else making these queries atomic) not shown by LINQPad, or is this a race condition waiting to happen? (And so you have to do the tricks shown in the answer to the above question to force a single query that returns multiple aggregates.)

In summary, is there a LINQ-to-SQL query using Aggregate that returns multiple aggregate functions in a single (or at least "atomic") query?

(I also say "obvious" because the obvious answer to me, Aggregate p In Plants Into Min(p.ZoneMin), Max(p.ZoneMin), actually retrieves the whole table twice, even when optimised, and then uses the Linq-to-Entities Min and Max to obtain the result :-( )

I thought Aggregate wasn't VB-specific, but it looks like C# does not have this query expression, so I've changed the to .

Community
  • 1
  • 1
Mark Hurd
  • 10,665
  • 10
  • 68
  • 101
  • You might be able to verify if the queries are in a transaction by tracing the SQL server itself, it should show it (but I'm not 100% sure about it). If performance is not a prime concern, however, you're better off reading the whole dataset and aggregating afterwards (basically, working on a snapshot of the data). – Alex Sep 04 '12 at 14:28
  • Note that that's what I hoped my original `Into Min(p.ZoneMin)` query would optimise to, seeing as the SQL part of the query is now the same. Perhaps the JITter might be able to still see that, but LINQPad's /o+ didn't. – Mark Hurd Sep 04 '12 at 14:38
  • And while you're only aggregating with functions that SQL supports it will be preferable to do the "grouping by a constant trick" to get the database to do the aggregation. – Mark Hurd Sep 04 '12 at 14:43
  • 1
    It appears that there may be some bugs in the L2S implementation. See http://weblogs.asp.net/fbouma/archive/2008/05/21/vb-net-beware-of-the-aggregate-keyword.aspx – Jim Wooley Sep 04 '12 at 14:55
  • @JimWooley: If you up that comment to an answer, I'll upvote it (just not straight away because it doesn't answer _this_ question) and possibly accept it as the answer unless someone identifies a way to get `Aggregate` to combine all the queries into a single SQL request. – Mark Hurd Sep 04 '12 at 15:36

2 Answers2

1

Although it doesn't use the Aggregate keyword, you can do multiple functions in a single query using the following syntax:

Dim query = From book In books _
    Group By key = book.Subject Into Group _
    Select id = key, _
        BookCount = Group.Count, _
        TotalPrice = Group.Sum(Function(_book) _book.Price), _
        LowPrice = Group.Min(Function(_book) _book.Price), _
        HighPrice = Group.Max(Function(_book) _book.Price), _
        AveragePrice = Group.Average(Function(_book) _book.Price)

There does appear to be an issue with the Aggregate clause implementation though. Consider the following query from Northwind:

Aggregate o in Orders
into Sum(o.Freight),
Average(o.Freight),
Max(o.Freight)

This issues 3 database requests. The first two perform separate aggregate clauses. The third pulls the entire table back to the client and performs the Max on the client through Linq to Objects.

Jim Wooley
  • 10,169
  • 1
  • 25
  • 43
  • 1
    Yeah, and if you change it to `Group By key = 0 Into Group` (and drop the unneeded `id = key,`) you get the "grouping by a constant" trick described in the answer to the question I've referred to, allowing aggregates over the whole table. – Mark Hurd Sep 04 '12 at 15:01
  • In my sample, I do want the grouping. If you want it over the whole table, then your constant trick works fine. – Jim Wooley Sep 04 '12 at 15:09
  • I *think* your latter point is related to the type `Freight` is seen to be, and for some reason Linq is of the opinion SQL's `MAX` could return a different value to what it thinks `Max` should return. But the _3 __separate__ database requests_ is what my question is concerned with. – Mark Hurd Sep 04 '12 at 15:25
  • Also note your `Freight` query may somehow be related to the issue Frans Bouma reports in his blog linked to by Jim Wooley. – Mark Hurd Sep 04 '12 at 15:44
  • 1
    Um, yeah. I'm a little familiar with those posts ;-) My [post](http://www.thinqlinq.com/Post.aspx/Title/Aggregate-clause-issues) is in part a reminder to the teams that it is still an issue. – Jim Wooley Sep 04 '12 at 16:30
  • Whoops, yeah I missed you're the same person :-) – Mark Hurd Sep 04 '12 at 16:42
0

To answer my broader question: is Aggregate broken for producing separate SQL queries without transactions?

All of LINQ can cause that if you don't carefully adjust your queries to only result in a single SELECT, and that may not be possible, without "giving up", retrieving a larger result in a single query and then using Linq-to-Objects to aggregate or otherwise manipulate the data. This 'query' for example.

So it is, in general, up to the programmer to ensure transactions are added around LINQ queries that may cause multiple queries. We just need to know for sure which LINQ queries may transform into multiple SQL queries.

Community
  • 1
  • 1
Mark Hurd
  • 10,665
  • 10
  • 68
  • 101