IFormattable: linq


Monday, May 5, 2008

Using LINQ from .NET 2.0

A few months back when I was migrating a project to VS 2008, the PM was very concerned that .NET 3.5 would cause problems on our servers, interfering with existing applications or creating new bugs in the existing codebase. I did my best to educate my PM on the nature of .NET 3.5: that it still ran on the .NET 2.0 CLR, that it was a bunch of new libraries, that it included SDK tools that compiled the new C# 3.0 language, and that it wouldn't affect existing applications. Ever the pragmatic fellow, he insisted on late-night installs and system testing nevertheless.

Now, I don't mind pragmatism; in fact, I applaud it. However, a different situation can arise than the one I just described. A non-technical PM or a draconian operations manager might insist that you don't "use .NET 3.5" and stick with ".NET 2.0". If you take this at face value, it means you can't use C# 3.0, LINQ, etc. Well, first you must get them into a Slammer worm recovery group. Next, you can still use some .NET 3.5 goodness without having to install it on the production server.

Here's a minimalist approach: include System.Core.dll in your .NET 2.0 project. You may already know that you can build strictly for .NET 2.0 from VS 2008, so just set the reference, set it to copy local, and roll. You'll have to convince them to let you install .NET 2.0 SP1, though. Another thing you could try is LINQBridge, a .NET 2.0 re-implementation of LINQ to Objects along with Action<T> and Func<T>.

At the end of the day, you should probably just stay within the environment's strictures. If you can make the case that .NET 3.5 will save time and money, you'll be much more successful in changing minds than if you just think it is cool.

Saturday, January 12, 2008

List of Links for Learning LINQ

Since I often use this blog as a brain back-up device, I will add another list of links, this time for learning LINQ.

Building a LINQ Provider

Links to LINQ (another link-post)

How LINQ Works: Creating Queries (part of a series of great posts on LINQ internals)

The .NET Standard Query Operators (slightly out-of-date, but very relevant)

Standard Query Operator Translation (LINQ to SQL)

Friday, December 21, 2007

Liking LINQ: A Question of Efficacy

Consider these two equivalent blocks of code. First, the LINQ way:

foreach(var assignable in (from Assembly a in BuildManager.GetReferencedAssemblies()
                           select a.GetTypes() into types
                           from t1 in types
                           where typeof(I).IsAssignableFrom(t1)
                           select t1))
         Cache.AddType(assignable.Name, Cache.ContainsType(assignable.Name) ? DUPLICATE_TYPE : assignable);

And, the "normal way":

foreach (Assembly assembly in BuildManager.GetReferencedAssemblies())
     foreach (Type type in assembly.GetTypes())
          if (typeof(I).IsAssignableFrom(type))
          {
                if (!Cache.ContainsType(type.Name))
                      Cache.AddType(type.Name, type);
                else
                      Cache.AddType(type.Name, DUPLICATE_TYPE);
          }

I cannot rightly make a judgment call on which way is better. They have the same result. It is probably safe to say, though, that the normal way should perform at least as well, since it skips the query machinery. It is also clear, oddly, that the normal way has fewer lines of meaningful code; thus, it is easier to grok. So, what do you think? If you had to maintain the code base, which one would you prefer? FWIW, I'm going to go with the normal way.
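For comparison, here is a minimal sketch of the same scan written with method syntax, where SelectMany does the flattening that the second "from" clause does above. The class and method names are made up for illustration, and the assemblies are passed in rather than fetched from BuildManager so the sketch does not depend on System.Web.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helper; not part of the original code base.
static class AssignableTypeScan
{
    // LINQ version: flatten every assembly's types, then keep the assignable ones.
    public static List<Type> WithLinq(IEnumerable<Assembly> assemblies, Type target)
    {
        return assemblies
            .SelectMany(a => a.GetTypes())
            .Where(t => target.IsAssignableFrom(t))
            .ToList();
    }

    // Loop version: the same traversal spelled out with nested foreach statements.
    public static List<Type> WithLoops(IEnumerable<Assembly> assemblies, Type target)
    {
        var found = new List<Type>();
        foreach (Assembly assembly in assemblies)
            foreach (Type type in assembly.GetTypes())
                if (target.IsAssignableFrom(type))
                    found.Add(type);
        return found;
    }
}
```

Calling both methods with the same assemblies and typeof(IFormattable) should yield the same types in the same order, which makes the two easy to compare side by side for readability.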

This leaves me with the question of efficacy. Certainly there are niches where LINQ is superior, or the only option, but for general object collection work, should we just ignore this language feature? Perhaps when PLINQ becomes available we'll have a good reason to use it. Time will tell.

Sunday, December 16, 2007

Liking LINQ: The Learning Curve

This is the second post in a series on Language Integrated Query. I'm pushing LINQ's buttons and bumping into some of its boundaries.

Every abstraction leaks at some point, and LINQ to SQL is no exception. Consider the following code:

NorthwindDataContext d = new NorthwindDataContext();
int? quantityThreshold = null;
var sales = from p in d.Products
            join od in d.Order_Details on p.ProductID equals od.ProductID
            where !p.Discontinued && (quantityThreshold.HasValue ? od.Quantity >= quantityThreshold.Value : true)
            select p;

So, when you begin fetching data out of "sales" you'll see the problem. A run-time error is thrown because the expression tree visitor attempts to greedily evaluate quantityThreshold.Value. Let's try to move the evaluation out of the LINQ expression.

Predicate<Order_Detail> hasSufficientQuantity = o => quantityThreshold.HasValue ? o.Quantity >= quantityThreshold : true;
var sales = from p in d.Products
            join od in d.Order_Details on p.ProductID equals od.ProductID
            where !p.Discontinued && hasSufficientQuantity(od)
            select p;

Well, that doesn't work either. "The method or operation is not implemented." The expression tree visitor has no idea what this hasSufficientQuantity method is... changing it to hasSufficientQuantity.Invoke(od) reveals that we are barking up the wrong tree, no pun intended. The error given then is that our Predicate function cannot be translated. Okay... let's look at why.

This fun LINQ expression syntax in C# is just syntactic sugar for a bunch of extension methods with signatures so jam-packed with Generics, you'd think it was Wal-Mart. So, we are grateful to our C# language team for the sugar. But, it does tend to hide what is really going on, making it difficult to figure out why the syntax seems so finicky. Our LINQ expression above would translate into imperative code similar to the following:

var sales = d.Products
             .Join(d.Order_Details,
                   p => p.ProductID,
                   o => o.ProductID,
                   (p, o) => new { Product = p, Order_Detail = o })
             .Where(p => !p.Product.Discontinued && hasSufficientQuantity.Invoke(p.Order_Detail))
             .Select(p => p.Product);

This isn't exactly pretty, and it doesn't really help us to understand why our function can't be translated, or does it? Consider what these function calls are doing. They are taking arguments, primarily Func<...> objects (wrapped as Expression<Func<...>> when the source is an IQueryable<T>), and storing them internally in an expression tree. We know from stepping through the code that the execution of our supplied Func<...> objects (the lambda expressions above) is deferred until we start accessing values from "sales". So, there must be some internal storage of our intent. Further, the code above must be translated to SQL by the System.Data.Linq libraries, and we can gather from the call stack on our exception that they are using the Visitor pattern to translate the nodes of the expression tree into SQL statements.

What happens when they visit the node that invokes the hasSufficientQuantity Predicate? Well, that code--the Predicate object instance itself--is not available in SQL, so the translation fails. This seems obvious, but consider that if we were using LINQ to Objects here, any of these approaches would work fine, as the predicate would be available in the execution environment of the translated expression tree, where it wasn't for SQL.
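To make the difference concrete, here is a small sketch (not taken from the Northwind code; the class and method names are invented) showing what a provider actually receives. A lambda assigned to Expression<Func<...>> is kept as a tree of nodes, and routing the test through a delegate leaves an opaque invocation node in that tree.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical demo class, for illustration only.
static class TreeVersusDelegate
{
    // The comparison is written inline, so the tree's body is a
    // GreaterThanOrEqual node that a provider could map to ">=".
    public static Expression<Func<int, bool>> Inline()
    {
        return quantity => quantity >= 10;
    }

    // The comparison hides inside a delegate, so the tree's body is an
    // Invoke node that points at an in-memory object.
    public static Expression<Func<int, bool>> ThroughDelegate()
    {
        Predicate<int> hasSufficientQuantity = quantity => quantity >= 10;
        return quantity => hasSufficientQuantity(quantity);
    }
}
```

Inspecting Body.NodeType on the two results shows GreaterThanOrEqual for the first and Invoke for the second. Calling Compile() on either tree and running it in-process works, because the delegate is right there in memory; none of that helps the LINQ to SQL query above, where the provider has to turn the tree into SQL instead of running it.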

This is a contrived example, of course, and we could "code around" this in any number of ways, e.g.

where !p.Discontinued && od.Quantity >= (quantityThreshold ?? 0)

However, we are still seeing the LINQ to SQL abstraction leak pretty severely.
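Another way to code around it, sketched here under the assumption that the filter is optional, is to decide outside the query whether the condition applies and only then append a Where call. The sketch runs against an in-memory IQueryable<int> so it stands alone; OptionalFilter and AtLeast are invented names, and whether a given provider translates the resulting tree is still up to that provider.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only.
static class OptionalFilter
{
    // Appends the quantity test only when a threshold was supplied, so the
    // nullable value never appears inside the expression tree.
    public static IQueryable<int> AtLeast(IQueryable<int> quantities, int? threshold)
    {
        if (!threshold.HasValue)
            return quantities;

        int minimum = threshold.Value; // read the value here, outside the lambda
        return quantities.Where(q => q >= minimum);
    }
}
```

With new[] { 1, 5, 10 }.AsQueryable() as the source, a null threshold leaves all three values in place, while a threshold of 5 keeps 5 and 10. Because queries compose, the same pattern can be applied to a query variable before it is enumerated.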

There are some gotchas out there as well, of course. Consider the following SQL statement that answers the question, "How many orders have my customers had for each of my products?"

SELECT o.CustomerID, od.ProductID, COUNT(*) as [Number of Orders] 
FROM dbo.Orders o JOIN dbo.[Order Details] od 
    ON o.OrderID = od.OrderID 
GROUP BY od.ProductID, o.CustomerID

How might we attempt to answer the same question with LINQ to SQL? Notice that we are specifying two columns to group by in our query. Here's what we might like to write in LINQ:

NorthwindDataContext d = new NorthwindDataContext();
var results = from o in d.Orders
              join od in d.Order_Details on o.OrderID equals od.OrderID
              group by o.CustomerID, od.ProductID into g
              select new {g.CustomerID, g.ProductID, g.Count()};

Of course, this doesn't even come close to compiling. Here's the right way to use multiple columns in a group by: use a tuple! (In C# 3.0 terms, that means an anonymous type.)

var results = from od in d.Order_Details
              group od by new {od.Order.CustomerID, od.ProductID} into orders
              select new { orders.Key.CustomerID, orders.Key.ProductID, NumberOfOrders = orders.Count() };
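The composite key works because anonymous types compare by value: two keys with the same CustomerID and ProductID are equal, so their rows land in the same group. Here is a minimal LINQ to Objects sketch of that behavior; the OrderLine class and the method name are stand-ins invented for the example, not the generated Northwind classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an order detail row.
class OrderLine
{
    public string CustomerID { get; set; }
    public int ProductID { get; set; }
}

static class CompositeKeyGrouping
{
    // Counts rows per (CustomerID, ProductID) pair and returns the counts
    // keyed by "customer/product" strings so they are easy to inspect.
    public static Dictionary<string, int> CountByCustomerAndProduct(IEnumerable<OrderLine> lines)
    {
        var counts = from line in lines
                     group line by new { line.CustomerID, line.ProductID } into g
                     select new { g.Key.CustomerID, g.Key.ProductID, NumberOfOrders = g.Count() };

        return counts.ToDictionary(c => c.CustomerID + "/" + c.ProductID,
                                   c => c.NumberOfOrders);
    }
}
```

Given two rows for customer "ALFKI" and product 1, one row for "ALFKI" and product 2, and one row for "BONAP" and product 1, the result has three entries, with a count of 2 under "ALFKI/1".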

Once you start getting the gestalt of LINQ, you'll find yourself creating tuples all over the place. Consider this query expression to retrieve the total sales of each product in each territory:

var territorySales = from p in d.Products
                     join od in d.Order_Details on p.ProductID equals od.ProductID
                     join o in d.Orders on od.OrderID equals o.OrderID
                     join e in d.Employees on o.EmployeeID equals e.EmployeeID
                     join et in d.EmployeeTerritories on e.EmployeeID equals et.EmployeeID
                     join t in d.Territories on et.TerritoryID equals t.TerritoryID
                     where !p.Discontinued
                     group new { od.ProductID, p.ProductName, t.TerritoryID, t.TerritoryDescription, od.Quantity } 
                        by new { od.ProductID, t.TerritoryID, p.ProductName, t.TerritoryDescription } into sales
                     orderby sales.Key.TerritoryDescription descending, sales.Key.ProductName descending
                     select new { Product = sales.Key.ProductName.Trim(), Territory = sales.Key.TerritoryDescription.Trim(), TotalSold = sales.Sum(s => s.Quantity) };

The interesting part of that expression is that I created a tuple in my group...by to "select" the data to pass on to the next expression.

What if what we really wanted were the top ten best-selling products in each territory? Well, there's no "top" LINQ query expression keyword. The standard query operators include a couple of methods that look interesting: Take(int) and TakeWhile(predicate). Unfortunately, TakeWhile is among the standard query operators that are not supported in LINQ to SQL. Why? Well, it's because you couldn't write equivalent SQL, I imagine. And, while Take(int) is supported, it's not immediately useful in a situation like this where you want to apply it to subsets of your results. Therefore, a more procedural approach seems warranted. I'll investigate this further in my next post on the topic.
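For what it's worth, the shape of a "top N per group" query is easy to sketch in LINQ to Objects, where Take can be applied to each group in memory. This is only an illustration of the idea, with an invented Sale class and method name; it says nothing about whether LINQ to SQL can translate the same shape.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row of per-territory product totals.
class Sale
{
    public string Territory { get; set; }
    public string Product { get; set; }
    public int TotalSold { get; set; }
}

static class TopPerGroup
{
    // Groups by territory, sorts each group by TotalSold descending,
    // and keeps the first 'count' rows of every group.
    public static List<Sale> TopProducts(IEnumerable<Sale> sales, int count)
    {
        return sales
            .GroupBy(s => s.Territory)
            .SelectMany(g => g.OrderByDescending(s => s.TotalSold).Take(count))
            .ToList();
    }
}
```

Passing the territory totals and a count of 10 would return at most ten rows per territory, best sellers first within each one.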

It is interesting to note the situation that arises with certain standard LINQ query operators not being supported by various flavors of LINQ. Because the standard query operators are implemented as extension methods, every LINQ provider can be handed any of them, including those it cannot support. The System.Linq.Queryable static class is where the standard query operators are implemented for IQueryable<T>; these methods only record the call in the expression tree, so it falls to the provider to throw a NotSupportedException when it meets an operator it cannot translate. LINQ to SQL classes like Table<T> implement this interface, as do all classes that participate in provider-translated LINQ expressions.

Despite using the same syntax, the LINQ providers will each have their own significant learning curve due to variations in the operators they support and their own quirks. Next time we'll try to implement a top(x) query in LINQ to SQL.

Monday, November 12, 2007

On Liking LINQ

This past Sunday marked Veterans Day on the calendar. I observed the holiday in true patriotic fashion: by working.

Truthfully, I spent the day working through the LINQ hands-on lab. You can find all the links you'd need to get started with LINQ and the VS2008 Beta2 release at Charlie Calvert's blog.

First Impressions

It didn't take me long to get used to the new syntax, though I think my experience with JavaScript and Erlang might account for that. I absolutely love the inclusion of lambda expressions; these are a huge syntax improvement over writing delegates in-line today. The extension methods provide a uniformity across the various "LINQ to" APIs that makes moving from one type of data source to another very easy. I was pleased that it was so easy to see the SQL output of LINQ to SQL, and I was pleasantly surprised that the code generated by the designer for LINQ to Objects was so readable and well organized.

I will update this post with more comments later.
