Tuesday, December 29, 2009

Inchoate Thoughts on New Programming Abstractions for the Cloud

I watched an InfoQ video of Barbara Liskov at OOPSLA this evening.  At the end of her talk, looking at the challenges ahead, she said, “I’ve been waiting for thirty years for a new abstraction mechanism,” and, “There’s a funny disconnect in how we write distributed programs.  You write your individual modules, but then when you want to connect them together you’re out of the programming language, and sort of into this other world.  Maybe we need languages that are a little more complete now so that we can write the whole thing in the language.”  Her talk was an interesting journey through the history of Computer Science through the lens of her own work, but it was this notion that connected with something I’ve been thinking about lately.

Earlier this month I was catching up on the happenings on Channel9 and stumbled on this conversation between Erik Meijer and Dave Thomas.  I would have preferred a different interview style, but Dave Thomas’ opinions are always interesting.  “I find it hard to believe that people would build a business application in Java or C#. […] The real opportunity is recognizing that there is really no need for people to do all of the stuff they have to do today.  And, I really believe that is fundamentally a language problem, whether that’s through an extended SQL, or it’s a new functional language, or a vector functional language; I can see lots of possibilities there. […] I see a future where there are objects in the modeling layer, but not in the execution infrastructure.”

I see a connection between Liskov’s desire for a “new abstraction mechanism” and Thomas’ “language problem”.  If you look at the history of “execution infrastructure”, there has been an unremitting trend toward greater and greater abstraction of the hardware that makes it all happen.  From twiddling bits, to manipulating registers, to compiled languages, to virtual machine bytecode and attendant technologies like garbage collection and interpreters, to very declarative constructs, we continually move away from the hardware to focus on the actual problems that we began writing the program to solve in the first place(1).  Both Liskov and Thomas are bothered by the expressivity of languages; that is, programming languages are still burdened with the legacy of the material world.

I think this may very well be the point of “Oslo” and the “M” meta-language.  One might view that effort as a concession that even DSLs are too far burdened by their host language’s legacy to effectively move past the machine programming era into the solution modeling era.  So, rather than create a language with which to write solution code, create a language/runtime to write such languages.  This is really just the logical next step, isn’t it?

I didn’t quite understand this viewpoint at the time, but I discussed it to some extent with Shy Cohen at TechEd09.  I just didn’t grasp the gestalt of “Oslo”.  That certainly wasn’t a failing of his explanatory powers nor—hopefully—of my ability to understand.  Rather, I think it’s because they keep demoing it as one more data access technology.  The main demo is developing an M grammar for some simple type and having a database created from it.  Again, the legacy of computing strikes: to make the demo relevant, they have to show you how it works with all your existing infrastructure.

So, maybe the solution to developing new abstractions is to once and for all abstract away the rest of the infrastructure.  Both Liskov and Thomas were making this point.  Why should we care how and where the data is stored?  Why should most developers care about concurrency?  Why should we have to understand the behemoth that is WCF in order to get to modules/systems/components talking to each other?

Let’s start over with an abstraction of the entirety of computing infrastructure(2): servers, networking, disks, backups, databases, etc.  We’ll call it, oh, I don’t know… a cloud.  Now, what properties must a cloud exhibit to ensure this abstraction doesn’t leak?  I suggest we examine why existing abstractions in computing have been successful, understand the problems they solve at a fundamental level, and then ensure that our cloud can do the same.  Some candidates for this examination are: relational databases, virtual machines/garbage collectors, REST, Ruby on Rails, and LINQ.  I’m sure there are countless other effective abstractions that could serve as guideposts.

Should we not be able to express things naturally in a solution modeling language?  Why can’t I write, “When Company X sends us a shipment notification for the next Widget shipment, make sure that all Whatzit production is expedited. And, let me know immediately if Whatzit production capacity is insufficient to meet our production goals for the month.”  Sure, this is a bit imperative, but it’s imperatively modeling a solution, not instructions on how to push bits around to satisfy the solution model; in that sense it is very, very declarative.  I believe the cloud abstraction enables this kind of solution modeling language.

How about a cloud as the ambient monad for all programs?  What about a Google Wave bot as a sort of REPL for this solution modeling language?  What about extensive use of the Maybe monad or amb-style evaluation in a constraint-driven lexicon, so that imperative rules like those in the sample above are evaluated in a reactive rather than deterministic fashion?

  1. Quantum computing, productively, remains undecided as to “how” it actually works.  That is, “how” is somewhat dependent on your interpretation of quantum mechanics.  Maybe an infinite number of calculations occur simultaneously in infinite universes, with the results summed over via quantum mechanical effects.
  2. I should say here that I think the Salesforce.com model is pretty far down this path, as is Windows Azure to a lesser extent.  Crucially, neither of these platforms mitigate the overhead of persistence and communications constructs.

Monday, December 21, 2009

System.Tuple: More .NET 4.0 Functional Goodness

In functional languages, tuples are pretty much a requirement.  Of the functional languages I’ve used, including Erlang and T-SQL, tuples are bread’n’butter.  Every release of .NET has included more and more support for functional programming paradigms, and we’re getting tuple support in the BCL in .NET 4.0, though first-class language support is still not there.  Here’s an example of using System.Tuple with the new Zip LINQ operator.

Code Snippet

var xs = new List<int> { 1, 3, 5 };
var ys = new List<string> { "hello", "world", "channel" };
var zs = xs.Zip(ys, Tuple.Create);

foreach (var z in zs)
    Console.WriteLine("{0}:{1}", z.Item1, z.Item2);

Thursday, December 17, 2009

Quicksort Random Integers with PLINQ

The first “homework assignment” of Dr. Erik Meijer’s functional programming course on Channel 9 was to write quicksort using C# (or VB) functional comprehensions.  I did it in a very naïve way, then I wondered if I could get a “free” speed up simply by leveraging some of the Parallel Extensions to .NET 4.0. 
 
My results were mixed.  As the number of integers in the unsorted array increases, the parallel implementation begins to pull away, but with smaller numbers the sequential implementation is faster.  I think I’ll try to do something very similar in Erlang this weekend.  It should be an interesting comparison exercise since Erlang terms are immutable.  A naïve implementation should look pretty similar I think.  Neither of them would be true quicksort implementations, since the algorithm was specified by Hoare to work in a mutable fashion with the original array being changed in situ.
 
My dual-core machine yielded times of about 80s and 100s when sorting an array of a thousand for the parallel (QsortP) and sequential (Qsort) methods, respectively.  So that’s definitely not impressive, but I did it in a very abstract way.  The same function could sort strings or floats or anything else that implements IComparable.

Code Snippet

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Diagnostics;

namespace ConsoleApplication3
{
    class Program
    {
        static void Main(string[] args)
        {
            int[] ints = new int[100];
            var rand = new Random();
            for (int i = 0; i < ints.Length; i++)
                ints[i] = rand.Next(0, Int32.MaxValue);
            int[] sortedP;
            int[] sorted;
            Time(() => sortedP = QsortP(ints).ToArray());
            Time(() => sorted = Qsort(ints).ToArray());
            Debugger.Break();
        }

        public static IEnumerable<T> QsortP<T>(IEnumerable<T> arr) where T : IComparable
        {
            if (arr.Count() == 0)
                return arr;
            T pivot = arr.First();
            var sortLess = new Task<IEnumerable<T>>(() => QsortP(arr.Where(v => v.CompareTo(pivot) < 0)));
            var sortMore = new Task<IEnumerable<T>>(() => QsortP(arr.Where(v => v.CompareTo(pivot) > 0)));
            sortLess.Start(); sortMore.Start();
            var equal = arr.AsParallel().Where(v => v.CompareTo(pivot) == 0);
            return sortLess.Result.Concat(equal).Concat(sortMore.Result);
        }

        public static IEnumerable<T> Qsort<T>(IEnumerable<T> arr) where T : IComparable
        {
            if (arr.Count() == 0)
                return arr;
            T pivot = arr.First();
            return Qsort(arr.Where(v => v.CompareTo(pivot) < 0))
                .Concat(arr.Where(v => v.CompareTo(pivot) == 0))
                .Concat(Qsort(arr.Where(v => v.CompareTo(pivot) > 0)));
        }

        public static void Time(Action a)
        {
            var t = new Stopwatch();
            t.Start();
            a();
            t.Stop();
            Debug.Write(t.Elapsed.TotalSeconds);
        }
    }
}

Tuesday, December 15, 2009

Premature Optimization and LINQ to SQL

Don’t count your chickens before they hatch.  Let’s call that adage the “count-no” principle.  I promise that will be funny later.

The common interpretation of this adage is that you should not rely on future results when making decisions in the present.  It is generally an admonishment not to be too cocky about the future.  What might the corollary be?  Perhaps to concern ourselves with the details of the present moment rather than thinking about the future.  We’ll call this corollary “nocount”.  One specific example of this corollary is the programmer’s maxim about premature optimization.  In fact, it was Donald Knuth who said, “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil,” in his response to the debate around Dijkstra’s “Go to statement considered harmful”.

So, I’m debugging a particularly pernicious problem today.  In production (and only production) we were getting a ChangeConflictException when updating a row in the database.  Unfortunately, the logs indicated that there were no MemberChangeConflict objects; in other words, we weren’t violating optimistic concurrency.  So, I profiled the SQL statements, and it appeared to be just what I expected: a simple update supporting optimistic concurrency.

Did you know you can eke out an exceedingly minuscule amount of performance by using the “SET NOCOUNT ON” option with SQL Server?  Did you know that you can actually set this option at the server level?  Do you know what happens when your LINQ to SQL DataContext uses the row count returned from your optimistically concurrent update statement to determine that one and only one row was updated?
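To make the failure mode concrete, here is a rough T-SQL sketch of the mechanism.  The table and column names are hypothetical, and the UPDATE is only an approximation of what LINQ to SQL generates:

```sql
-- LINQ to SQL enforces optimistic concurrency by repeating the original
-- column values in the WHERE clause and then checking the rows-affected
-- count reported back to the client.
UPDATE Widget
SET    Name = 'new name'
WHERE  Id = 42
  AND  Name = 'old name';  -- original value; 1 row affected means success

-- SET NOCOUNT ON suppresses the rows-affected messages sent to the client
-- (ADO.NET's ExecuteNonQuery returns -1 instead of the actual count).
-- When it is enabled server-wide (via the 'user options' configuration),
-- the DataContext never sees the count, concludes the update touched no
-- rows, and throws ChangeConflictException with no member conflicts.
SET NOCOUNT ON;
```

Note that @@ROWCOUNT is still maintained on the server either way; NOCOUNT only suppresses the count messages sent back to the caller, which is exactly what the DataContext was relying on.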

Yes, dear reader, premature optimization can cause the most insidious and difficult-to-discover problems.  Please, don’t ever violate the “nocount principle”!