Monday, June 21, 2010

iPad: The InterPersonal Computer

There’s no shortage of information on the “how” of the iPad. Apple’s reification of Alan Kay’s Dynabook makes no sacrifices in terms of processing, communications, or display—even the audio is surprisingly good. But what do the A4 system-on-a-chip, the IPS display, and Wi-Fi/Bluetooth/3G add up to in terms of experience?

Having spent a week with the iPad, I feel compelled to write down my answers to that question. The iPad is nothing short of a joy in my home.  It’s the device we didn’t know we needed: the fourth screen.  It’s the morning paper, the evening magazine, and the after dinner board game.  It’s the vacation photo album, the argument settler, and the cookbook.  The iPad is the first interpersonal computer (iPC); the PC has artfully been disguised as an intelligent, portable screen that facilitates rather than stymies interpersonal interaction.

Why did we need this device? Surely I could use Wikipanion on my phone to settle the debate on the national language of Côte d’Ivoire during the game. But, I couldn’t show you the map of the region from across the room. I definitely could have turned on my PC and connected my TV via DLNA to show our vacation photos. But I’d rather just hand you the album to scan at your leisure. We could get all the tiles out, flip them over, mix them up, and arrange them, but board games are much more fun (and more apt to be played) when you don’t have to set them up or put them away. I could have done an internet search for recipes that included lemon balm and printed one out, but it’s nice to just go straight from searching to cooking. If we had kids, the raison d'être of this latter-day Dynabook would be handsomely fulfilled by an interactive periodic table, a sketchbook, musical toys, and a huge library of books. For now we’ll just have to settle for loving these apps as grown-ups.

This is a device for kids of all ages, to be sure. Each app acts as a mask, transforming the iPad into a device well-suited to the task at hand. Who wouldn’t find something to enjoy?

I’d be remiss if I didn’t offer some lament or prognostication on future enhancements. This device would be truly magical if I didn’t have to plug it in and sync with iTunes. If my photos, videos, and music were just wirelessly transported from the cloud on demand and cached locally, I wouldn’t have to wait forever to sync or chew up a bunch of space with things I rarely want.

I would love it if the apps knew me. Maybe a fingerprint scanner could be added to help applications identify me; when I launch a game or Twitter client, if I swiped my finger it could load my saved game or timeline.  An iPC should be like a family friend, a unique relationship with each of us, but impartial and accessible to all.

AT&T+Apple could score quite the coup if the 3G was free up to a certain level of usage.  How many snowbirds would buy an iPad to stay in touch with the family back home?  How many more  business travelers would be able to keep up with their inbox without having to be nickel-and-dimed all the time?  More importantly, they would have a device that would be complete out-of-the-box; thank you for purchasing this magical screen that is connected to everyone, everywhere, anywhere you are—right now.

Monday, May 17, 2010

The Law of Demeter and Command-Query Separation

My first encounter with the Law of Demeter (LoD) was in the Meilir Page-Jones book, Fundamentals of Object-Oriented Design in UML. It is also referenced in Clean Code by Robert C. Martin. Basically, the law states that methods on an object should only invoke methods on objects in their immediate context: locally created objects, the object’s own instance members, and arguments to the method itself. This limits what Page-Jones calls the direct encumbrance of a class: the total set of types a class directly depends upon to do its work. Martin points out that if an object effectively encapsulates its internal state, we should not be able to “navigate through it.” Not to put too fine a point on it, but the kind of code we are talking about here is:

Code Snippet
  1. static void Main(string[] args)
  2. {
  3.     var a = new A();
  4.     a.Foo().Bar();
  5. }
  6.  
  7. class A
  8. {
  9.     public B Foo()
  10.     {
  11.         // do some work and yield B
  12.         return new B();
  13.     }
  14. }
  15.  
  16. class B
  17. {
  18.     public void Bar()
  19.     {
  20.         // do some other work
  21.     }
  22. }

Martin calls line 4 above a “train wreck” due to its resemblance to a series of train cars. Our Main program has a direct encumbrance of types A & B. We “navigate through” A and invoke a method of B. Whatever Foo() yields, A is not effectively encapsulating it; we could not transparently change Foo() to an implementation that uses some new type C.

LoD is a heuristic that uses the encumbrance of a type as a gauge of code quality. We observe that effective encapsulation directly constrains encumbrance, so we can say that the Law of Demeter is a partial corollary to an already well-known OOP principle: encapsulation. Another such principle is Command-Query Separation (CQS), as identified by Bertrand Meyer in his work on the Eiffel programming language.

CQS simply states that methods should be either commands or queries: they should either mutate state or return state without side-effects. Queries must be referentially transparent; that is, you can replace all query call sites with the value returned by the query without changing the meaning of the program. Commands must perform some action but not yield a value. Martin illustrates this principle quite succinctly in Clean Code.

Referring to our snippet above again, we can see that had CQS been observed, Foo() would return void and our main program would never have been handed an instance of B. CQS thus reinforces LoD; both manifest as specializations of OO encapsulation. Following these principles forces us to change the semantics of our interfaces, creating contracts that are much more declarative.
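
Here is a sketch of what observing CQS would do to the snippet (IsFooComplete is an invented query; this is one refactoring among many):

class A
{
    private bool _fooComplete;

    // Command: performs the work and yields nothing; any B involved
    // stays encapsulated inside A.
    public void Foo()
    {
        // do some work
        _fooComplete = true;
    }

    // Query: reports observable state without side-effects.
    public bool IsFooComplete
    {
        get { return _fooComplete; }
    }
}

Main now reads a.Foo(); followed, if needed, by var done = a.IsFooComplete;—no train cars, and a direct encumbrance of A alone.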

CQS has many implications.  Fowler observes that CQS allows consumers of classes to utilize query methods with a sense of “confidence, introducing them anywhere, changing their order.”  In other words, CQS allows for arbitrary composition of queries; this is a very important concept in functional programming.  Queries in CQS are also necessarily idempotent, by definition; this is extremely important in caching.
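
To make the caching point concrete, here is a minimal memoization sketch (GetExchangeRate stands in for any side-effect-free query; both names are invented):

static readonly Dictionary<string, decimal> RateCache = new Dictionary<string, decimal>();

// Because a query has no side-effects, caching its result cannot change
// the meaning of the program—only how fast it runs.
static decimal GetExchangeRateCached(string currencyCode)
{
    decimal rate;
    if (!RateCache.TryGetValue(currencyCode, out rate))
    {
        rate = GetExchangeRate(currencyCode); // the underlying query
        RateCache[currencyCode] = rate;
    }
    return rate;
}

static decimal GetExchangeRate(string currencyCode)
{
    // hypothetical lookup against some rate source
    return 1.0m;
}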

In Martin’s discussion of LoD, he notes that if B above were simply a data structure—if all of its operations were Queries—we would not have a violation of LoD, in principle.  This is because LoD is really only concerned with the proper abstraction of Commands. From the C2 wiki,

In the discussion on the LawOfDemeter, MichaelFeathers offers the analogy, "If you want your dog to run, do you talk [to] your dog or to each leg? Further, should you be able to manipulate the dog's leg without it knowing about it? What if your dog wants to move its leg and it doesn't know how you left it? You can really confuse your dog."

To extend the analogy, when you are walking your dog, you don’t command its legs; you command the dog. But if you want to have a smooth walk, you’ll stop and wait if one of the dog’s legs is raised. It is in this sense that we can restate the Law of Demeter in terms of Command-Query Separation.

  • Whereas a Query of an object must:
    1. never alter the observable state of the object, i.e. the results of any queries;
    2. and return only objects entirely composed of queries, no commands.
  • A Command of an object must:
    1. constrain its actions to be dependent upon the observable state, i.e. Queries, of only those objects in its immediate context (as defined above);
    2. and hide the internal state changes that result from its actions from observers.

This restating is helpful in that it implies that we refactor to CQS before applying LoD. In the first phase we clearly separate commands from queries. In the second phase we alter the semantics of our commands and queries to comply with LoD. While the first phase is rather mechanical, it gives us a good starting point to reconsider the semantics of our objects as we bring them into compliance with LoD.
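
Returning to the dog, here is a sketch of the restated rules in code (all names invented):

using System.Collections.Generic;
using System.Linq;

class Dog
{
    private readonly List<Leg> _legs =
        new List<Leg> { new Leg(), new Leg(), new Leg(), new Leg() };

    // Query: composed entirely of other queries; alters nothing observable.
    public bool IsAnyLegRaised
    {
        get { return _legs.Any(leg => leg.IsRaised); }
    }

    // Command: acts only on objects in its immediate context (its own legs)
    // and hides the resulting internal state changes from observers.
    public void Walk()
    {
        foreach (var leg in _legs)
            leg.Step();
    }
}

class Leg
{
    public bool IsRaised { get; private set; }
    public void Step() { IsRaised = !IsRaised; }
}

The walker commands the dog and queries the dog; Leg never appears in the caller’s direct encumbrance.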

Thursday, April 22, 2010

Usability Apothegms

A common saying in computing is that “security is inversely proportional to usability”… or something like that. As we critically examine the security of our systems, we realize we need to put measures in place that make the system harder to access and thus harder to use. Good interaction design can help mitigate the usability issues, but at the end of the day a system that doesn’t require me to memorize a password or log in is easier to use than a system that does.

We can say definitively that security and usability exist in tension.

As software architects we seek simplicity in our designs in the name of maintainability, if not intelligibility.  We also seek modularity in the name of reusability. I submit that simplicity and modularity exist in tension.

Accepting a priori that simplicity is the absence of complexity, we can obtain the simplicity of a program by measuring its complexity. A field of computer science called algorithmic information theory defines the complexity of something to be the length of the shortest program for calculating it. We might infer from this that a monolithic program (no components, no objects, no abstractions, etc.) is a simpler program than our common object-oriented code. In general we can say that modularity implies no small increase in the use of abstractions to enable that modularity.

In object-oriented* systems, an increase in modularity results in a proportional increase in complexity.

* I limit this to object-oriented systems purposefully. In my experience with functional programming languages, modularity is de rigueur.

Monday, April 12, 2010

Converting Oranges to Apple’s: Meta-competition in the Platform Wars

Updates to the developer agreement in the new iPhone SDK restrict developers to using C, C++, or Objective-C to create their native iPhone applications.  That is, they must have been “originally written” in one of those languages.

This move has made its rounds among the pundits and bloggers.  Rather than rehashing any of those points, I’ll give my own opinions on why Apple made this move.

Make no mistake, this is about stewardship of the iPhone experience.  Steve Jobs has responded to the criticism (emphasis my own),

[…] intermediate layers between the platform and the developer ultimately produces sub-standard apps and hinders the progress of the platform.

Before you say they’re really just trying to kill Flash, let’s address that up front. Flash does not run in Safari on the iPhone for the same reason that you are prohibited in the developer agreement from hosting a virtual machine in your iPhone application. As the steward of the platform, Apple has to attempt to ensure that users have a consistently good experience. Trying to squeeze a bloated swf player into a memory- and processor-constrained device is just not currently workable. As any iPhone user can attest, websites that are not optimized for mobile browsers are horribly slow to load. Flash content would just make that worse. And, guess what, my mom doesn’t know that your website or Flash animation is a pig; she just thinks her phone is slow.

Let’s look at it from another perspective. The latest incarnation of Microsoft’s flagship development suite, Visual Studio 2010, has been re-written on their WPF platform. Well, everything except the splash screen. You see, it takes too long to load WPF and .NET to effect the desired result of a splash screen—giving you something to look at while the main application loads. The iPhone developer guide makes it abundantly clear that you should load something as quickly as possible to give the user the sense that your app is responsive. Try explaining to your mom that, yes, her phone has registered her tap on the application icon, but that it has to spin up a virtual machine or load some large libraries that abstract away Cocoa.

Next, try to explain to her why the extra cycles required by the VM and/or interoperability libraries drain her battery.  She’ll just be annoyed that she’s constantly plugging the darn thing in.

Apologies to all the technical moms out there.

So, Flash[1] is not the target here, at least not in the sense that Apple cares whether Flash is around or perceives it as a threat. And, assuredly, we cannot say that stewardship of the performance of the phones is the only reason. That’s part of Steve’s “sub-standard apps” argument, and it is not entirely convincing given that performance becomes an optimization problem, and developers are notoriously good at those when they set their minds to it. As a steward, though, you really cannot afford to wait for them to figure it out. Nevertheless, I think the original ban on virtual machines and this subsequent tightening to a whitelist of languages comes down to stewardship of a different kind: keeping out the riff-raff.

Keeping up with AppStore submissions is already pretty hard and very expensive.  Developers are often frustrated by the time it takes to get their updates into the store, especially developers used to patching websites with no downtime. Don’t imagine for a second that your $99 covers the cost of the program.  This is Apple’s investment in the ecosystem.  Imagine the deluge of useless, crappy submissions they will get if every Flash developer and every VB developer can just tick a box to target the iPhone.

But, it’s not just about raising the bar for entry to keep out sub-standard developers and their apps.  They have to look after all those developers who have committed to their platform.  They aren’t going to invest significant resources to enabling C#, VB, Python, Ruby, or Flash developers to compete against their own. 

Apple is not interested in seeing your app run on a Droid, Windows Mobile 7, and iPhone.  They are actively controverting homogenization of the mobile application marketplace.  They have an insurmountable lead[2] in the mobile applications marketplace, and the iPad, another device on the iPhone OS platform, marks the next step in their overall competitive strategy for leveraging that lead.  At the center of all of this is the AppStore, perhaps the single most valuable asset in computing today.  Whereas no one controls the Internet, Apple owns the AppStore outright with all the rights and responsibilities that entails; think of Salesforce.com and their AppExchange.

Microsoft has long understood that developers are what make their ecosystem work. Steve Ballmer’s famous rallying cry, “Developers! Developers! Developers!”, is a poignant illustration of this reality. Apple understands that to usurp Microsoft’s position in the PC and enterprise markets, and to continue their domination of consumer segments, they need a large population of developers who understand their platform. What they don’t need are more ways for developers to “write once, run everywhere.” In the platform wars, he who has the developers wins.

 

[1] Anyway, it isn’t just about Flash. Look no further than Microsoft’s Windows Mobile 7 announcements to see that. Silverlight, a Flash competitor to say the least, will be the native platform of those devices. Where Flash is, Silverlight will surely follow.

[2] I believe that Google will continue to have limited success with a web-native device. There is a significant segment of the marketplace that just wants a phone/email/web-browsing device. RIM has an Apple-style loyalty thing going for it. Microsoft? Well, they still have such a big footprint that they could fail and still make a huge impact. I think the move to Silverlight helps shield their effort from their internal Windows-Office power structure enough that it actually has a chance of competing, but they are way too far behind to win. There’s not a huge population of Silverlight developers in the world, after all.

Tuesday, December 29, 2009

Inchoate Thoughts on New Programming Abstractions for the Cloud

I watched an InfoQ video of Barbara Liskov at OOPSLA this evening. At the end of her talk, in a look at the challenges ahead, she said, “I’ve been waiting for thirty years for a new abstraction mechanism,” and, “There’s a funny disconnect in how we write distributed programs. You write your individual modules, but then when you want to connect them together you’re out of the programming language, and sort of into this other world. Maybe we need languages that are a little more complete now so that we can write the whole thing in the language.” Her talk was an interesting journey through the history of Computer Science through the lens of her work, but this notion connected with something I’ve been thinking about lately.

Earlier this month I was catching up on the happenings on Channel9 and stumbled on this conversation between Erik Meijer and Dave Thomas. I would have preferred a different interview style, but Dave Thomas’ opinions are always interesting. “I find it hard to believe that people would build a business application in Java or C#. […] The real opportunity is recognizing that there is really no need for people to do all of the stuff they have to do today. And, I really believe that is fundamentally a language problem, whether that’s through an extended SQL, or it’s a new functional language, whether it’s a vector functional language; I can see lots of possibilities there. […] I see a future where there are objects in the modeling layer, but not in the execution infrastructure.”

I see a connection between Liskov’s hoped-for “new abstraction mechanism” and Thomas’ “language problem”. If you look at the history of “execution infrastructure”, there has been an unremitting trend toward greater and greater abstraction of the hardware that makes it all happen. From twiddling bits, to manipulating registers, to compiled languages, to virtual machine bytecode and attendant technologies like garbage collection and interpreters, to very declarative constructs, we continually move away from the hardware to focus on the actual problems that we began writing the program to solve in the first place(1). Both Liskov and Thomas are bothered by the expressivity of languages; that is, programming languages are still burdened with the legacy of the material world.

I think this may very well be the point of “Oslo” and the “M” meta-language.  One might view that effort as a concession that even DSLs are too far burdened by their host language’s legacy to effectively move past the machine programming era into the solution modeling era.  So, rather than create a language with which to write solution code, create a language/runtime to write such languages.  This is really just the logical next step, isn’t it?

I didn’t quite understand this viewpoint at the time, but I had this discussion to some extent with Shy Cohen at TechEd09. I just didn’t grasp the gestalt of “Oslo”. That certainly wasn’t a failing of his explanatory powers nor—hopefully—of my ability to understand. Rather, I think it’s because they keep demoing it as one more data access technology. The main demo is developing an M grammar for some simple type and having a database created from it. The legacy of computing strikes again: to make the demo relevant, they have to show you how it works with all your existing infrastructure.

So, maybe the solution to developing new abstractions is to once and for all abstract away the rest of the infrastructure.  Both Liskov and Thomas were making this point.  Why should we care how and where the data is stored?  Why should most developers care about concurrency?  Why should we have to understand the behemoth that is WCF in order to get to modules/systems/components talking to each other?

Let’s start over with an abstraction of the entirety of computing infrastructure(2): servers, networking, disks, backups, databases, etc. We’ll call it, oh, I don’t know… a cloud. Now, what properties must a cloud exhibit in order to ensure this abstraction doesn’t leak? I suggest we can examine why existing abstractions in computing have been successful, understand the problems they solve at a fundamental level, and then ensure that our cloud can do the same. Some candidates for this examination are: relational databases, virtual machines/garbage collectors, REST, Ruby on Rails, and LINQ. I’m sure there are countless other effective abstractions that could serve as guideposts.

Should we not be able to express things naturally in a solution modeling language?  Why can’t I write, “When Company X sends us a shipment notification for the next Widget shipment, make sure that all Whatzit production is expedited. And, let me know immediately if Whatzit production capacity is insufficient to meet our production goals for the month.”  Sure, this is a bit imperative, but it’s imperatively modeling a solution, not instructions on how to push bits around to satisfy the solution model; in that sense it is very, very declarative.  I believe the cloud abstraction enables this kind of solution modeling language.

How about a cloud as the ambient monad for all programs?  What about a Google Wave bot as a sort of REPL loop for this solution modeling language?  What about extensive use of the Maybe monad or amb style evaluation in a constraint driven lexicon so that imperative rules like those in the sample above are evaluated in a reactive rather than deterministic fashion?

  1. Quantum computing productively remains undecided as to “how” it actually works. That is, “how” is somewhat dependent on your interpretation of quantum mechanics. Maybe an infinite number of calculations occur simultaneously in infinite universes with the results summed over via quantum mechanical effects.
  2. I should say here that I think the Salesforce.com model is pretty far down this path, as is Windows Azure to a lesser extent. Crucially, neither of these platforms mitigates the overhead of persistence and communications constructs.

Monday, December 21, 2009

System.Tuple: More .NET 4.0 Functional Goodness

In functional languages, tuples are pretty much a requirement. In the functional languages I’ve used, including Erlang and T-SQL, tuples are bread’n’butter. Every release of .NET has included more and more support for functional programming paradigms, and we’re getting tuple support in the BCL in .NET 4.0, though first-class language support is still not there. Here’s an example of using System.Tuple with the new Zip LINQ operator.

Code Snippet
var xs = new List<int> { 1, 3, 5 };
var ys = new List<string> { "hello", "world", "channel" };
var zs = xs.Zip(ys, Tuple.Create);

foreach (var z in zs)
    Console.WriteLine("{0}: {1}", z.Item1, z.Item2);

Thursday, December 17, 2009

Quicksort Random Integers with PLINQ

The first “homework assignment” of Dr. Erik Meijer’s functional programming course on Channel 9 was to write quicksort using C# (or VB) functional comprehensions.  I did it in a very naïve way, then I wondered if I could get a “free” speed up simply by leveraging some of the Parallel Extensions to .NET 4.0. 
 
My results were mixed. As the number of integers in the unsorted array increases, the parallel implementation begins to pull away, but with smaller numbers the sequential implementation is faster. I think I’ll try to do something very similar in Erlang this weekend. It should be an interesting comparison exercise since Erlang terms are immutable; a naïve implementation should look pretty similar, I think. Neither of them would be a true quicksort implementation, since the algorithm was specified by Hoare to work in a mutable fashion, with the original array being changed in situ (see the sketch after the snippet below).
 
My dual core machine yielded times of about 80s and 100s when sorting an array of a thousand for the parallel (QsortP) and sequential (Qsort) methods respectively.  So that’s definitely not impressive, but I did it in a very abstract way.  The same function could sort strings or floats or anything else that implements IComparable. 

Code Snippet
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Diagnostics;

namespace ConsoleApplication3
{
    class Program
    {
        static void Main(string[] args)
        {
            int[] ints = new int[100];
            var rand = new Random();
            for (int i = 0; i < ints.Length; i++)
                ints[i] = rand.Next(0, Int32.MaxValue);
            int[] sortedP;
            int[] sorted;
            Time(() => sortedP = QsortP(ints).ToArray());
            Time(() => sorted = Qsort(ints).ToArray());
            Debugger.Break();
        }

        public static IEnumerable<T> QsortP<T>(IEnumerable<T> arr) where T : IComparable
        {
            if (arr.Count() == 0)
                return arr;
            T pivot = arr.First();
            var sortLess = new Task<IEnumerable<T>>(() => QsortP(arr.Where(v => v.CompareTo(pivot) < 0)));
            var sortMore = new Task<IEnumerable<T>>(() => QsortP(arr.Where(v => v.CompareTo(pivot) > 0)));
            sortLess.Start(); sortMore.Start();
            var equal = arr.AsParallel().Where(v => v.CompareTo(pivot) == 0);
            return sortLess.Result.Concat(equal).Concat(sortMore.Result);
        }

        public static IEnumerable<T> Qsort<T>(IEnumerable<T> arr) where T : IComparable
        {
            if (arr.Count() == 0)
                return arr;
            T pivot = arr.First();
            return Qsort(arr.Where(v => v.CompareTo(pivot) < 0))
                .Concat(arr.Where(v => v.CompareTo(pivot) == 0))
                .Concat(Qsort(arr.Where(v => v.CompareTo(pivot) > 0)));
        }

        public static void Time(Action a)
        {
            var t = new Stopwatch();
            t.Start();
            a();
            t.Stop();
            Debug.Write(t.Elapsed.TotalSeconds);
        }
    }
}
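
For contrast, here is the in-situ sketch promised above—something closer to Hoare’s formulation, mutating the array in place (this is not the code I timed):

public static void QsortInPlace<T>(T[] arr, int lo, int hi) where T : IComparable
{
    if (lo >= hi)
        return;

    T pivot = arr[(lo + hi) / 2];
    int i = lo, j = hi;
    while (i <= j)
    {
        while (arr[i].CompareTo(pivot) < 0) i++;
        while (arr[j].CompareTo(pivot) > 0) j--;
        if (i <= j)
        {
            T tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp; // swap in place
            i++; j--;
        }
    }

    QsortInPlace(arr, lo, j);
    QsortInPlace(arr, i, hi);
}

Called as QsortInPlace(ints, 0, ints.Length - 1); it allocates no intermediate sequences, which goes a long way toward explaining why the LINQ versions above are so slow.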

Tuesday, December 15, 2009

Premature Optimization and LINQ to SQL

Don’t count your chickens before they hatch.  Let’s call that adage the “count-no” principle.  I promise that will be funny later.

The common interpretation of this adage is that you should not rely on future results when making decisions in the present. It is generally an admonishment not to be too cocky about the future. What might the corollary be? Perhaps to concern ourselves with the details of the present moment rather than thinking about the future. We’ll call this corollary “nocount”. One specific example of this corollary is the programmer’s maxim about premature optimization. In fact, it was Donald Knuth who said, “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil,” in a response to Dijkstra’s “Go to statement considered harmful”.

So, I’m debugging a particularly pernicious problem today. In production (and only in production) we were getting a ChangeConflictException when updating a row in the database. Unfortunately, the logs indicated that there were no MemberChangeConflict objects; in other words, we weren’t violating optimistic concurrency. So, I profiled the SQL statements, and it appeared to be just what I expected: a simple update supporting optimistic concurrency.

Did you know you can eke out an exceedingly minuscule amount of performance by using the “SET NOCOUNT ON” option with SQL Server? Did you know that you can actually set this option at the server level? Do you know what happens when your LINQ to SQL DataContext uses the row count returned from your optimistically concurrent update statement to determine whether one and only one row was updated?
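
The failure mode is easy to reproduce with plain ADO.NET (a sketch; the Orders table, its columns, and the connection string are hypothetical). With NOCOUNT on, the server omits the rows-affected count, ExecuteNonQuery returns -1, and any optimistic-concurrency check that expects exactly one affected row—which is what LINQ to SQL does—reports a conflict that never happened:

using System;
using System.Data.SqlClient;

class NoCountDemo
{
    static void Update(string connectionString, int id, int expectedVersion)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            var cmd = new SqlCommand(
                "SET NOCOUNT ON; " + // simulating the server-level option
                "UPDATE Orders SET Status = 'Shipped' " +
                "WHERE Id = @id AND Version = @version;", conn);
            cmd.Parameters.AddWithValue("@id", id);
            cmd.Parameters.AddWithValue("@version", expectedVersion);

            int rows = cmd.ExecuteNonQuery();
            // rows is -1 here even when the UPDATE succeeded, so a check
            // like (rows != 1) now signals a phantom concurrency conflict.
            Console.WriteLine(rows);
        }
    }
}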

Yes, dear reader, premature optimization can cause the most insidious and difficult-to-discover problems. Please, don’t ever violate the “nocount” principle!

Tuesday, November 10, 2009

Create an XML Schema Enumeration of Supported .NET CultureInfo with Powershell

Here’s a Powershell script to generate the enumeration elements of an XML Schema simpleType restriction.  The first command loads the .NET Assembly you’ll need, sysglobl.dll.

PS> [System.Reflection.Assembly]::Load("sysglobl, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a, processorArchitecture=MSIL")
PS> [System.Globalization.CultureInfo]::GetCultures([System.Globalization.CultureTypes]::FrameworkCultures) |
      ? {$_.Name.Trim().Length -eq 5} | % { "<xs:enumeration value=`"$_`"/>"} | Out-File cultures.txt
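
The resulting file contains one enumeration element per five-character culture name, something like:

<xs:enumeration value="af-ZA"/>
<xs:enumeration value="ar-SA"/>
<xs:enumeration value="en-US"/>
<xs:enumeration value="fr-FR"/>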

See my earlier post on locale identifiers for more background.

Friday, October 30, 2009

Whoa!!! HP Software Sucks!

[Screenshot: HP installer error dialog]

This is the error that was reported after the HP installer forced me to restart my computer to continue installation…

Thursday, October 15, 2009

TFS: Locked for Check-out

Today a co-worker was attempting to edit a file, and TFS reported that it was “locked for check-out by [me] in workspace …”.

I didn’t have it checked out.  I tried to use “tf lock /lock:none” on the file in question, but it would simply report:

TF10152: The item … must remain locked because its file type prevents multiple check-outs.

We don’t allow multiple checkouts for this Team Project. Unfortunately, this meant that not only had I not intended to lock the file, I also couldn’t unlock it.

My solution was to just check in some dummy changes—comments full of expletives, if you must know—since TFS will not perform a check-in unless there are actually changes to the files.

So yeah, not TFS’ most shining moment.

Thursday, August 6, 2009

Cloud Morphology

Spent some time this evening tracking down the various component technologies of the myriad cloud offerings out there.  Here’s the result.  This will be updated over time.

Tuesday, August 4, 2009

NASA’s Cloud Platform: NEBULA, and Amorphous Morphology

NEBULA is a Cloud Computing environment developed at NASA Ames Research Center, integrating a set of open-source components into a seamless, self-service platform. It provides high-capacity computing, storage and network connectivity, and uses a virtualized, scalable approach to achieve cost and energy efficiencies.

They always write it in all caps, though it doesn’t appear to be an acronym. My guess is that it’s easier to search for “NEBULA” and find what you want at NASA. And, what better name for a cloud computing platform?

To be frank, I didn’t recognize any of the technologies other than RabbitMQ and Apache. So I set out to find out what each piece does, and it’s these times that really make me envious of the LAMP folks. There is so much going on in the OSS world!

In fact, it’s that sheer volume of innovation that makes it such a nightmare. Here’s where having NASA or Amazon or Elastra or EngineYard or Rackspace or Google AppEngine discuss their respective cloud infrastructures is so beneficial. If you look at their technology choices as a Venn diagram, you might find an optimal set of technologies in the overlap. Beyond that, you can begin to form a generalized blueprint of a cloud platform, identifying the core components.

[Venn diagram of cloud providers’ technology choices]

In this way the cloud begins to take shape. Had we sat all the big industry players down and said, “define the cloud à la the OSI model,” no one would ever have gotten around to building one. If we find the clouds in the wild, we can dissect them and find out what makes them tick. We can reverse-engineer our blueprint.

There are a few major players missing here, of course, most notably Salesforce.com (SFDC) and Microsoft’s Azure platform. SFDC doesn’t have an infrastructure that you can replicate, per se, because it is inseparable from their proprietary technologies. In fact, Stu Charlton and others have noted that this is their biggest failing. Similarly, Microsoft’s Azure is also a monolithic platform play, with the major difference that the platform can be deployed on premises—a so-called private cloud.

Still though, we can include SFDC and Azure in our analysis.  We can develop our morphology of the cloud platforms that have published their constituent technologies.  With the language of this morphology in hand, we can classify the facilities of other cloud platforms.  Perhaps a taxonomy will evolve that allows us to identify Platform-as-a-Service, Infrastructure-as-a-Service, Software-as-a-Service, hybrids, and perhaps some novel species.

Despite the amorphous character implied by the term cloud computing, these platforms have well-defined structure.  Moreover, unlike the traditional use of clouds in architecture diagrams, the details of this structure are important.

Sunday, August 2, 2009

Powershell: Get Size of Child Directories

gci -recurse -force |
    ? { $_.GetType() -like 'System.IO.DirectoryInfo' } |
    select-object Name, @{Name = "size"; Expression = {
        ($_.GetFiles() | Measure-Object -Property Length -Sum).Sum }} |
    sort-object -Property size

Wednesday, July 8, 2009

Three Common Fallacies Concerning REST

Better to light a candle than to curse the darkness. -Chinese Proverb

I purposely did not title this post “The 3 Fallacies of RESTful Computing” as I am certainly not an expert in either REST or fallacy. :)  I am, however, quite well-versed in auto-didacticism, and over the past week I’ve been boning up on REST.  Along the way I’ve had one of my early notions of REST disabused (it is neither an architecture nor a protocol) and noticed a few other common misconceptions in the blogosphere and tweet stream.  If you are new to REST, or even if you aren’t, you might very well find a few edifying points in this post; I hope to light a candle or two out there.

Without further ado, here are three common fallacies concerning REST. 

Fallacy #1 REST is CRUD

Perhaps the most common fallacy of RESTful computing is that REST is simply CRUD (Create, Read, Update, and Delete) over HTTP.  Microsoft’s ADO.NET Data Services endeavors to provide developers a “data service being surfaced to the web as a REST-style resource collection[…]”; this would seem to further the notion that REST is another way of doing CRUD.

However, REST is not CRUD as Stu Charlton states in a blog post about REST design guidelines; Arnon Rotem-Gal-Oz says CRUD is bad for REST, too.  If we are going to attempt to abridge RESTful architecture with an innocuous statement of the form “REST is X over HTTP” let us say that REST is using URLs to facilitate application state changes over HTTP.

Fallacy #2 POST is not RESTful

First, it is very important to note that REST is not tied to a specific protocol.  As an architectural style, it is protocol agnostic; though to be sure HTTP is a natural fit for many reasons.  As Fielding said in It is okay to use POST:

Search my dissertation and you won’t find any mention of CRUD or POST. The only mention of PUT is in regard to HTTP’s lack of write-back caching.  The main reason for my lack of specificity is because the methods defined by HTTP are part of the Web’s architecture definition, not the REST architectural style.

Since the nominal use of POST is orthogonal to RESTfulness, by definition, it cannot be the case that POST is antithetical to REST.  Nevertheless, it is important to understand the reasoning that generally goes into this fallacy, because it speaks directly to a core principle of REST.  Most architectures expose an API that allows consumers to affect the state of the application only indirectly.  To understand the implications of your actions as a consumer, you then have—at best—to be very familiar with the application architecture and be aware that you are making a lot of assumptions.  The state mechanics are hidden from you.  You cannot explicitly move the application from one state to another, nor can you directly observe the transition(s) that have taken place. 

A primary goal of Representational State Transfer is to make the application’s state machine unambiguous by exposing representations of application resources that have embedded within them URIs pertinent to the resource. In this way a consumer of a RESTful service can discover the current state of the resource, the mechanisms to effect change in the resource, and other resources related to the current resource in the application’s state. This is what is known as Hypertext as the Engine of Application State (HatEoAS).
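
For illustration, a hypothetical representation of an order resource might embed its available state transitions directly (all URIs and element names here are invented):

<order self="http://example.com/orders/42">
  <status>unpaid</status>
  <!-- the transitions are discoverable from the representation itself -->
  <link rel="payment"  href="http://example.com/orders/42/payment" />
  <link rel="cancel"   href="http://example.com/orders/42" />
  <link rel="customer" href="http://example.com/customers/7" />
</order>

A client that understands the "payment" and "cancel" relations can drive the order through its lifecycle with no out-of-band knowledge of the server’s URI structure.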

HatEoAS implies a model where all of your important resources are represented by unique URIs and all important state changes are done by interacting with representations sent to or retrieved from those URIs. Most people first approaching REST view it in terms of opposing other architectural “styles” such as SOA or get mired in implementation immediately and begin contrasting their understanding of REST over HTTP against WS-*. Another common problem is reducing the ethos of REST to “RPC is bad” and “we don’t need all that complexity, we have HTTP” (see REST is better than WS-*).  These views are commonplace because REST is being promulgated as a better solution for many types of applications on the Web.

The specifics of how REST works over HTTP are beyond the scope of this article, and the subject of a lot of debate, but, since a lot of uses of POST seem very much like RPC-style invocations, people have a knee-jerk reaction that POST is not RESTful.  By now you know this is not the case, but let’s hear from the experts.

From Fielding:

POST only becomes an issue when it is used in a situation for which some other method is ideally suited: e.g., retrieval of information that should be a representation of some resource (GET), complete replacement of a representation (PUT) […]

Stu Charlton’s design guidelines say nearly the same thing:

The problem with POST is when we abuse it by having it perform things that are more expressive in one of the other methods. GET being the obvious one that needs no hypermedia description. For the other methods, a good design guideline is that you MUST not break the general contract of the HTTP method you choose -- but you SHOULD describe the specific intent of that method in hypermedia.

Fallacy #3 REST is better than WS-*

In fact, Fielding’s thesis does address many of the problems and advantages of various network-based architectural styles, but nowhere does he claim REST is the one ring to rule them all. In his aforementioned blog post Fielding says (emphasis my own),

[…]there are plenty of information systems that can be designed using the REST architectural style and gain the associated benefits. Managing cloud instances is certainly one of those applications for which REST is a good fit[…]

From his thesis, here is the definition of the REST architectural style.

REST consists of a set of architectural constraints chosen for the properties they induce on candidate architectures. […] [It] is an abstraction of the architectural elements within a distributed hypermedia system. […] It encompasses the fundamental constraints upon components, connectors, and data that define the basis of the Web architecture, and thus the essence of its behavior as a network-based application.

So, the associated benefits are induced by constraints that collectively are referred to as the REST architectural style. The benefits are myriad, and many of the constraints are recognizable in how the web works. Section 5 of the thesis is concise and readable, and I encourage the reader to internalize it.

The salient point here is that REST is not a protocol; it’s not even an architecture.  REST is an architectural style! By understanding its constraints and benefits, we can make informed decisions about its applicability to our problem domain and appropriation of technologies.  Comparing REST to the WS-* suite of protocols is comparing apples to oranges, though there are those who strongly argue the benefits of REST and HTTP over SOAP.

Saturday, June 27, 2009

What, Not How & Why, Not When

It occurred to me this morning that many software development principles seem to emerge from the rigorous application of the following principle:

Your architecture and code should make What & Why explicit without specifying How & When.

What, Not How

It is well known that we should prefer declarative over imperative code. By observing the Law of Demeter, we are in fact forced to create declarative, What, semantics in our interfaces, since we can't get at the How, the imperative constructs. The Single Responsibility Principle tends to force us to factor out the How into other classes, leaving us to consume those classes with What semantics. Further, the Interface Segregation Principle requires us to explicitly segregate the interfaces our clients consume based on What they are trying to accomplish; if our focus were How, such segregation would be less of a concern.

Event-Driven Architectures (EDA) are another example of What, Not How. Whereas the SOLID principles operate at the class-design level of abstraction, EDA is concerned with system-level architecture. In an Event-Driven Architecture, we explicitly model happenings in the domain. Rather than coupling the site of the happening to a specific party designated to deal with that happening, we create an Event and use reliable messaging and subscription schemes to allow one or many services to handle it. In other words, instead of specifying How to deal with a happening at the site that generates it, we explicitly model What happened and let other parties worry about How to deal with it.

Why, Not When

This maxim is both more subtle and more prosaic than What, Not How. It is probably pretty obvious that when given a requirement stated, "if the purchase order acknowledgement is not received within two hours, notify someone in fulfillment," we should model an Event "POAckIsLate" as opposed to "TwoHoursHaveElapsedWithoutReceivingPOAck". We will have different SLAs with different vendors; those SLAs will change, etc. So we can say, when modeling Events in our domain, we should prefer specifying Why, Not When.
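
A sketch of the difference in code (all names invented; the point is that the two-hour window lives in per-vendor SLA data, not in the event or its name):

using System;

class PurchaseOrder { public int Id; public DateTime SentAt; public DateTime? AckReceivedAt; }
class VendorSla     { public string VendorId; public TimeSpan AckWindow; }

// Why the notification exists—not When it was decided.
class POAckIsLate
{
    public int PurchaseOrderId;
    public string VendorId;
}

class AckMonitor
{
    // The threshold is looked up per vendor, so renegotiating an SLA
    // never touches the event model or its consumers.
    public POAckIsLate Evaluate(PurchaseOrder po, VendorSla sla, DateTime now)
    {
        if (po.AckReceivedAt == null && now - po.SentAt > sla.AckWindow)
            return new POAckIsLate { PurchaseOrderId = po.Id, VendorId = sla.VendorId };

        return null;
    }
}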

Perhaps more subtle is the implications for communication semantics between modules. If we model our communications with Why in mind, we don't get mired in the concurrency problems of specifying When. Consider a workflow. If we specify When to go to a particular step, the underlying reason may have changed unless we take some sort of explicit lock on shared state. If we instead specify Why a particular state transition takes place, we can avoid inconsistent states through continuous evaluation. If we make Why explicit and consequently create semantics to evaluate Why independently of "current" state, it becomes possible to evaluate the state consistently without any shared state, i.e. without a notion of When.

As an example, if we had the requirement, "when a PO is received with a quantity for part X above twenty units, move the order to the top of the work queue," we should model a "BulkProductionRequest" Event and an "ExpediteProduction" Command; we should not implement a "Reprioritize Production Queue For Order of PartX Over Twenty Units". Begin with the end in mind and ask What do we want to do (expedite production), not How (re-prioritize the production queue). Ask Why are we expediting this order? Because it is Bulk. What is Bulk? Bulk is a quality determined by a CapacityPlanning service and implies that the quantity exceeds some production capacity threshold.

Friday, June 26, 2009

Pareto, Zipf, Heap: The 80-20 Rule, Language, and Diminishing Returns

Please consider this a sort of layman’s disambiguation page.

The classic “80-20 rule” refers to a Pareto distribution.  The thin distribution of the 20% is the subject of Chris Anderson’s “The Long Tail”.  Originally, the Pareto distribution referred to the fact that 20% of the people control 80% of the wealth, but it has turned up in many other contexts.

Zipf’s Law is about rankings and frequency. The second item’s frequency will be half of the first, the third’s a third of the first, and so on: rank is inversely proportional to frequency. (More generally, frequency falls off as a fixed power of rank, with the exponent constant across the distribution.) The canonical item in Zipf’s Law is a word, and its frequency is its rate of appearance in a corpus of English text. However, Zipf’s Law also holds for texts generated from a fixed alphabet by picking letters randomly with a uniform distribution.

Heaps’ Law is about diminishing returns. It has an exact formula (given below), but generally it says that the deeper you get into a text, the fewer new words you’ll discover; as you read on, it takes longer and longer to encounter a word you haven’t already seen. Heaps’ Law applies to the general case where the “words” are just classifiers of some collection of things. So, it could be applied to the nationality of a collection of people: you’d have to gather more and more people from a random sampling to get a representative of every country.
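
In symbols, the standard formulations of the two laws are:

f(r) \propto 1 / r^s          (Zipf: frequency of the item of rank r, with s ≈ 1)
V(n) = K n^\beta              (Heaps: distinct words among the first n words of a text, with 0 < β < 1)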

The implications of these laws in various contexts are the subject of much interesting study and postulation.

Tuesday, June 16, 2009

Value Objects and Value Domains

I’ve written relatively extensively on the topic of replacing enum constructs with classes, so I won’t rehash the topic. Instead, I’d like to introduce you to some code I’ve written that enables you to create finite domains of Value Objects in C#.  Please see my two previous posts on the subject to learn more about the benefits of this approach.  To see how the Value Object semantics and making a finite domain a first class concept in the pattern improves the approach, read on.

First, we need some definitions.  We are taking the concept of a Value Object from Eric Evans’ book “Domain Driven Design” p.99:

When you care only about the attributes of an element of the model, classify it as a Value Object. Make it express the meaning of the attributes it conveys and give it related functionality. Treat the Value Object as immutable.  Don’t give it any identity […] The attributes that make up a Value Object should form a conceptual whole.

An example here is useful. Consider modeling the domain of a book publisher. At some point you need to capture all the different formats of books: A4, B, Pinched Crown, etc. The attributes of width, height, unit of measure, and name would go into your model. But any instance of your A4 Value Object should be completely interchangeable with any other A4. And it goes without saying that you can’t change the width or height or any other attribute of A4.

All of the different formats of books belong to a Value Domain. According to WordNet, a domain (we are using the mathematical notion) is:

the set of values of the independent variable for which a function is defined

The functional complement of a domain is the range of the function; i.e. the domain is what you can put in, the range is what you can expect to get out. I like calling this concept in my approach a domain instead of a set, because it neatly captures a key benefit: when we are writing code, we want to declare our parameters as proper members of a domain of values, instead of just primitive types or strings.

Now we’re ready to dive into the implementation.  Let’s begin with the end in mind.  I want to write a function, say, that let’s me search for a document with a particular format about an arbitrary topic.

void SearchDocs(DocFormat docFormat, string topic)


Ok, so we could create a base class or an interface called DocFormat and create Word doc, PDF, etc. that inherit from or implement DocFormat. Easy. But SearchDocs has to be able to handle all current and future implementations of DocFormat; it must not violate the Liskov substitution principle. What if the repository or search algorithm depends on what the doc format actually is? Also, we’d have a subclass of DocFormat to write for every document type, and we’d have to do a lot of work to remove object identity, since your instance of PDF is not the same as my instance of PDF. And don’t forget to make the whole thing immutable. [Note: I know this is a contrived example with well-established OOP solutions that don’t require LSP violation. Work with me here. :)]



Clearly we have a lot of work to do to make our Value Objects a reality. It’s not impossible, though, and a quick Bing Google turns up a couple of investigations and approaches. Jimmy Bogard’s approach gave me the clue I needed to get it working. What I wanted was a base type, ValueObject, that I could inherit from and get the Value Object semantics described in Evans.



Jimmy’s approach used a self-referencing open constructed type to allow the ValueObject base class to do all the work of providing Value Object semantics. This base class uses dynamic reflection to determine what properties to use in the derived class to do equality comparisons (an approach nominally improved upon here). He saw his approach as having a single fundamental flaw: it only worked for the first level of inheritance, i.e. the first descendant from ValueObject.



For my purposes—creating a bounded, finite domain of Value Objects to replace enum-type classes—this is not a flaw. Substantively, all that remains to do is introduce the concept of the Value Domain into Jimmy’s approach and put the Value Objects in it. Because I wanted to use these Value Domains throughout the enterprise, I baked WCF support into my approach. Further, because the Value Domain is defined a priori, I didn’t have to play with object identity; I could simply look the value up in the domain. (It took a little out-of-the-box thinking to get that to work transparently.) Finally, I wanted it to be trivially easy to create these Value Domains, so I created a snippet for use in Visual Studio.



Here’s an example of Value Domain and Value Objects implementation:



[DataContract]
public sealed class DocFormats : ValueObject<DocFormats.DocFormat, string, DocFormats>.Values<DocFormats>
{
    private DocFormats()
    {
        Word = Add("Word");
        PDF = Add("PDF");
    }

    [DataMember]
    public readonly DocFormat Word, PDF;

    [Serializable]
    public sealed class DocFormat : ValueObject<DocFormat, string, DocFormats>
    {
        private DocFormat(string v) : base(v) { }
    }
}



DocFormats is the Value Domain. DocFormat is the type of the Value Object. string is the underlying type of the values, but the pattern supports anything that implements IEquatable and IComparable. Methods can accept DocFormats.DocFormat as a parameter, and only Word and PDF will be valid. Code can specify those values through a Singleton accessor: DocFormats.Instance.PDF. You’ll notice that the only constructor is private; the Singleton implementation is in the value domain base class (ValueObject<…>.Values<…>).



// our new method interface
void SearchDocs(DocFormats.DocFormat docFormat, string topic)
{
    // referencing an individual value
    if (DocFormats.Instance.Word == docFormat)
    {
        //...
    }
}



Above you’ll note that the Value Object type definition (subclass of ValueObject<…>) is nested inside of the Value Domain.  Doing that groups the two together syntactically in a very natural way (FooTypes.FooType); the entire domain of values is contained in one place.  That locality makes the snippet to produce them cohesive too.  [Note: I’ve included the snippet XML at the end of this post.]



Interestingly, the underlying implementation necessarily inverts this nesting; the Value Domain base class is nested in the Value Object base class.  That allows the Value Domain Singleton to create instances of Value Objects.  I only had to resort to reflection in order to allow the Value Domain base class to provide the Singleton implementation.  Inversion of Control containers like Unity and Castle Windsor do this kind of reflection all the time, and it’s cheap since .NET 2.0. 



Without further ado, here’s the base classes implementation.



using System;
using System.Collections.Generic;
using System.Reflection;

[Serializable]
public abstract class ValueObject<T, TValue, TValues> : IEquatable<T>, IComparable, IComparable<T>
    where T : ValueObject<T, TValue, TValues>
    where TValue : IEquatable<TValue>, IComparable<TValue>
    where TValues : ValueObject<T, TValue, TValues>.Values<TValues>
{
    /// <summary>
    /// This is the encapsulated value.
    /// </summary>
    public readonly TValue Value;

    protected ValueObject(TValue value)
    {
        Value = value;
    }

    #region equality
    public override bool Equals(object other)
    {
        return other != null && other is T && Equals(other as T);
    }

    public override int GetHashCode()
    {
        // TODO provide an efficient implementation
        // http://www.dotnetjunkies.com/weblog/tim.weaver/archive/2005/04/04/62285.aspx
        return Value.GetHashCode();
    }

    public bool Equals(T other)
    {
        return other != null && Value.Equals(other.Value);
    }

    public static bool operator ==(ValueObject<T, TValue, TValues> x, ValueObject<T, TValue, TValues> y)
    {
        // pointing to same heap location
        if (ReferenceEquals(x, y)) return true;

        // both references are null
        if (null == (object)(x ?? y)) return true;

        // auto-boxed LHS is not null
        if ((object)x != null)
            return x.Equals(y);

        return false;
    }

    public static bool operator !=(ValueObject<T, TValue, TValues> x, ValueObject<T, TValue, TValues> y)
    {
        return !(x == y);
    }
    #endregion

    public static implicit operator TValue(ValueObject<T, TValue, TValues> obj)
    {
        return obj.Value;
    }

    public static implicit operator ValueObject<T, TValue, TValues>(TValue val)
    {
        T valueObject;
        if (Values<TValues>.Instance.TryGetValue(val, out valueObject))
            return valueObject;

        throw new InvalidCastException(String.Format("{0} cannot be converted", val));
    }

    public override string ToString()
    {
        return Value.ToString();
    }

    #region comparison

    public int CompareTo(T other)
    {
        return Value.CompareTo(other);
    }

    public int CompareTo(object obj)
    {
        if (null == obj)
            throw new ArgumentNullException();

        if (obj is T)
            return Value.CompareTo((obj as T).Value);

        throw new ArgumentException(String.Format("Must be of type {0}", typeof(T)));
    }

    #endregion

    [Serializable]
    public abstract class Values<TDomain> where TDomain : Values<TDomain>
    {
        [NonSerialized]
        protected readonly Dictionary<TValue, T> _values = new Dictionary<TValue, T>();

        private void Add(T valueObject)
        {
            _values.Add(valueObject.Value, valueObject);
        }

        protected T Add(TValue val)
        {
            var valueObject = typeof(T).InvokeMember(typeof(T).Name,
                BindingFlags.CreateInstance | BindingFlags.Instance |
                BindingFlags.NonPublic, null, null, new object[] { val }) as T;
            Add(valueObject);
            return valueObject;
        }

        public bool TryGetValue(TValue value, out T valueObject)
        {
            return _values.TryGetValue(value, out valueObject);
        }

        public bool Contains(T valueObject)
        {
            return _values.ContainsValue(valueObject);
        }

        static volatile TDomain _instance;
        static readonly object Lock = new object();

        public static TDomain Instance
        {
            get
            {
                if (_instance == null)
                    lock (Lock)
                    {
                        if (_instance == null)
                            _instance = typeof(TDomain)
                                .InvokeMember(typeof(TDomain).Name,
                                    BindingFlags.CreateInstance |
                                    BindingFlags.Instance |
                                    BindingFlags.NonPublic, null, null, null) as TDomain;
                    }
                return _instance;
            }
        }
    }
}



I’d love to hear your feedback. This approach is not intended to support unbounded Value Domains, such as the canonical example of Address. It is meant for the enums-as-classes problem, i.e. finite, bounded Value Domains.



ValueObject.snippet



<?xml version="1.0" encoding="utf-8" ?>
<CodeSnippets xmlns="http://schemas.microsoft.com/VisualStudio/2005/CodeSnippet">
  <CodeSnippet Format="1.0.0">
    <Header>
      <Title>ValueObject</Title>
      <Shortcut>vo</Shortcut>
      <Description>Code snippet for ValueObject and domain</Description>
      <Author>Christopher Atkins</Author>
      <SnippetTypes>
        <SnippetType>Expansion</SnippetType>
      </SnippetTypes>
    </Header>
    <Snippet>
      <Declarations>
        <Literal>
          <ID>domain</ID>
          <ToolTip>Value domain name</ToolTip>
          <Default>MyValues</Default>
        </Literal>
        <Literal>
          <ID>value</ID>
          <ToolTip>ValueObject Class name</ToolTip>
          <Default>MyValueObject</Default>
        </Literal>
        <Literal>
          <ID>type</ID>
          <ToolTip>Value Type</ToolTip>
          <Default>string</Default>
        </Literal>
      </Declarations>
      <Code Language="csharp">
        <![CDATA[
[DataContract]
public sealed class $domain$ : ValueObject<$domain$.$value$, $type$, $domain$>.Values<$domain$>
{
    private $domain$()
    {
        Value1 = Add(String.Empty);
    }

    [DataMember]
    public readonly $value$ Value1;

    [Serializable]
    public sealed class $value$ : ValueObject<$value$, $type$, $domain$>
    {
        private $value$($type$ v) : base(v) { }
    }
}
]]>
      </Code>
    </Snippet>
  </CodeSnippet>
</CodeSnippets>

Thursday, May 28, 2009

Google Wave

Upon the day mankind's children ask
Who it was made their world intelligible,
Those who set about the grand task
of creating the Wave will be eligible.

For never will they have occasion to think
how to create discourse, share knowledge,
Wonder about the universe and link
Without this wonderful prodigy.

Like hieroglyphs in the tombs of Kings,
eee-may-uhls voy-s'-may-uhls eye-ems and tweets
will serious study to the curious bring,
to be shared in Waves while ancestors sleep.

How is it the first step of a journey
Can put the past so far behind thee?

Monday, May 25, 2009

Connascence: The Underlying Principle of OO Design?

This is a great talk at a Ruby conference by Jim Weirich about his attempt to frame all object-oriented design (OOD) principles as special cases of an underlying principle called “connascence”.  Connascence is a term co-opted by Meilir Page-Jones in the ‘90s for use in OOD; below is the definition from his book with Larry Constantine entitled, “Fundamentals of Object-Oriented Design in UML” (page 214):

Connascence between two software elements A and B means either

  1. that you can postulate some change to A that would require B to be changed (or at least carefully checked) in order to preserve overall correctness, or
  2. that you can postulate some change that would require both A and B to be changed together in order to preserve overall correctness.
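
To make the idea concrete, here is a small C# illustration (names invented). The first Schedule exhibits what Page-Jones calls connascence of position; the second weakens it to connascence of name:

using System;

class Scheduler
{
    // Connascence of position: every caller must agree with the parameter
    // order. Swap start and end here and every call site must change too—
    // and Schedule(end, start) still compiles, silently inverting the meaning.
    public void Schedule(DateTime start, DateTime end) { /* ... */ }
}

class TimeWindow { public DateTime Start; public DateTime End; }

class NamedScheduler
{
    // Connascence of name instead: callers set Start and End by name,
    // so parameter order can no longer bite them.
    public void Schedule(TimeWindow window) { /* ... */ }
}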


UPDATE: Jim Weirich's talk can be found here.