Tuesday, December 29, 2009

Inchoate Thoughts on New Programming Abstractions for the Cloud

I watched an InfoQ video of Barbara Liskov at OOPSLA this evening.  At the end of her talk, in a look at the challenges ahead, she said, “I’ve been waiting for thirty years for a new abstraction mechanism,” and, “There’s a funny disconnect in how we write distributed programs.  You write your individual modules, but then when you want to connect them together you’re out of the programming language, and sort of into this other world.  Maybe we need languages that are a little more complete now so that we can write the whole thing in the language.”  Her talk was an interesting journey through the history of Computer Science through the lens of her work, but this notion connected with something I’ve been thinking about lately.

Earlier this month I was catching up on the happenings on Channel9 and stumbled on this conversation between Erik Meijer and Dave Thomas.  I would have preferred a different interview style, but Dave Thomas’ opinions are always interesting.  “I find it hard to believe that people would build a business application in Java or C#. […] The real opportunity is recognizing that there is really no need for people to do all of the stuff they have to do today.  And, I really believe that is fundamentally a language problem, whether that’s through an extended SQL, or it’s a new functional language, whether it’s a vector functional language; I can see lots of possibilities there. […] I see a future where there are objects in the modeling layer, but not in the execution infrastructure.”

I see a connection between Liskov’s desire for a “new abstraction mechanism” and Thomas’ “language problem”.  If you look at the history of “execution infrastructure”, there has been an unremitting trend toward greater and greater abstraction of the hardware that makes it all happen.  From twiddling bits, to manipulating registers, to compiled languages, to bytecode compiled for virtual machines with attendant technologies like garbage collection and interpreters, to very declarative constructs, we continually move away from the hardware to focus on the actual problems that we began writing the program to solve in the first place(1).  Both Liskov and Thomas are bothered by the expressivity of languages; that is, programming languages are still burdened with the legacy of the material world.

I think this may very well be the point of “Oslo” and the “M” meta-language.  One might view that effort as a concession that even DSLs are too far burdened by their host language’s legacy to effectively move past the machine programming era into the solution modeling era.  So, rather than create a language with which to write solution code, create a language/runtime to write such languages.  This is really just the logical next step, isn’t it?

I didn’t quite understand this viewpoint at the time, but I had this discussion to some extent with Shy Cohen at TechEd09.  I just didn’t grasp the gestalt of “Oslo”.  This certainly wasn’t a failing of his explanatory powers nor—hopefully—of my ability to understand.  Rather, I think it’s because they keep demoing it as one more data access technology.  The main demo is to develop an M grammar for some simple type and have a database created from it.  The legacy of computing strikes again.  To make the demo relevant, they have to show you how it works with all your existing infrastructure.

So, maybe the solution to developing new abstractions is to once and for all abstract away the rest of the infrastructure.  Both Liskov and Thomas were making this point.  Why should we care how and where the data is stored?  Why should most developers care about concurrency?  Why should we have to understand the behemoth that is WCF in order to get to modules/systems/components talking to each other?

Let’s start over with an abstraction of the entirety of computing infrastructure(2): servers, networking, disks, backups, databases, etc.  We’ll call it, oh, I don’t know… a cloud.  Now, what properties must a cloud exhibit in order to ensure this abstraction doesn’t leak?  I suggest we can examine why existing abstractions in computing have been successful, understand the problems they solve at a fundamental level, and then ensure that our cloud can do the same.  Some candidates for this examination are: relational databases, virtual machines/garbage collectors, REST, Ruby on Rails, and LINQ.  I’m sure there are countless other effective abstractions that could serve as guideposts.

Should we not be able to express things naturally in a solution modeling language?  Why can’t I write, “When Company X sends us a shipment notification for the next Widget shipment, make sure that all Whatzit production is expedited. And, let me know immediately if Whatzit production capacity is insufficient to meet our production goals for the month.”  Sure, this is a bit imperative, but it’s imperatively modeling a solution, not instructions on how to push bits around to satisfy the solution model; in that sense it is very, very declarative.  I believe the cloud abstraction enables this kind of solution modeling language.

How about a cloud as the ambient monad for all programs?  What about a Google Wave bot as a sort of REPL loop for this solution modeling language?  What about extensive use of the Maybe monad or amb style evaluation in a constraint driven lexicon so that imperative rules like those in the sample above are evaluated in a reactive rather than deterministic fashion?

  1. Quantum computing productively remains undecided as to “how” it actually works.  That is, “how” is somewhat dependent on your interpretation of quantum mechanics.  Maybe an infinite number of calculations occur simultaneously in infinite universes with the results summed over via quantum mechanical effects.
  2. I should say here that I think the Salesforce.com model is pretty far down this path, as is Windows Azure to a lesser extent.  Crucially, neither of these platforms mitigate the overhead of persistence and communications constructs.

Monday, December 21, 2009

System.Tuple: More .NET 4.0 Functional Goodness

In functional languages, tuples are pretty much a requirement.  Of the functional languages I’ve used, including Erlang and T-SQL, tuples are bread’n’butter.  Every release of .NET has included more and more support for functional programming paradigms, and we’re getting tuple support in the BCL in .NET 4.0, though first-class language support is still not there.  Here’s an example of using System.Tuple with the new Zip LINQ operator.

Code Snippet
var xs = new List<int> { 1, 3, 5 };
var ys = new List<string> { "hello", "world", "channel" };
var zs = xs.Zip(ys, Tuple.Create);

foreach (var z in zs)
    Console.WriteLine("{1}:{0}", z.Item1, z.Item2);

Thursday, December 17, 2009

Quicksort Random Integers with PLINQ

The first “homework assignment” of Dr. Erik Meijer’s functional programming course on Channel 9 was to write quicksort using C# (or VB) functional comprehensions.  I did it in a very naïve way, then I wondered if I could get a “free” speed up simply by leveraging some of the Parallel Extensions to .NET 4.0. 
 
My results were mixed.  As the number of integers in the unsorted array increases, the parallel implementation begins to pull away, but with smaller numbers the sequential implementation is faster.  I think I’ll try to do something very similar in Erlang this weekend.  It should be an interesting comparison exercise since Erlang terms are immutable.  A naïve implementation should look pretty similar I think.  Neither of them would be true quicksort implementations, since the algorithm was specified by Hoare to work in a mutable fashion with the original array being changed in situ.
 
My dual core machine yielded times of about 80s and 100s when sorting an array of a thousand for the parallel (QsortP) and sequential (Qsort) methods respectively.  So that’s definitely not impressive, but I did it in a very abstract way.  The same function could sort strings or floats or anything else that implements IComparable. 

Code Snippet
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Diagnostics;

namespace ConsoleApplication3
{
    class Program
    {
        static void Main(string[] args)
        {
            int[] ints = new int[100];
            var rand = new Random();
            for (int i = 0; i < ints.Length; i++)
                ints[i] = rand.Next(0, Int32.MaxValue);
            int[] sortedP;
            int[] sorted;
            Time(() => sortedP = QsortP(ints).ToArray());
            Time(() => sorted = Qsort(ints).ToArray());
            Debugger.Break();
        }

        public static IEnumerable<T> QsortP<T>(IEnumerable<T> arr) where T : IComparable
        {
            if (arr.Count() == 0)
                return arr;
            T pivot = arr.First();
            var sortLess = new Task<IEnumerable<T>>(() => QsortP(arr.Where(v => v.CompareTo(pivot) < 0)));
            var sortMore = new Task<IEnumerable<T>>(() => QsortP(arr.Where(v => v.CompareTo(pivot) > 0)));
            sortLess.Start(); sortMore.Start();
            var equal = arr.AsParallel().Where(v => v.CompareTo(pivot) == 0);
            return sortLess.Result.Concat(equal).Concat(sortMore.Result);
        }

        public static IEnumerable<T> Qsort<T>(IEnumerable<T> arr) where T : IComparable
        {
            if (arr.Count() == 0)
                return arr;
            T pivot = arr.First();
            return Qsort(arr.Where(v => v.CompareTo(pivot) < 0))
                .Concat(arr.Where(v => v.CompareTo(pivot) == 0))
                .Concat(Qsort(arr.Where(v => v.CompareTo(pivot) > 0)));
        }

        public static void Time(Action a)
        {
            var t = new Stopwatch();
            t.Start();
            a();
            t.Stop();
            Debug.Write(t.Elapsed.TotalSeconds);
        }
    }
}
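
For contrast, here is a rough sketch (not part of the original exercise) of an in-place quicksort with Hoare-style partitioning.  It mutates the array directly instead of building new sequences, and it assumes the same usings as the snippet above; it could sit alongside QsortP and Qsort in the Program class.  Calling QsortInPlace(ints, 0, ints.Length - 1) sorts the array in situ.

// Rough sketch only: an in-place quicksort with Hoare-style partitioning,
// included for contrast with the purely functional versions above.
public static void QsortInPlace<T>(T[] arr, int lo, int hi) where T : IComparable
{
    if (lo >= hi)
        return;

    T pivot = arr[lo];
    int i = lo - 1, j = hi + 1;
    while (true)
    {
        do { i++; } while (arr[i].CompareTo(pivot) < 0);
        do { j--; } while (arr[j].CompareTo(pivot) > 0);
        if (i >= j)
            break;
        T tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;   // swap in situ
    }

    // After partitioning, everything up to j is <= pivot and the rest is >= pivot.
    QsortInPlace(arr, lo, j);
    QsortInPlace(arr, j + 1, hi);
}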

Tuesday, December 15, 2009

Premature Optimization and LINQ to SQL

Don’t count your chickens before they hatch.  Let’s call that adage the “count-no” principle.  I promise that will be funny later.

The common interpretation of this adage is that you should not rely on future results when making decisions in the present.  It is generally an admonishment to not be too cocky about the future.  What might the corollary be?  Perhaps to concern ourselves with the details of the present moment rather than thinking about the future.  We’ll call this corollary “nocount”.  One specific example of this corollary is the programmer’s maxim about premature optimization.  In fact, it was Donald Knuth who said, “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil,” in a response to Dijkstra’s “Go to statement considered harmful”.

So, I’m debugging a particularly pernicious problem today.  In production (and only production) we were getting a ChangeConflictException when updating a row in the database.  Unfortunately, the logs indicated that there were no MemberChangeConflict objects.  In other words we weren’t violating optimistic concurrency.  So, I profiled the SQL statements and it appeared to be just what I expected, a simple update supporting optimistic concurrency.

Did you know you can eke out an exceedingly minuscule amount of performance by using the “SET NOCOUNT ON” option with SQL Server?  Did you know that you can actually set this option at the server level? Do you know what happens when your LINQ to SQL DataContext uses the row count returned from your optimistically concurrent update statement to determine if one and only one row was updated?
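
Here is a minimal ADO.NET sketch of the failure mode; the connection string and Orders table are hypothetical.  LINQ to SQL trusts the rows-affected count of its optimistically concurrent UPDATE, and with NOCOUNT ON that count is never reported: ExecuteNonQuery returns -1 even when the row was updated, which is exactly how a successful update ends up reported as a change conflict with no member conflicts.

// A minimal sketch of the root cause described above (hypothetical database objects).
using System;
using System.Data.SqlClient;

class NoCountRepro
{
    static void Main()
    {
        using (var conn = new SqlConnection("Server=.;Database=Sandbox;Integrated Security=true"))
        {
            conn.Open();

            // Simulate the server-level option being in effect for this session.
            using (var set = new SqlCommand("SET NOCOUNT ON;", conn))
                set.ExecuteNonQuery();

            // The shape of update LINQ to SQL generates for optimistic concurrency.
            using (var update = new SqlCommand(
                "UPDATE dbo.Orders SET Status = @new WHERE Id = @id AND Status = @old", conn))
            {
                update.Parameters.AddWithValue("@id", 42);
                update.Parameters.AddWithValue("@new", "Shipped");
                update.Parameters.AddWithValue("@old", "Open");

                // With NOCOUNT ON the row count is suppressed, so this prints -1
                // instead of 1, and the caller cannot tell that the update succeeded.
                Console.WriteLine(update.ExecuteNonQuery());
            }
        }
    }
}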

Yes dear reader, premature optimization can cause the most insidious and difficult to discover problems.  Please, don’t ever violate the “nocount principle”!

Tuesday, November 10, 2009

Create an XML Schema Enumeration of Supported .NET CultureInfo with Powershell

Here’s a Powershell script to generate the enumeration elements of an XML Schema simpleType restriction.  The first command loads the .NET Assembly you’ll need, sysglobl.dll.

PS> [System.Reflection.Assembly]::Load("sysglobl, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a, processorArchitecture=MSIL")
PS> [System.Globalization.CultureInfo]::GetCultures([System.Globalization.CultureTypes]::FrameworkCultures) |
      ? { $_.Name.Trim().Length -eq 5 } | % { "<xs:enumeration value=`"$_`"/>" } | Out-File cultures.txt

See my earlier post on locale identifiers for more background.

Friday, October 30, 2009

Whoa!!! HP Software Sucks!

[Screenshot: HP installer error dialog]

This is the error that was reported after the HP installer forced me to restart my computer to continue installation…

Thursday, October 15, 2009

TFS: Locked for Check-out

Today a co-worker was attempting to edit a file, and TFS reported that it was “locked for check-out by [me] in workspace …”.

I didn’t have it checked out.  I tried to use “tf lock /lock:none” on the file in question, but it would simply report:

TF10152: The item … must remain locked because its file type prevents multiple check-outs.

We don’t allow multiple checkouts for this Team Project.  Unfortunately, this meant that not only had I not intended to lock the file, I couldn’t unlock it either.

My solution was to just check in some dummy changes, comments full of expletives if you must know, since TFS will not perform a check-in unless there are actually changes to the files.

So yeah, not TFS’ most shining moment.

Thursday, August 6, 2009

Cloud Morphology

Spent some time this evening tracking down the various component technologies of the myriad cloud offerings out there.  Here’s the result.  This will be updated over time.

Tuesday, August 4, 2009

NASA’s Cloud Platform: NEBULA, and Amorphous Morphology

NEBULA is a Cloud Computing environment developed at NASA Ames Research Center, integrating a set of open-source components into a seamless, self-service platform. It provides high-capacity computing, storage and network connectivity, and uses a virtualized, scalable approach to achieve cost and energy efficiencies.

They always write it in all caps, though it doesn’t appear to be an acronym. My guess is that it’s easier to search for “NEBULA” and find what you want at NASA. And, what better name for a cloud computing platform?

To be frank, I didn’t recognize any of the technologies other than RabbitMQ and Apache. So I set out to find out what each piece does, and it’s these times that really make me envious of the LAMP folks. There is so much going on in the OSS world!

In fact, it’s that sheer volume of innovation that makes it such a nightmare. Here’s where having NASA or Amazon or Elastra or EngineYard or Rackspace or Google AppEngine discuss their respective cloud infrastructures is so beneficial. If you look at their technology choices as a Venn diagram, you might find an optimal set of technologies in the overlap. Beyond that, you can begin to form a generalized blueprint of a cloud platform, identifying the core components.

[Figure: cloud provider Venn diagram]

In this way the cloud begins to take shape. Had we sat all the big industry players down and said, "define the cloud à la the OSI model," no one would ever have gotten around to building one. If we find the clouds in the wild, we can dissect them and find out what makes them tick. We can reverse engineer our blueprint.

There are a few major players missing here, of course, most notably Salesforce.com (SFDC) and Microsoft’s Azure platform.  SFDC doesn’t have an infrastructure that you can replicate, per se, because it is inseparable from their proprietary technologies.  In fact, Stu Charlton and others have noted that this is their biggest failing.  Similarly, Microsoft’s Azure is also a monolithic platform play, with the major difference that the platform can be deployed on premises, a so-called private cloud.

Still though, we can include SFDC and Azure in our analysis.  We can develop our morphology of the cloud platforms that have published their constituent technologies.  With the language of this morphology in hand, we can classify the facilities of other cloud platforms.  Perhaps a taxonomy will evolve that allows us to identify Platform-as-a-Service, Infrastructure-as-a-Service, Software-as-a-Service, hybrids, and perhaps some novel species.

Despite the amorphous character implied by the term cloud computing, these platforms have well-defined structure.  Moreover, unlike the traditional use of clouds in architecture diagrams, the details of this structure are important.

Sunday, August 2, 2009

Powershell: Get Size of Child Directories

gci -recurse -force |
    ? { $_.GetType() -like 'System.IO.DirectoryInfo' } |
    select-object Name, @{Name = "size"; Expression = {
        ($_.GetFiles() |
            Measure-Object -Property Length -Sum |
            Measure-Object -Property Sum -Sum).Sum } } |
    sort-object -Property size

Wednesday, July 8, 2009

Three Common Fallacies Concerning REST

Better to light a candle than to curse the darkness. -Chinese Proverb

I purposely did not title this post “The 3 Fallacies of RESTful Computing” as I am certainly not an expert in either REST or fallacy. :)  I am, however, quite well-versed in auto-didacticism, and over the past week I’ve been boning up on REST.  Along the way I’ve had one of my early notions of REST disabused (it is neither an architecture nor a protocol) and noticed a few other common misconceptions in the blogosphere and tweet stream.  If you are new to REST, or even if you aren’t, you might very well find a few edifying points in this post; I hope to light a candle or two out there.

Without further ado, here are three common fallacies concerning REST. 

Fallacy #1 REST is CRUD

Perhaps the most common fallacy of RESTful computing is that REST is simply CRUD (Create, Read, Update, and Delete) over HTTP.  Microsoft’s ADO.NET Data Services endeavors to provide developers a “data service being surfaced to the web as a REST-style resource collection[…]”; this would seem to further the notion that REST is another way of doing CRUD.

However, REST is not CRUD as Stu Charlton states in a blog post about REST design guidelines; Arnon Rotem-Gal-Oz says CRUD is bad for REST, too.  If we are going to attempt to abridge RESTful architecture with an innocuous statement of the form “REST is X over HTTP” let us say that REST is using URLs to facilitate application state changes over HTTP.

Fallacy #2 POST is not RESTful

First, it is very important to note that REST is not tied to a specific protocol.  As an architectural style, it is protocol agnostic; though to be sure HTTP is a natural fit for many reasons.  As Fielding said in It is okay to use POST:

Search my dissertation and you won’t find any mention of CRUD or POST. The only mention of PUT is in regard to HTTP’s lack of write-back caching.  The main reason for my lack of specificity is because the methods defined by HTTP are part of the Web’s architecture definition, not the REST architectural style.

Since the nominal use of POST is orthogonal to RESTfulness, by definition, it cannot be the case that POST is antithetical to REST.  Nevertheless, it is important to understand the reasoning that generally goes into this fallacy, because it speaks directly to a core principle of REST.  Most architectures expose an API that allows consumers to affect the state of the application only indirectly.  To understand the implications of your actions as a consumer, you then have—at best—to be very familiar with the application architecture and be aware that you are making a lot of assumptions.  The state mechanics are hidden from you.  You cannot explicitly move the application from one state to another, nor can you directly observe the transition(s) that have taken place. 

A primary goal of Representational State Transfer is to make the application’s state machine unambiguous by exposing representations of application resources that have embedded within them URIs pertinent to the resource.  In this way a consumer of a RESTful service can discover the current state of the resource, the mechanisms to affect change in the resource, and other resources related to the current resource in the application’s state.  This is what is known as Hypertext as the Engine of Application State (HatEoAS).
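
As an illustration only (this class and its properties are made up for this post, not taken from Fielding or any particular framework), a representation might carry the URIs of the transitions that are currently valid, so a client discovers what it can do next from the representation itself:

// Purely illustrative: a resource representation that embeds the URIs of the
// state transitions available right now, rather than leaving the client to
// guess the application's state machine from out-of-band knowledge.
public class OrderRepresentation
{
    public string Self { get; set; }            // e.g. "http://example.org/orders/42"
    public string Status { get; set; }          // e.g. "Submitted"

    // Hypermedia controls: only populated when the transition is valid
    // for the resource's current state.
    public string CancellationUri { get; set; }
    public string PaymentUri { get; set; }
}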

HatEoAS implies a model where all of your important resources are represented by unique URIs and all important state changes are done by interacting with representations sent to or retrieved from those URIs. Most people first approaching REST view it in terms of opposing other architectural “styles” such as SOA or get mired in implementation immediately and begin contrasting their understanding of REST over HTTP against WS-*. Another common problem is reducing the ethos of REST to “RPC is bad” and “we don’t need all that complexity, we have HTTP” (see REST is better than WS-*).  These views are commonplace because REST is being promulgated as a better solution for many types of applications on the Web.

The specifics of how REST works over HTTP are beyond the scope of this article, and the subject of a lot of debate, but, since a lot of uses of POST seem very much like RPC-style invocations, people have a knee-jerk reaction that POST is not RESTful.  By now you know this is not the case, but let’s hear from the experts.

From Fielding:

POST only becomes an issue when it is used in a situation for which some other method is ideally suited: e.g., retrieval of information that should be a representation of some resource (GET), complete replacement of a representation (PUT) […]

Stu Charlton’s design guidelines say nearly the same thing:

The problem with POST is when we abuse it by having it perform things that are more expressive in one of the other methods. GET being the obvious one that needs no hypermedia description. For the other methods, a good design guideline is that you MUST not break the general contract of the HTTP method you choose -- but you SHOULD describe the specific intent of that method in hypermedia.

Fallacy #3 REST is better than WS-*

In fact, Fielding’s thesis does address many of the problems and advantages of various network-based architectural styles, but nowhere does he claim REST is the one ring to rule them all.  In his aforementioned blog post Fielding says (emphasis my own),

[…]there are plenty of information systems that can be designed using the REST architectural style and gain the associated benefits. Managing cloud instances is certainly one of those applications for which REST is a good fit[…]

From his thesis, here is the definition of the REST architectural style.

REST consists of a set of architectural constraints chosen for the properties they induce on candidate architectures. […] [It] is an abstraction of the architectural elements within a distributed hypermedia system. […] It encompasses the fundamental constraints upon components, connectors, and data that define the basis of the Web architecture, and thus the essence of its behavior as a network-based application.

So, the associated benefits are induced by constraints that collectively are referred to as the REST architectural style.  The benefits are myriad, and many of the constraints are recognizable in how the web works. Section #5 of the thesis is concise and readable, and I encourage the reader to internalize it. 

The salient point here is that REST is not a protocol; it’s not even an architecture.  REST is an architectural style! By understanding its constraints and benefits, we can make informed decisions about its applicability to our problem domain and appropriation of technologies.  Comparing REST to the WS-* suite of protocols is comparing apples to oranges, though there are those who strongly argue the benefits of REST and HTTP over SOAP.

Saturday, June 27, 2009

What, Not How & Why, Not When

It occurred to me this morning that many software development principles seem to emerge from the rigorous application of the following principle:

Your architecture and code should make What & Why explicit without specifying How & When.

What, Not How

It is well known that we should prefer declarative over imperative code. By observing the Law of Demeter, we are in fact forced to create declarative, What, semantics in our interfaces since we can't get at the How, the imperative constructs. The Single-Responsibility Principle tends to force us to factor out the How into other classes, leaving us to consume other classes with What semantics. Further, the Interface Segregation Principle requires us to explicitly segregate the interfaces our clients consume based on What they are trying to accomplish; if our focus was How, such segregation would be less of a concern.
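
As a small illustration (both interfaces and the WorkOrder type are hypothetical), compare a queue API expressed in How terms with one expressed in What terms; the principles above push us toward the latter.

// Hypothetical illustration: the same capability in How terms versus What terms.
// A client of the first interface must know the queue mechanics; a client of the
// second states its intent and leaves the mechanics to the implementation.
public class WorkOrder
{
    public string OrderId { get; set; }
}

public interface IProductionQueueMechanics   // How: exposes the imperative constructs
{
    void RemoveAt(int index);
    void InsertAt(int index, WorkOrder order);
}

public interface IProductionScheduling       // What: declarative intent only
{
    void Expedite(WorkOrder order);
}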

Event-Driven Architectures (EDA) are another example of What, Not How. Whereas the SOLID principles operate at the class design level of abstraction, EDA is concerned with system-level architecture. In an Event-Driven Architecture, we explicitly model happenings in the domain. Rather than coupling the site of the happening to a specific party designated for dealing with that happening, we create an Event and use reliable messaging and subscription schemes to allow one or many services to handle it. In other words, instead of specifying How to deal with a happening at the site that generates it, we explicitly model What happened and let other parties worry about How to deal with it.

Why, Not When

This maxim is both more subtle and more prosaic than What, Not How. It is probably pretty obvious that when given a requirement stated, "if the purchase order acknowledgement is not received within two hours, notify someone in fulfillment," we should model an Event "POAckIsLate" as opposed to "TwoHoursHaveElapsedWithoutReceivingPOAck". We will have different SLAs with different vendors; those SLAs will change, etc. So we can say, when modeling Events in our domain, we should prefer specifying Why, Not When.

Perhaps more subtle is the implications for communication semantics between modules. If we model our communications with Why in mind, we don't get mired in the concurrency problems of specifying When. Consider a workflow. If we specify When to go to a particular step, the underlying reason may have changed unless we take some sort of explicit lock on shared state. If we instead specify Why a particular state transition takes place, we can avoid inconsistent states through continuous evaluation. If we make Why explicit and consequently create semantics to evaluate Why independently of "current" state, it becomes possible to evaluate the state consistently without any shared state, i.e. without a notion of When.

As an example, if we had the requirement, "when a PO is received with a quantity for part X above twenty units, move the order to the top of the work queue," we should model a "BulkProductionRequest" Event and an "ExpediteProduction"; we should not implement a "Reprioritize Production Queue For Order of PartX Over Twenty Units". Begin with the end in mind and ask What we want to do (expedite production), not How (re-prioritize the production queue). Ask Why are we expediting this order? Because it is Bulk. What is Bulk? Bulk is a quality determined by a CapacityPlanning service and implies that the quantity exceeds some production capacity threshold.
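
To make the distinction concrete, here is a minimal C# sketch; every type and member name in it is a hypothetical illustration of the requirements above, not production code. The events capture Why, and How to respond lives with the subscribers rather than at the site that raised them.

// A minimal sketch of modeling Why rather than When or How.
// All type and member names below are hypothetical illustrations.
using System;

// Why: the acknowledgement is late per whatever SLA applies to this vendor,
// not "two hours have elapsed".
public class POAckIsLate
{
    public string PurchaseOrderId { get; set; }
    public string VendorId { get; set; }
}

// Why: capacity planning judged the order to be Bulk,
// not "quantity for part X is over twenty units".
public class BulkProductionRequested
{
    public string OrderId { get; set; }
    public string PartNumber { get; set; }
}

// How to respond lives with the subscribers, not at the site that raised the event.
public class FulfillmentNotifier
{
    public void Handle(POAckIsLate lateAck)
    {
        Console.WriteLine("Notify fulfillment: PO " + lateAck.PurchaseOrderId + " acknowledgement is late.");
    }
}

public class ProductionExpediter
{
    public void Handle(BulkProductionRequested request)
    {
        Console.WriteLine("Expedite production for order " + request.OrderId);
    }
}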

Friday, June 26, 2009

Pareto, Zipf, Heap: The 80-20 Rule, Language, and Diminishing Returns

Please consider this a sort of layman’s disambiguation page.

The classic “80-20 rule” refers to a Pareto distribution.  The thin distribution of the 20% is the subject of Chris Anderson’s “The Long Tail”.  Originally, the Pareto distribution referred to the fact that 20% of the people control 80% of the wealth, but it has turned up in many other contexts.

Zipf’s Law is about rank and frequency.  The second-ranked item’s frequency will be about half that of the first; the third’s will be about a third of the first’s, and so on: frequency is inversely proportional to rank.  (The exponent on the rank may be something other than one, but it stays constant across the distribution.)  The canonical item in Zipf’s Law is a word, and its frequency is how often it appears in a corpus of English text.  Interestingly, Zipf’s Law also holds for texts generated from a fixed alphabet by picking letters randomly with a uniform distribution.

Heaps’ Law is about diminishing returns.  It has an exact formula, but informally it says that the further you read into a text, the fewer new words you’ll discover; it takes longer and longer to encounter a word you haven’t seen before.  Heaps’ Law applies to the general case where the “words” are just classifiers of some collection of things.  So, it could be applied to the nationality of a collection of people; you’d have to gather more and more people from a random sampling to get a representative from every country.
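
For reference, the standard textbook statements of the two laws (not from the original post; s, K, and β are the usual fitted parameters):

f(k) \propto \frac{1}{k^{s}} \qquad \text{Zipf: frequency of the } k\text{-th ranked word, with } s \approx 1 \text{ classically}

V(n) = K\,n^{\beta}, \quad 0 < \beta < 1 \qquad \text{Heaps: number of distinct words after reading } n \text{ words}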

The implications of these laws in various contexts are the subject of much interesting study and postulation.

Tuesday, June 16, 2009

Value Objects and Value Domains

I’ve written relatively extensively on the topic of replacing enum constructs with classes, so I won’t rehash it here. Instead, I’d like to introduce you to some code I’ve written that enables you to create finite domains of Value Objects in C#.  Please see my two previous posts on the subject to learn more about the benefits of this approach.  To see how the Value Object semantics and making a finite domain a first class concept in the pattern improve the approach, read on.

First, we need some definitions.  We are taking the concept of a Value Object from Eric Evans’ book “Domain Driven Design” p.99:

When you care only about the attributes of an element of the model, classify it as a Value Object. Make it express the meaning of the attributes it conveys and give it related functionality. Treat the Value Object as immutable.  Don’t give it any identity […] The attributes that make up a Value Object should form a conceptual whole.

An example here is useful.  Consider modeling the domain of a book publisher.  At some point you need to capture all the different formats of books: A4, B, Pinched Crown, etc.  The attributes of width, height, unit of measure, and name would go into your model.  But, any instance of your A4 Value Object should be completely interchangeable with any other A4.  And, it goes without saying that you can’t change the width or height or any other attribute of A4.

All of the different formats of books belong to a Value Domain. According to WordNet, a domain (we are using the mathematical notion) is:

the set of values of the independent variable for which a function is defined

The functional complement of a domain is the range of the function; i.e. the domain is what you can put in, the range is what you can expect to get out.  I like calling this concept in my approach a domain instead of a set, because it neatly captures a key benefit: when we are writing code, we want to declare our parameters as being proper members of a domain of values, instead of just primitive types or strings.

Now we’re ready to dive into the implementation.  Let’s begin with the end in mind.  I want to write a function, say, that lets me search for a document with a particular format about an arbitrary topic.

void SearchDocs(DocFormat docFormat, string topic)


Ok, so we could create a base class or an interface called DocFormat and create Word doc, PDF, etc. that inherit from or implement DocFormat.  Easy.  But, SearchDocs has to be able to handle all current and future implementations of DocFormat; it must not violate the Liskov substitution principle.  What if the repository or search algorithm depends on what the doc format actually is?  Also, we’d have a subclass of DocFormat to write for every document type, and we’d have to do a lot of work to remove object identity, since your instance of PDF is not the same as my instance of PDF.  And, don’t forget to make the whole thing immutable.  [Note: I know this is a contrived example with well-established OOP solutions that don’t require LSP violation.  Work with me here. :)]

Clearly we have a lot of work to do to make our Value Objects a reality.  It’s not impossible, though, and a quick Bing Google turns up a couple of investigations and approaches.  Jimmy Bogard’s approach gave me the clue I needed to get it working.  What I wanted was a base type, ValueObject, that I could inherit from and get the Value Object semantics described in Evans.

Jimmy’s approach used a self-referencing open constructed type to allow the ValueObject base class to do all the work of providing Value Object semantics.  This base class uses dynamic reflection to determine which properties to use in the derived class to do equality comparisons (an approach nominally improved upon here). He saw his approach as having a single fundamental flaw; it only worked for the first level of inheritance, i.e. the first descendant from ValueObject.

For my purposes—creating a bounded, finite domain of Value Objects to replace enum-type classes—this is not a flaw.  Substantively, all that remains to do is introduce the concept of the Value Domain into Jimmy’s approach and put the Value Objects in it.  Because I wanted to use these Value Domains throughout the enterprise, I baked WCF support into my approach.  Further, because the Value Domain is defined a priori, I didn’t have to play with object identity; I could simply look the value up in the domain.  (It took a little out-of-the-box thinking to get that to work transparently.)  Finally, I wanted it to be trivially easy to create these Value Domains, so I created a snippet for use in Visual Studio.

Here’s an example of Value Domain and Value Objects implementation:

[DataContract]
public sealed class DocFormats : ValueObject<DocFormats.DocFormat, string, DocFormats>.Values<DocFormats>
{
    private DocFormats()
    {
        Word = Add("Word");
        PDF = Add("PDF");
    }

    [DataMember]
    public readonly DocFormat Word, PDF;

    [Serializable]
    public sealed class DocFormat : ValueObject<DocFormat, string, DocFormats>
    {
        private DocFormat(string v) : base(v) { }
    }
}

DocFormats is the Value Domain.  DocFormat is the type of the Value Object.  String is the underlying type of the values, but the pattern supports anything that implements IEquatable and IComparable.  Methods can accept DocFormats.DocFormat as a parameter and only Word and PDF will be valid.  Code can specify those values through a Singleton accessor: DocFormats.Instance.PDF.  You’ll notice that the only constructor is private; the Singleton implementation is in the Value Domain base class (ValueObject<…>.Values<…>).

// our new method interface
void SearchDocs(DocFormats.DocFormat docFormat, string topic)
{
    // referencing an individual value
    if (DocFormats.Instance.Word == docFormat)
    {
        //...
    }
}

Above you’ll note that the Value Object type definition (subclass of ValueObject<…>) is nested inside of the Value Domain.  Doing that groups the two together syntactically in a very natural way (FooTypes.FooType); the entire domain of values is contained in one place.  That locality makes the snippet to produce them cohesive too.  [Note: I’ve included the snippet XML at the end of this post.]

Interestingly, the underlying implementation necessarily inverts this nesting; the Value Domain base class is nested in the Value Object base class.  That allows the Value Domain Singleton to create instances of Value Objects.  I only had to resort to reflection in order to allow the Value Domain base class to provide the Singleton implementation.  Inversion of Control containers like Unity and Castle Windsor do this kind of reflection all the time, and it’s cheap since .NET 2.0. 

Without further ado, here is the implementation of the base classes.

using System;
using System.Collections.Generic;
using System.Reflection;

[Serializable]
public abstract class ValueObject<T, TValue, TValues> : IEquatable<T>, IComparable, IComparable<T>
    where T : ValueObject<T, TValue, TValues>
    where TValue : IEquatable<TValue>, IComparable<TValue>
    where TValues : ValueObject<T, TValue, TValues>.Values<TValues>
{
    /// <summary>
    /// This is the encapsulated value.
    /// </summary>
    public readonly TValue Value;

    protected ValueObject(TValue value)
    {
        Value = value;
    }

    #region equality
    public override bool Equals(object other)
    {
        return other != null && other is T && Equals(other as T);
    }

    public override int GetHashCode()
    {
        // TODO provide an efficient implementation
        // http://www.dotnetjunkies.com/weblog/tim.weaver/archive/2005/04/04/62285.aspx
        return Value.GetHashCode();
    }

    public bool Equals(T other)
    {
        return other != null && Value.Equals(other.Value);
    }

    public static bool operator ==(ValueObject<T, TValue, TValues> x, ValueObject<T, TValue, TValues> y)
    {
        // pointing to same heap location
        if (ReferenceEquals(x, y)) return true;

        // both references are null
        if (null == (object)(x ?? y)) return true;

        // auto-boxed LHS is not null
        if ((object)x != null)
            return x.Equals(y);

        return false;
    }

    public static bool operator !=(ValueObject<T, TValue, TValues> x, ValueObject<T, TValue, TValues> y)
    {
        return !(x == y);
    }
    #endregion

    public static implicit operator TValue(ValueObject<T, TValue, TValues> obj)
    {
        return obj.Value;
    }

    public static implicit operator ValueObject<T, TValue, TValues>(TValue val)
    {
        T valueObject;
        if (Values<TValues>.Instance.TryGetValue(val, out valueObject))
            return valueObject;

        throw new InvalidCastException(String.Format("{0} cannot be converted", val));
    }

    public override string ToString()
    {
        return Value.ToString();
    }

    #region comparison

    public int CompareTo(T other)
    {
        return Value.CompareTo(other);
    }

    public int CompareTo(object obj)
    {
        if (null == obj)
            throw new ArgumentNullException();

        if (obj is T)
            return Value.CompareTo((obj as T).Value);

        throw new ArgumentException(String.Format("Must be of type {0}", typeof(T)));
    }

    #endregion

    [Serializable]
    public abstract class Values<TDomain> where TDomain : Values<TDomain>
    {
        [NonSerialized]
        protected readonly Dictionary<TValue, T> _values = new Dictionary<TValue, T>();

        private void Add(T valueObject)
        {
            _values.Add(valueObject.Value, valueObject);
        }

        protected T Add(TValue val)
        {
            var valueObject = typeof(T).InvokeMember(typeof(T).Name,
                BindingFlags.CreateInstance | BindingFlags.Instance |
                BindingFlags.NonPublic, null, null, new object[] { val }) as T;
            Add(valueObject);
            return valueObject;
        }

        public bool TryGetValue(TValue value, out T valueObject)
        {
            return _values.TryGetValue(value, out valueObject);
        }

        public bool Contains(T valueObject)
        {
            return _values.ContainsValue(valueObject);
        }

        static volatile TDomain _instance;
        static readonly object Lock = new object();

        public static TDomain Instance
        {
            get
            {
                if (_instance == null)
                    lock (Lock)
                    {
                        if (_instance == null)
                            _instance = typeof(TDomain).InvokeMember(typeof(TDomain).Name,
                                BindingFlags.CreateInstance |
                                BindingFlags.Instance |
                                BindingFlags.NonPublic, null, null, null) as TDomain;
                    }
                return _instance;
            }
        }
    }
}

I’d love to hear your feedback.  This approach is not intended to support unbounded Value Domains, such as the canonical example of Address.  It is meant for the enums-as-classes problem, i.e. finite, bounded Value Domains.

ValueObject.snippet

<?xml version="1.0" encoding="utf-8" ?>
<CodeSnippets xmlns="http://schemas.microsoft.com/VisualStudio/2005/CodeSnippet">
  <CodeSnippet Format="1.0.0">
    <Header>
      <Title>ValueObject</Title>
      <Shortcut>vo</Shortcut>
      <Description>Code snippet for ValueObject and domain</Description>
      <Author>Christopher Atkins</Author>
      <SnippetTypes>
        <SnippetType>Expansion</SnippetType>
      </SnippetTypes>
    </Header>
    <Snippet>
      <Declarations>
        <Literal>
          <ID>domain</ID>
          <ToolTip>Value domain name</ToolTip>
          <Default>MyValues</Default>
        </Literal>
        <Literal>
          <ID>value</ID>
          <ToolTip>ValueObject Class name</ToolTip>
          <Default>MyValueObject</Default>
        </Literal>
        <Literal>
          <ID>type</ID>
          <ToolTip>Value Type</ToolTip>
          <Default>string</Default>
        </Literal>
      </Declarations>
      <Code Language="csharp">
        <![CDATA[
[DataContract]
public sealed class $domain$ : ValueObject<$domain$.$value$, $type$, $domain$>.Values<$domain$>
{
    private $domain$()
    {
        Value1 = Add(String.Empty);
    }

    [DataMember]
    public readonly $value$ Value1;

    [Serializable]
    public sealed class $value$ : ValueObject<$value$, $type$, $domain$>
    {
        private $value$($type$ v) : base(v) { }
    }
}
]]>
      </Code>
    </Snippet>
  </CodeSnippet>
</CodeSnippets>

Thursday, May 28, 2009

Google Wave

Upon the day mankind's children ask
Who it was made their world intelligible,
Those who set about the grand task
of creating the Wave will be eligible.

For never will they have occasion to think
how to create discourse, share knowledge,
Wonder about the universe and link
Without this wonderful prodigy.

Like hieroglyphs in the tombs of Kings,
eee-may-uhls voy-s'-may-uhls eye-ems and tweets
will serious study to the curious bring,
to be shared in Waves while ancestors sleep.

How is it the first step of a journey
Can put the past so far behind thee?

Monday, May 25, 2009

Connascence: The Underlying Principle of OO Design?

This is a great talk at a Ruby conference by Jim Weirich about his attempt to frame all object-oriented design (OOD) principles as special cases of an underlying principle called “connascence”.  Connascence is a term co-opted by Meilir Page-Jones in the ‘90s for use in OOD; below is the definition from his book with Larry Constantine entitled, “Fundamentals of Object-Oriented Design in UML” (page 214):

Connascence between two software elements A and B means either

  1. that you can postulate some change to A that would require B to be changed (or at least carefully checked) in order to preserve overall correctness, or
  2. that you can postulate some change that would require both A and B to be changed together in order to preserve overall correctness.


UPDATE: Jim Weirich's talk can be found here.

Thursday, May 21, 2009

WolframAlpha: A New Kind of Site

TechCrunch called Google Squared the imminent Cain to WolframAlpha’s Abel. Dare Obasanjo warns Wikipedia to beware the ides of March, casting WolframAlpha as Brutus.  But, I would cast WolframAlpha in quite a different role: Hamlet.

If you’ve not used WolframAlpha, head over there and come back once you’ve exhausted its novelty.  Go ahead, I’ll wait.

So, now you know why they call it a “computational knowledge engine” whose mission it is to “make the world’s knowledge computable”.  Ignoring, for the present, the interesting epistemological discussion viz. the validity of that mission, let’s instead talk about why you might use it.

If you are looking for something, go to Google.  This is not a search engine.

If you are looking to mine the web for hard data, be patient, Google Squared will be your guide (when it is baked).

If you would like to learn about a particular topic, Wikipedia stands ready to help you begin.

If you want to apply your existing domain knowledge to posing interesting questions about a wide variety of topics, if you can phrase such questions as nominal computations, then go to WolframAlpha, though your attempts might be frustrated, e.g. it fails to compute the following “annual energy consumption of the average American * projected worldwide population in 2020.”

In other words for 98% of the world, WolframAlpha is simply a curiosity. For the other 2%, it’s only useful as a way to experiment with such questions, since the rigor required of formal academia is not satisfied by simply saying, “WolframAlpha said so.”

To understand WolframAlpha’s raison d'être, you have to understand NKS: Stephen Wolfram’s “New Kind of Science”.  To understand NKS you can either go read the book or trust what I’m about to say.  At its core, the claim it makes to novelty is based on the notion that all phenomena can be viewed as computations, and thus real science can be done by exploring the “universe” of possible computations.  Of course the only apparatus sufficient for doing this “new” science is Wolfram’s own Mathematica program, a program originally created by Dr. Wolfram in a single summer.  Twenty or so years later, after leaving academia to build the company that develops and sells Mathematica, Wolfram brought forth the 1,200+ page tome that is NKS.


Wolfram is no lightweight; we’re talking about one of those few people in the world with the intellectual capacity and training to do modern quantum mechanics at age 17—when it comes to mathematics he is the real McCoy.  But, very few seriously look at NKS as being fundamentally new, and no few academics treat Wolfram with vitriol and ire due to his claim of novelty.  We all—it is universally agreed—stand upon the shoulders of giants.  Further, many argue that NKS contributes nothing new to the existing corpus of mathematics or science.  Here are some notes for the dramaturg:

  • The NKS book was self-published by Wolfram-Media
  • Wolfram opens the book with a statement of his childhood dream to “know everything”
  • NKS is the culmination of twenty years of work outside of academia
  • The precipitating event that put him on that path was his not being able to understand the pattern generated by a cellular automaton (a simple computer program) he’d written

My personal belief is that the cognitive dissonance created by having not contributed to science in any recognized, significant way over twenty years, in the face of soaring hubris born out of his prodigious intellect and early summiting of some of the highest peaks of academia, forces Dr. Wolfram to eschew—even denigrate—the trappings of mainstream academia and embrace his self-created role of the father of a new kind of science.  But, hey, I don’t know the guy; and it goes without saying that I’m criticizing Michael Jordan’s cross-over.  Maybe it will take twenty years for everyone else to get it; if it is new, it's nascent, and you have to walk before you can crawl.

He may be an egoist, but he has excellent taste.  I think he’d make Edward Tufte proud.

What WolframAlpha owes its life to is the attempt to make NKS immediately relevant in a roundabout way.  Remember, everything—everything—is simply a computation.  Another concept that is critical to understanding WolframAlpha from NKS is “computational equivalence”; essentially this is the reason why NKS doesn’t permit predicting the future: to do that you would need a computer the equivalent of our universe.  Reality, you see, is simply the current result of the continuous computation being performed by the fabric of our universe.  (As strange as that may sound, it is not dissimilar from the beliefs of proponents of the much vaunted string theories of the universe.)

Since we don’t have a computer the size of the world, we can’t calculate everything in reality.  What we can do, though, is examine the current results of the computation, in the form of data that can be collected and mined and extruded by domain experts into a symbolic ontology that is computable by Mathematica.  We can then run computations with these synthetic ontologies, because they can be related via the formal ontology of units we have created throughout human history.  To what end?  Well, I suppose that depends on the answers you get.

I hope by now you understand why it is called WolframAlpha.  And, for those lovers of the theater who haven’t yet guessed, here is the cast of characters.

  • Hamlet: WolframAlpha
  • Fortinbras: Google Squared
  • Gertrude: Capitalism
  • Ophelia: Wikipedia
  • Claudius: Academia

I think it is obvious who plays the role of the murdered king. Ok, I've surely stretched the metaphor a bit too thin, but I really like the notion that these kinds of current events are manifestations of the same archetypes described in our great literature. I think Wolfram would like it too, but I have no doubt his protagonist wouldn't be from one of Shakespeare's tragedies.

Monday, May 11, 2009

Microsoft TechEd: Day 0

It’s the evening before the official start of Microsoft’s annual IT conference, TechEd.  In 2009 we find ourselves in sunny, yet cool, Los Angeles.  I couldn’t be happier with the weather and the hotel, but so far the conference sucks.

Why—or, more appropriately, how—can I say it sucks when the conference hasn’t begun?  Easy: go to msteched.com and attempt to build a schedule.  Well, you have to be signed in with Windows Live and be a registered attendee, I believe, but if you could and did you would quickly discover a rather pathetic fail on the part of the conference planners.  The session builder is garbage.  You’ll spend the majority of your time watching this animation, since they seem not to have mastered the partial page update without pulling down the entire 670K+ page weight.  Seriously, I’d rather look at the source and see an XML Data Island than wait for the entire page to reload every time I updated my schedule.

[Screenshot: the session builder's loading animation]

Aesthetically, it isn’t bad, the site I mean, but the session builder is garbage.  The session builder for Web 2.0 Expo at least displayed all the tracks and timeslots in a calendar-like fashion, so you could make your decision by sight and rather quickly.

[Screenshot: the Web 2.0 Expo session builder]

But, you know, if the content is there, if the sessions are just awesome, who cares about some clunky, hacked-together conference website?  Don’t judge a book by its cover, right?  So, what are my options for tomorrow’s 11am timeslot?  Well, the first keynote is this one.

KEY01 Moving Forward Together: The Potential of IT Innovation
Presenter: Bill Veghte
Mon 5/11 | 10:00 AM-11:30 AM | West Hall A

In today's economic environment, your company is faced with unprecedented pressures to reduce costs and drive operating efficiencies. At the same time, the demands on IT to deliver more connected services and greater flexibility continue to grow. With the upcoming release of Windows 7, Windows Server 2008 R2, and Exchange "14," Microsoft helps IT organizations better meet these competing demands. Join Bill Veghte, Senior Vice President of Windows Business, as we explore how these technologies combine to give you the tools and resources to better manage your infrastructure--from back-end services through client-side experiences, and the network in between.

Doesn’t exactly inspire.  But, never fear, there is an alternative!

PAN59 Agile: A Process or an Excuse?
Presenters: Richard Campbell, Stephen Forte, Chris Menegay, Joel Semeniuk
Mon 5/11 | 11:00 AM-12:00 PM | 501C

Over the last few years, the community at large has created a number of agile development styles; Scrum, XP, and more are there for you and your team to choose. However, is agile really a process? On the other hand, is it an excuse to avoid accountability and proper development techniques? Come to this interactive session to see what Chris, Steve, and Joel have to say, and, if you are up for it, share your opinion.

Seriously, I’m not making this up.  I came to Los Angeles for this?

Thursday, April 9, 2009

The Coming Apocalypse of your "Friends"

What's a friend on MySpace, Facebook, and Orkut?  If you have five hundred friends, are they really friends?  These are the main reasons why I have not, to this point, joined any of those sites.  I don't have many friends, because I just can't give a lot of people the energy needed to maintain a true, lasting friendship.

What does this have to do with technology?  Well, Facebook Connect and Google FriendConnect are two APIs, with many more to come, that allow you to take your social graph with you.  So, instead of just being a bunch of people who post something they copied and pasted onto your "wall" at major holidays, your "friends" are now people who are going to affect the recommendations you get on other websites.

Tell me, do you want five hundred of your "closest" friends opinions, tastes, and web surfing habits to have a direct effect on your web experience?

Friday, April 3, 2009

Web 2.0 Expo Day 2

This morning started with keynotes, though there didn't seem to be a theme beyond "here is who we could get to come and talk." The sole exception was the very first speaker, Douglas Rushkoff, who was fantastic.  I'm really excited about his new book, Life Inc.  His main theses were that corporations were not created to allow competition and artificial currency was not

Nokia's CEO showed us his vision for ten years from now.  Wearable computers?  Wow, now that's innovative.  He got off on some weird metaphysical rant about projecting our consciousness into the 'net.  What he really means is a constant data stream of my location, mood, and random thoughts.  The only interesting thing he mentioned was geolocation through photography.  I really feel like that is an idea whose time has come.

Next I attended a session that discussed the relative merits of building directly for the iPhone or the web, and the conclusion was to use PhoneGap to do both.  It had some interesting points in making the case to stay web-based, but none of them were very convincing.  The speaker attacked the following five arguments for going native:

  • Performance--the recent "fast javascript" movement combined with paring down the page bloat that has occurred in recent years
  • Offline Usage--there is HTML 5 (and thus embedded SQL database support) in iPhone since v2.0
  • Findability--there are now 25K+ apps in the AppStore
  • Device Capabilities (GPS, camera, accelerometer, etc.)--the GPS capabilities are "opening up", and PhoneGap lets you go native
  • Monetization--4% of devices get turned down, people use iPhone apps like tissue paper, you have to sell your app for $0.99 to get any traction

Some valid points there, but it's ironic that the solution proffered is to start web-based and enhance to native.

Microsoft BizSpark was the underwriter for Launch Pad at Web 2.0 Expo.  There were five companies who presented:

Honestly, I wish I had gone to a different session.  I already knew about PhoneGap from the ten other times it was mentioned this week.  What they are doing at 80legs is really cool, but I'm not sure about its applicability.  They've made something that is really hard to do really cheap, but they've not made it easy.  That is, you still have to write Java code.  I'm more excited about what people are doing with it, a sure sign of a platform play.  I hope they get acquired and rewarded for their hard work.  One last thing, there were maybe a hundred people in the room.  Maybe.

I went to a talk from the guys at EngineYard.  Pretty good talk, but really centered on the implications of systems management in the cloud era.  I really enjoyed it; pretty fun.  They gave it in this paired presenter style where one would pick up where the other left off.  Nothing terribly prescient or relevant to me, but they host a lot of really awesome new companies, like github.

Wednesday, April 1, 2009

Werner Vogels

Super cool guy.  He's one of those super geeks that's past the point where he feels it necessary to get dragged into details but is totally upbeat.  It's good to be the king, right?  Perhaps my biggest surprise was to learn just how much Amazon views itself as a technology company.  Amazon's retail experience is just a customer of AWS.  Riiiight.  I find that hard to believe, but he claims that Amazon.com the retail site is not even the biggest customer of AWS.  Whoa.

My big takeaway from his talk and comments afterward is that only companies that really, really must scale to incredible extremes should build their applications for AWS, i.e. bake in use of SimpleDB, SQS, etc.  What he said to me was, "run your Linux infrastructure on EC2 so you can leave if you don't like us."  So, really, build great SOA systems and use AWS as a virtualization provider.  He had a lot of great principles in his talk for building scalable systems, including:

  • Autonomy: individual components make decisions based only on local information
  • Asynchrony: make progress under all circumstances
  • Controlled Concurrency: operations are designed such that limited or no concurrency control is required
  • Controlled Parallelism: use fully decentralized (p2p) techniques to remove bottlenecks
  • Decentralize: remove dependencies
  • Decompose: break up into more granular services
  • Failure Tolerant: failure is a normal part of operation
  • Local Responsibility for Consistency: similar to Autonomy, don't look elsewhere for consistent state
  • Simplicity: pare away functionality until you cannot
  • Symmetry: all nodes can do any function in the system

Some other interesting bits: the Amazon home page is composed from 200-300 services; every service at Amazon is developed and operated by the same team; there are about 500-600 services/teams; and elasticity means both shrinking AND growing on demand.

Geolocation: The new W3C API

A thought leader in geolocation from the company that created Loki gave a talk about geolocation on the web.  His first recommendation was to build your applications to adopt latitude and longitude as the basic unit of location.  He then went on to talk about all the other ways location can be represented: the names of businesses, location descriptions, planar areas (rather than points) on a map, etc.  These are generally much more welcome in the UX than raw map coordinates.

Some ways of getting local businesses are:

  • Yahoo! Local Search API
  • Yelp
  • Localeze

The skinny on geolocation for the web right now is that it relies on browser plug-ins or extensions.  The nice thing is that a W3C working group has put together a standard for accessing the geolocation capabilities of the browser from JavaScript.  It's quite easy to use, if not yet widely available.
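
To give a flavor of how easy it is, here's a minimal sketch (TypeScript) against the standard navigator.geolocation object; the handlers, logging, and options are just illustrative.

    // Minimal sketch of the W3C Geolocation API.
    // The success/error handlers and the options object are illustrative.
    if ("geolocation" in navigator) {
      navigator.geolocation.getCurrentPosition(
        (pos) => {
          // Latitude/longitude as the basic unit of location, per the talk.
          console.log(`lat=${pos.coords.latitude}, lon=${pos.coords.longitude}`);
          // Accuracy (in meters) varies by source: GPS, wifi, cell towers, or IP.
          console.log(`accuracy: ${pos.coords.accuracy}m`);
        },
        (err) => console.log(`geolocation failed: ${err.message}`),
        { enableHighAccuracy: true, timeout: 10000 }
      );
    } else {
      console.log("No geolocation support in this browser");
    }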

One interesting set of stats covered the accuracy one can expect from the various means of geolocation.

  • GPS can be accurate to within 10m for commercial applications
  • Wifi (database lookup and triangulation from SSIDs and signal strength) can be as good as 20m up to 80m
  • Cell towers are up around 1000m
  • IP address based schemes are wildly inaccurate and limited to city/region precision

Loki.com/how and Google Gears obviously use hybrid approaches to geolocation, balancing speed of acquisition, device capability, desired precision, etc.

Web Developer Tools from Web 2.0

This was by far the best presentation I saw today.  A couple of Mozilla guys gave a great talk about the past, present, and future of the web developer experience.  I'm really buzzed about the near future of HTML 5, "fast JavaScript", and the "web worker" model, which together enable the kind of interactivity available in desktop applications.
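
To make the "web worker" idea concrete, here's a rough sketch of pushing a long-running computation off the UI thread; the file name "sum-worker.js" and the message shape are my own assumptions, not anything shown in the session.

    // Page script: spawn a worker and hand it the heavy lifting so the UI stays responsive.
    const worker = new Worker("sum-worker.js");
    worker.onmessage = (e) => console.log("computed off the UI thread:", e.data);
    worker.postMessage({ upTo: 50000000 });

    // sum-worker.js (a separate file, running inside the worker; it has no DOM access):
    // onmessage = (e) => {
    //   let total = 0;
    //   for (let i = 0; i < e.data.upTo; i++) total += i;
    //   postMessage(total);
    // };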

They introduced a lot of tools, mostly around Firebug and its extensions.  I wasn't familiar with the unit testing stuff in that space: Code Coverage and FireUnit.  Some of the more interesting tools were the CSS tools: the YUI CSS Grid Builder, Blueprint, and the recently announced Microsoft SuperPreview.

They also mentioned some "future" tools like Bespin and Thunderhead.  Bespin is wonderful: a source code editor on the web, with version control support (git now, and SVN later?).  I really feel like this is bootstrapping the web; you can begin to think of creating the web ON the web.  Avi Bryant has already plugged Seaside into Bespin.  Thunderhead is a prototype for marrying grid layouts with the HTML 5 canvas.  They are also talking about doing an "Open Tools Directory".

One thing they mentioned that was only tangentially related to their talk was PhoneGap.  It lets HTML/JavaScript developers create native mobile phone applications (iPhone, Android, and BlackBerry are supported to different extents at this time).  Atlas, which lets you write native Cocoa (Mac OS) applications in JavaScript (their Objective-J), was given an excited nod, too.

Why Scala?

I'm trying to catch up on blogging about the sessions I've attended this first day of the Web 2.0 Expo.  The second session I attended today was entitled "Why Scala?"  Presented by a fellow from Twitter, it was essentially a discussion of their adoption of Scala, from their initial appraisal of the technology landscape, through the pilot project successes, to their company-wide embrace of the language.

Despite Twitter not having any major investment in Java code, the JVM platform gave them access to the myriad Java libraries already out there, as well as the ability to do things like hydrate JSON into a JVM object and reason about it naturally within Scala.  They were looking for a language with a great concurrency story, and the number one reason for choosing Scala appears to be that it is a JVM language written by people who really understand the JVM's internals.  Some of the other reasons put forth were:

  • Actor model--like Erlang, the primary concurrency paradigm is independent workers that share no state and pass messages
  • Immutability--with built-in immutable data structures and the Actor model, Scala heavily encourages immutability but doesn't force it, meaning you can take advantage of mutability when it is more expressive or performant
  • Type inference--yay, strong typing AND pretty code
  • First-class functions
  • Traits--not really clear on what these are; I think the Erlang/OTP equivalent would be behaviors
  • Pattern matching--this is similar to what Erlang does, but isn't limited to tuples or other terms, i.e. you can match on objects
  • XML literals & query methods baked into the language
  • Case Classes--these sound like structs, maybe message types? (see the sketch after this list)
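
Since I'm fuzzy on case classes and pattern matching, here's a rough analogy in TypeScript rather than Scala (my sketch, not the speaker's code): a discriminated union plays the role of a set of case classes, and an exhaustive switch plays the role of a match expression.  All of the type and field names are invented.

    // Rough TypeScript analogy for Scala case classes + pattern matching.
    // Each member of the union is like a case class; switching on "kind" is like `match`.
    type Tweet = { kind: "tweet"; user: string; text: string };
    type Retweet = { kind: "retweet"; user: string; original: Tweet };
    type Message = Tweet | Retweet;

    function describe(msg: Message): string {
      switch (msg.kind) {
        case "tweet":
          return `${msg.user}: ${msg.text}`;
        case "retweet":
          return `${msg.user} RT ${msg.original.user}: ${msg.original.text}`;
      }
    }

    console.log(describe({ kind: "tweet", user: "w2e", text: "Why Scala?" }));

(In Scala proper, the compiler can also check that a match over a sealed family of case classes is exhaustive, which is part of the appeal.)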

The "bad" thing about Scala is that it forces you to learn a bit more about the JVM than perhaps you were inclined to learn, and it is very complex.  There seems to be quite a community evolving around Scala, and the mainstreaam web framework is called Lift.

Identity Wars and Joseph Smarr

I remember Mr. Smarr's talk about OpenID and OAuth from a couple of years ago.  I got the impression that his biggest fear was that Facebook would always be a walled garden.  For reasons I don't quite understand, perhaps a sort of social consciousness (no pun intended), Facebook has opened their platform up, albeit in a limited and proprietary way.

Fortunately, there are companies like Google and JanRain hiding the differences between the Open Stack (OpenID, OAuth, and related specs) and Facebook's APIs.

JanRain has RPX, and Google has FriendConnect, so using the OpenID/OAuth stuff is easy without really knowing a lot.  What is needed is a way to marry the existing authentication stacks of all the common web platforms (ASP.NET, PHP, etc.) with the Open Stack, I think.

For more information, here are some blogs:

  • TheSocialWeb.tv
  • JosephSmarr.com
  • TheRealMcCrea.com

SEO: From Soup to Nuts--Web 2.0 Expo Workshop

It's workshop day, the first and only pre-conference day of Web 2.0 Expo San Francisco.  Among the first sessions of the day, I opted to attend the presentation given by Stephan Spencer of Netconcepts: "SEO: From Soup to Nuts."

His presentation was organized around his "three pillars of SEO": Content, Architecture, and Links.  His SEO practice is very data-driven, as evidenced by his liberal use of statistics.  Among the highlights of those statistics:

  • PPC yields 1.25 to 1.5 times more conversions than "natural" (organic search result) click-throughs
  • 86% of search engine clicks are natural
  • A 47% click-through rate on the first Google result falls to 10% on the second, and the first paid result gets about half the click-through of the first natural result
  • Short URLs double the click-through rate, according to MarketingSherpa

It definitely isn't all about statistics in the SEO game, though.  In fact, it seemed as if getting hard data about efficacy was the biggest and most enduring challenge.  Thankfully, there are some hard-and-fast techniques to employ: page titles, 301 redirects, nofollow links, canonical tags, ensuring there is only one URL for any given piece of content, and so on.  There are also some great tools (both free and paid), including:

  • SEOmoz.org's LinkScape
  • SEObrowser.com
  • Google Website Optimizer
  • sIFR
  • Adobe Search Engine SDK
  • Yahoo! Site Explorer

There were also a lot of surprises, at least to me:

  • Search result rankings can vary from city to city (of searcher)
  • Underscores do not count as word separators to Google
  • YouTube doesn't give link juice (nofollow)
  • Digg users do not buy anything
  • User generated content can get you so-called "long tail terms"

One thing I wondered, looking at the eye-tracking image he showed of the distribution of eye movement on the Google results page, is what people are actually looking for on Google.  They aren't looking for knowledge, but information: a gateway to knowledge, perhaps.