
Tuesday, July 20, 2010

Using the Web to Create the Web

Wikis do this, as do blogs.

Fast JavaScript in browsers is enabling a new generation of programmers to develop applications completely in their browser.  While the most obvious commercial example is Force.com, there are many other ideas out there.  In no particular order:

The common connection between these frameworks is the notion of bootstrapping the web; that is, using the web to create the web.

If you’ll forgive the inchoate thoughts, let me attempt to connect some mental dots.

Dr. Alan Kay has of late been discussing the Smalltalk architecture of real objects (computers) all the way down, and how this might improve the nature of software on the Internet.

In a September 2009 interview, Dr. Kay said,

The ARPA/PARC research community tried to do as many things ‘no center’ as possible and this included Internet […] and the Smalltalk system which was ‘objects all the way down’ and used no OS at all. This could be done much better these days, but very few people are interested in it (we are). We’ve got some nice things to show not quite half way through our project. Lots more can be said on this subject.

This month in an interview with ComputerWorld Australia, Dr. Kay expounded,

To me, one of the nice things about the semantics of real objects is that they are “real computers all the way down (RCATWD)” – this always retains the full ability to represent anything. The old way quickly gets to two things that aren’t computers – data and procedures – and all of a sudden the ability to defer optimizations and particular decisions in favour of behaviours has been lost.

In other words, always having real objects always retains the ability to simulate anything you want, and to send it around the planet. If you send data 1000 miles you have to send a manual and/or a programmer to make use of it. If you send the needed programs that can deal with the data, then you are sending an object (even if the design is poor).

And RCATWD also provides perfect protection in both directions. We can see this in the hardware model of the Internet (possibly the only real object-oriented system in working order).

You get language extensibility almost for free by simply agreeing on conventions for the message forms.

My thought in the 70s was that the Internet we were all working on alongside personal computing was a really good scalable design, and that we should make a virtual internet of virtual machines that could be cached by the hardware machines. It’s really too bad that this didn’t happen.

Is OOP the wrong path? What is this RCATWD concept really about?  Doesn’t the stateless communication constraint of REST force us to think of web applications in the browser as true peers of server applications?  Should we store our stateful browser-based JavaScript applications in a cloud object-database, in keeping with the Code-On-Demand constraint of REST?  Can we make them “real objects” per Dr. Kay?  Are RESTful server applications just functional programs?  If so, shouldn’t we be writing them in functional languages?

I definitely believe we can gain many benefits from adopting a more message-passing oriented programming style.  I would go so far as to say that OO classes should only export functions, never methods.  (They can use methods privately of course, to keep things DRY.)

I’ve written extensively in a never-published paper about related topics: single-page applications, not writing new applications to build and deliver applications for every web site, intent-driven design, event sourcing, and others.  Hopefully I’ll find the time to return to that effort and incorporate some of this thinking.

RavenDB: In the Code, Part 1—MEF

If you’ve not heard of RavenDB, it’s essentially a .NET-from-the-ground-up document database taking its design cues from CouchDB (and MongoDB to a lesser degree). Rather than go into the details about its design and motivations, I’ll let Ayende speak for himself.

Instead, I would like to document some of the great things I’ve found in the codebase of RavenDB, as I read it to become a better developer.  This series of articles discusses RavenDB’s use of the following .NET 4 features.

  • Managed Extensibility Framework (MEF)
  • New Concurrency Primitives in .NET 4.0
  • The new dynamic keyword in C# 4

While discussing RavenDB’s use of these features, I hope to provide a gentle introduction to these technologies.  In this, the first post of the series, we discuss MEF.  For a very brief introduction to MEF and its core concepts, see the Overview in the wiki.

Managed Extensibility Framework

MEF originated in the Patterns & Practices team and has since moved into the BCL as the System.ComponentModel.Composition namespace.  Glenn Block has described it as a plug-in framework and an application-partitioning framework, and has given many reasons why you may not want to attempt to use it as your inversion-of-control container (especially if you listen to Uncle Bob’s advice). RavenDB uses MEF to handle extensibility for its RequestResponder classes.
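
Before looking at RavenDB’s usage, here is a minimal, self-contained sketch of MEF’s core concepts; the greeter types are invented for illustration, but the attributes and container calls are the same ones we will see below.

Code Snippet
  using System;
  using System.ComponentModel.Composition;
  using System.ComponentModel.Composition.Hosting;
  using System.Reflection;

  public interface IGreeter { string Greet(); }

  // An export: a part MEF can discover in a catalog.
  [Export(typeof(IGreeter))]
  public class HelloGreeter : IGreeter
  {
      public string Greet() { return "Hello, MEF!"; }
  }

  public class Host
  {
      // An import: MEF fills this in during composition.
      [Import]
      public IGreeter Greeter { get; set; }

      public static void Main()
      {
          // Discover parts in the executing assembly, then satisfy Host's imports.
          var catalog = new AssemblyCatalog(Assembly.GetExecutingAssembly());
          var container = new CompositionContainer(catalog);
          var host = new Host();
          container.SatisfyImportsOnce(host); // the same call RavenDB makes below
          Console.WriteLine(host.Greeter.Greet());
      }
  }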

RavenDB’s communication architecture is essentially an HTTP server with a number of registered request handlers, not unlike the front-controller model of ASP.NET MVC.  Akin to MVC’s Routes, each RequestResponder provides a UrlPattern and SupportedVerbs to identify the requests it will handle. A given RequestResponder will vary its work depending on the HTTP verb, headers, and body of the request.  It is in this sense that RavenDB can be considered RESTful (even if it isn’t; see street REST).

Code Snippet
  public class HttpServer : IDisposable
  {
      [ImportMany]
      public IEnumerable<RequestResponder> RequestResponders { get; set; }

This HttpServer class dispatches requests to one of the items in RequestResponders. The collection is populated by MEF because of the ImportManyAttribute.  MEF looks in its catalogs and finds that the RequestResponder class is exported, as are all of its subclasses; see below.

Code Snippet
  [InheritedExport]
  public abstract class RequestResponder

The InheritedExportAttribute ensures that MEF treats all subclasses of the attributed class as exports themselves.  So, if your class inherits from RequestResponder and MEF can see your class, it will automatically be considered for each incoming request.
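
For instance, a hypothetical plug-in responder might look like the sketch below. UrlPattern and SupportedVerbs are the members described above; everything else about the base class is an assumption and is elided.

Code Snippet
  // Hypothetical plug-in responder; discovered automatically because the
  // RequestResponder base class carries [InheritedExport].
  public class PingResponder : RequestResponder
  {
      public override string UrlPattern
      {
          get { return "^/ping$"; }
      }

      public override string[] SupportedVerbs
      {
          get { return new[] { "GET" }; }
      }

      // ...override the request-handling member to write a response...
  }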

How does MEF “see your class”? Out of the box, MEF provides a number of useful ways to define what is discoverable. RavenDB makes use of these by providing its own MEF CompositionContainer.

Code Snippet
  public HttpServer(RavenConfiguration configuration, DocumentDatabase database)
  {
      Configuration = configuration;

      configuration.Container.SatisfyImportsOnce(this);

Above, in the constructor of the HttpServer class, we see the characteristic call to SatisfyImportsOnce on the CompositionContainer. This instructs the container to satisfy all the imports for the HttpServer, namely the RequestResponders.  The configuration.Container property is below:

Code Snippet
  public CompositionContainer Container
  {
      get { return container ?? (container = new CompositionContainer(Catalog)); }

And the Catalog property is initialized in the configuration class’ constructor like this:

Code Snippet
  Catalog = new AggregateCatalog(
      new AssemblyCatalog(typeof (DocumentDatabase).Assembly)
      );

So the container is created with a single AggregateCatalog that can contain multiple catalogs.  That AggregateCatalog is initialized with an AssemblyCatalog which pulls in all the MEF parts (classes with Import and Export attributes) in the assembly containing the DocumentDatabase class (more on that later).

That takes care of the built-in RequestResponders, because those are in the same assembly as the DocumentDatabase class.  If that smells like it violates orthogonality, you are not alone. But, I digress; what about extensibility? How does Raven get MEF to see RequestResponder plugins?

The configuration class also has a PluginsDirectory property; in the setter is the following code.

Code Snippet
  if (Directory.Exists(pluginsDirectory))
  {
      Catalog.Catalogs.Add(new DirectoryCatalog(pluginsDirectory));
  }

So, in Raven’s configuration you can specify a directory where MEF will look for parts.  That’s the raison d'être of MEF’s DirectoryCatalog, since a plugins folder is such a common deployment/extensibility pattern.  You can learn more about the various MEF catalogs in the CodePlex wiki.

Now, the real extensibility story for RavenDB is its triggers.

RavenDB Triggers

The previously mentioned DocumentDatabase class is responsible for the high-level orchestration of the actual database work.  It maintains four groups of triggers.

Code Snippet
  [ImportMany]
  public IEnumerable<AbstractPutTrigger> PutTriggers { get; set; }

  [ImportMany]
  public IEnumerable<AbstractDeleteTrigger> DeleteTriggers { get; set; }

  [ImportMany]
  public IEnumerable<AbstractIndexUpdateTrigger> IndexUpdateTriggers { get; set; }

  [ImportMany]
  public IEnumerable<AbstractReadTrigger> ReadTriggers { get; set; }

Following the same pattern as RequestResponders, the DocumentDatabase calls configuration.Container.SatisfyImportsOnce(this). So, the imports are satisfied in the same way, i.e. from DocumentDatabase’s assembly and from a configured plug-ins directory.

In RavenDB, triggers are the way to perform custom actions when documents are “put” (i.e. upserted), read, or deleted.  Triggers also provide a way to block any of these actions from happening.
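
As a sketch of what such a trigger might look like, here is a hypothetical put trigger that vetoes writes to a reserved key space. The AllowPut signature and the VetoResult type are assumptions patterned after the trigger model, not copied from RavenDB’s source.

Code Snippet
  // Hypothetical put trigger; dropped in the plugins directory, MEF's
  // DirectoryCatalog discovers it and DocumentDatabase imports it.
  public class ReadOnlySystemDocuments : AbstractPutTrigger
  {
      public override VetoResult AllowPut(string key, JObject document,
          JObject metadata, TransactionInformation transactionInformation)
      {
          if (key != null && key.StartsWith("system/"))
              return VetoResult.Deny("system documents are read-only");

          return VetoResult.Allowed; // let the put proceed
      }
  }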

Raven also allows custom actions to be performed when the database spins up, via the IStartupTask interface.

Startup Tasks

When the DocumentDatabase class is constructed, it executes the following method after initializing itself.

Code Snippet
  private void ExecuteStartupTasks()
  {
      foreach (var task in Configuration.Container.GetExportedValues<IStartupTask>())
      {
          task.Execute(this);
      }
  }

This method highlights the use of the CompositionContainer’s GetExportedValues<T> function, which returns all of the IStartupTask implementations found in the catalogs created in the configuration object.
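
Given that call site, IStartupTask evidently exposes an Execute method taking the DocumentDatabase; a hypothetical task might look like this (the interface shape is inferred from the code above, not copied from the source).

Code Snippet
  // Hypothetical startup task; MEF discovers it in a catalog and
  // ExecuteStartupTasks runs it when the database spins up.
  public class WarmUpCaches : IStartupTask
  {
      public void Execute(DocumentDatabase database)
      {
          // e.g. touch indices or preload configuration documents here
      }
  }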

Conclusion

We’ve seen three important extensibility points in RavenDB supported by MEF: RequestResponders, triggers, and startup tasks.  Next time, we’ll look at two more—view generators and dynamic compilation extensions—while learning more about RavenDB indices.

Monday, May 17, 2010

The Law of Demeter and Command-Query Separation

My first encounter with the Law of Demeter (LoD) was in the Meilir Page-Jones book, Fundamentals of Object-Oriented Design in UML. It is also referenced in Clean Code by Robert C. Martin.  Basically, the law states that a method on an object should only invoke methods on objects in its immediate context: the object itself and its instance members, objects created locally within the method, and arguments to the method itself.  This limits what Page-Jones calls the direct encumbrance of a class: the total set of types a class directly depends upon to do its work.  Martin points out that if an object effectively encapsulates its internal state, we should not be able to “navigate through” it.  Not to put too fine a point on it, but the kind of code we are talking about here is:

Code Snippet
  1. static void Main(string[] args)
  2. {
  3.     var a = new A();
  4.     a.Foo().Bar();
  5. }
  6.  
  7. class A
  8. {
  9.     public B Foo()
  10.     {
  11.         return new B(); // do some work and yield B
  12.     }
  13. }
  14.  
  15. class B
  16. {
  17.     public void Bar()
  18.     {
  19.  
  20.     }
  21. }

Martin calls line 4 above a “train wreck” due to its resemblance to a series of train cars.  Our Main program has a direct encumbrance of types A & B. We “navigate through” A and invoke a method of B.  Whatever Foo() does, A is not effectively encapsulating it; we cannot transparently swap in an implementation that uses some other type C.

LoD is a heuristic that uses the encumbrance of a type as a gauge of code quality. We observe that effective encapsulation directly constrains encumbrance, so we can say that the Law of Demeter is a partial corollary to an already well-known OOP principle: encapsulation.  Another such principle is Command-Query Separation (CQS), as identified by Bertrand Meyer in his work on the Eiffel programming language.

CQS simply states that methods should be either Commands or Queries; they should either mutate state or return state without side effects.  Queries must be referentially transparent; that is, you can replace every query call site with the value returned by the query without changing the meaning of the program. Commands must perform some action but not yield a value.  Martin illustrates this principle quite succinctly in Clean Code.
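
In code the distinction is simple; here is a minimal sketch (the Counter class is hypothetical).

Code Snippet
  public class Counter
  {
      private int count;

      // Query: reports state and has no side effects; call sites could be
      // replaced by the returned value without changing the program's meaning.
      public int Count()
      {
          return count;
      }

      // Command: performs an action but yields no value.
      public void Increment()
      {
          count++;
      }
  }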

Referring back to our A and B snippet, we can see that had CQS been observed, Foo() would return void and our Main program would never have been handed an instance of B.  CQS thus reinforces LoD, and both manifest as specializations of OO encapsulation.  Following these principles forces us to change the semantics of our interfaces, creating contracts that are much more declarative.
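
Here is a sketch of that snippet refactored along these lines, keeping the hypothetical A and B.

Code Snippet
  class A
  {
      private readonly B b = new B();

      // Foo is now a Command: it does its work and tells its collaborator
      // what to do, instead of handing B back to the caller.
      public void Foo()
      {
          // do some work, then...
          b.Bar();
      }
  }

  // Main no longer navigates through A:
  //   var a = new A();
  //   a.Foo();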

CQS has many implications.  Fowler observes that CQS allows consumers of classes to use query methods with a sense of “confidence, introducing them anywhere, changing their order.”  In other words, CQS allows for arbitrary composition of queries; this is a very important concept in functional programming.  Queries in CQS are also necessarily idempotent by definition; this is extremely important in caching.

In Martin’s discussion of LoD, he notes that if B above were simply a data structure—if all of its operations were Queries—we would not have a violation of LoD, in principle.  This is because LoD is really only concerned with the proper abstraction of Commands. From the C2 wiki,

In the discussion on the LawOfDemeter, MichaelFeathers offers the analogy, "If you want your dog to run, do you talk [to] your dog or to each leg? Further, should you be able to manipulate the dog's leg without it knowing about it? What if your dog wants to move its leg and it doesn't know how you left it? You can really confuse your dog."

To extend the analogy, when you are walking your dog, you don’t command its legs; you command the dog.  But if you want a smooth walk, you’ll stop and wait if one of the dog’s legs is raised. It is in this sense that we can restate the Law of Demeter in terms of Command-Query Separation.

  • A Query of an object must:
    1. never alter the observable state of the object, i.e. the results of any Queries;
    2. and return only objects entirely composed of Queries, no Commands.
  • A Command of an object must:
    1. constrain its actions to depend upon the observable state, i.e. the Queries, of only those objects in its immediate context (as defined above);
    2. and hide from observers the internal state changes that result from its actions.

This restating is helpful in that it implies we refactor to CQS before applying LoD. In the first phase we clearly separate Commands from Queries. In the second phase we alter the semantics of our Commands and Queries to comply with LoD.  While the first phase is rather mechanical, it gives us a good starting point to reconsider the semantics of our objects as we bring them into compliance with LoD.

Tuesday, August 4, 2009

NASA’s Cloud Platform: NEBULA, and Amorphous Morphology

NEBULA is a Cloud Computing environment developed at NASA Ames Research Center, integrating a set of open-source components into a seamless, self-service platform. It provides high-capacity computing, storage and network connectivity, and uses a virtualized, scalable approach to achieve cost and energy efficiencies.

They always write it in all caps, though it doesn’t appear to be an acronym. My guess is that it’s easier to search for “NEBULA” and find what you want at NASA. And, what better name for a cloud computing platform?

To be frank, I didn’t recognize any of the technologies other than RabbitMQ and Apache. So I set out to find out what each piece does, and it’s these times that really make me envious of the LAMP folks. There is so much going on in the OSS world!

In fact, it’s that sheer volume of innovation that makes it such a nightmare. Here’s where having NASA or Amazon or Elastra or EngineYard or Rackspace or Google AppEngine discuss their respective cloud infrastructures is so beneficial. If you look at their technology choices as a Venn diagram, you might find an optimal set of technologies in the overlap. Beyond that, you can begin to form a generalized blueprint of a cloud platform, identifying the core components.

[Figure: cloud provider Venn diagram]

In this way the cloud begins to take shape. Were we to sit all the big industry players down and say, “define the cloud à la the OSI model,” no one would ever get around to building one. But if we find clouds in the wild, we can dissect them and find out what makes them tick. We can reverse-engineer our blueprint.

There are a few major players missing here, of course, most notably Salesforce.com (SFDC) and Microsoft’s Azure platform.  SFDC doesn’t have an infrastructure that you can replicate, per se, because it is inseparable from their proprietary technologies.  In fact, Stu Charlton and others have noted that this is its biggest failing.  Similarly, Microsoft’s Azure is a monolithic platform play, with the major difference that the platform can be deployed on premises, a so-called private cloud.

Still though, we can include SFDC and Azure in our analysis.  We can develop our morphology of the cloud platforms that have published their constituent technologies.  With the language of this morphology in hand, we can classify the facilities of other cloud platforms.  Perhaps a taxonomy will evolve that allows us to identify Platform-as-a-Service, Infrastructure-as-a-Service, Software-as-a-Service, hybrids, and perhaps some novel species.

Despite the amorphous character implied by the term cloud computing, these platforms have well-defined structure.  Moreover, unlike the traditional use of clouds in architecture diagrams, the details of this structure are important.

Wednesday, July 8, 2009

Three Common Fallacies Concerning REST

Better to light a candle than to curse the darkness. -Chinese Proverb

I purposely did not title this post “The 3 Fallacies of RESTful Computing” as I am certainly not an expert in either REST or fallacy. :)  I am, however, quite well-versed in auto-didacticism, and over the past week I’ve been boning up on REST.  Along the way I’ve had one of my early notions of REST disabused (it is neither an architecture nor a protocol) and noticed a few other common misconceptions in the blogosphere and tweet stream.  If you are new to REST, or even if you aren’t, you might very well find a few edifying points in this post; I hope to light a candle or two out there.

Without further ado, here are three common fallacies concerning REST. 

Fallacy #1 REST is CRUD

Perhaps the most common fallacy of RESTful computing is that REST is simply CRUD (Create, Read, Update, and Delete) over HTTP.  Microsoft’s ADO.NET Data Services endeavors to provide developers a “data service being surfaced to the web as a REST-style resource collection[…]”; this would seem to further the notion that REST is another way of doing CRUD.

However, REST is not CRUD, as Stu Charlton states in a blog post about REST design guidelines; Arnon Rotem-Gal-Oz says CRUD is bad for REST, too.  If we are going to attempt to abridge RESTful architecture with an innocuous statement of the form “REST is X over HTTP,” let us say that REST is using URLs to facilitate application state changes over HTTP.

Fallacy #2 POST is not RESTful

First, it is very important to note that REST is not tied to a specific protocol.  As an architectural style, it is protocol agnostic; though to be sure HTTP is a natural fit for many reasons.  As Fielding said in It is okay to use POST:

Search my dissertation and you won’t find any mention of CRUD or POST. The only mention of PUT is in regard to HTTP’s lack of write-back caching.  The main reason for my lack of specificity is because the methods defined by HTTP are part of the Web’s architecture definition, not the REST architectural style.

Since the nominal use of POST is orthogonal to RESTfulness, by definition, it cannot be the case that POST is antithetical to REST.  Nevertheless, it is important to understand the reasoning that generally goes into this fallacy, because it speaks directly to a core principle of REST.  Most architectures expose an API that allows consumers to affect the state of the application only indirectly.  To understand the implications of your actions as a consumer, you then have—at best—to be very familiar with the application architecture and be aware that you are making a lot of assumptions.  The state mechanics are hidden from you.  You cannot explicitly move the application from one state to another, nor can you directly observe the transition(s) that have taken place. 

A primary goal of Representational State Transfer is to make the application’s state machine unambiguous by exposing representations of application resources that have embedded within them URIs pertinent to the resource.  In this way a consumer of a RESTful service can discover the current state of the resource, the mechanisms to effect change in the resource, and other resources related to the current resource in the application’s state.  This is what is known as Hypertext as the Engine of Application State (HatEoAS).
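
To illustrate, a representation might embed its legal transitions as URIs; the shape below is purely hypothetical.

Code Snippet
  using System;

  // Hypothetical resource representation with embedded hypermedia controls.
  public class OrderRepresentation
  {
      public string Status { get; set; } // current state of the resource
      public Uri Self { get; set; }      // canonical URI of this resource
      public Uri Cancel { get; set; }    // present only while cancelling is a legal transition
      public Uri Payment { get; set; }   // a related resource in the application's state
  }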

HatEoAS implies a model where all of your important resources are represented by unique URIs and all important state changes are done by interacting with representations sent to or retrieved from those URIs. Most people first approaching REST view it in terms of opposition to other architectural “styles” such as SOA, or get mired in implementation immediately and begin contrasting their understanding of REST over HTTP against WS-*. Another common problem is reducing the ethos of REST to “RPC is bad” and “we don’t need all that complexity, we have HTTP” (see Fallacy #3, REST is better than WS-*).  These views are commonplace because REST is being promulgated as a better solution for many types of applications on the Web.

The specifics of how REST works over HTTP are beyond the scope of this article, and the subject of a lot of debate, but since many uses of POST look very much like RPC-style invocations, people have a knee-jerk reaction that POST is not RESTful.  By now you know this is not the case, but let’s hear from the experts.

From Fielding:

POST only becomes an issue when it is used in a situation for which some other method is ideally suited: e.g., retrieval of information that should be a representation of some resource (GET), complete replacement of a representation (PUT) […]

Stu Charlton’s design guidelines say nearly the same thing:

The problem with POST is when we abuse it by having it perform things that are more expressive in one of the other methods. GET being the obvious one that needs no hypermedia description. For the other methods, a good design guideline is that you MUST not break the general contract of the HTTP method you choose -- but you SHOULD describe the specific intent of that method in hypermedia.

Fallacy #3 REST is better than WS-*

In fact, Fielding’s thesis does address many of the problems and advantages of various network-based architectural styles, but nowhere does he claim REST is the one ring to rule them all.  In his aforementioned blog post, Fielding says (emphasis my own),

[…]there are plenty of information systems that can be designed using the REST architectural style and gain the associated benefits. Managing cloud instances is certainly one of those applications for which REST is a good fit[…]

From his thesis, here is the definition of the REST architectural style.

REST consists of a set of architectural constraints chosen for the properties they induce on candidate architectures. […] [It] is an abstraction of the architectural elements within a distributed hypermedia system. […] It encompasses the fundamental constraints upon components, connectors, and data that define the basis of the Web architecture, and thus the essence of its behavior as a network-based application.

So, the associated benefits are induced by constraints that collectively are referred to as the REST architectural style.  The benefits are myriad, and many of the constraints are recognizable in how the web works. Section 5 of the thesis is concise and readable, and I encourage the reader to internalize it.

The salient point here is that REST is not a protocol; it’s not even an architecture.  REST is an architectural style! By understanding its constraints and benefits, we can make informed decisions about its applicability to our problem domain and appropriation of technologies.  Comparing REST to the WS-* suite of protocols is comparing apples to oranges, though there are those who strongly argue the benefits of REST and HTTP over SOAP.

Saturday, June 27, 2009

What, Not How & Why, Not When

It occurred to me this morning that many software development principles seem to emerge from the rigorous application of the following principle:

Your architecture and code should make What & Why explicit without specifying How & When.
What, Not How

It is well known that we should prefer declarative over imperative code. By observing the Law of Demeter, we are in fact forced to create declarative, What, semantics in our interfaces, since we can't get at the How, the imperative constructs. The Single-Responsibility Principle tends to force us to factor out the How into other classes, leaving us to consume those classes with What semantics. Further, the Interface Segregation Principle requires us to explicitly segregate the interfaces our clients consume based on What they are trying to accomplish; if our focus were How, such segregation would be less of a concern.

Event-Driven Architectures (EDA) are another example of What, Not How. Whereas the SOLID principles operate at the class-design level of abstraction, EDA is concerned with system-level architecture. In an Event-Driven Architecture, we explicitly model happenings in the domain. Rather than coupling the site of the happening to a specific party designated to deal with that happening, we create an Event and use reliable messaging and subscription schemes to allow one or many services to handle it. In other words, instead of specifying How to deal with a happening at the site that generates it, we explicitly model What happened and let other parties worry about How to deal with it.
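
A minimal sketch of the idea follows; the event, bus, and handler names are all hypothetical.

Code Snippet
  using System;

  // An explicit model of the happening (What).
  public class OrderReceived
  {
      public string OrderId { get; set; }
  }

  // Reliable messaging and subscriptions are assumed to live behind this interface.
  public interface IEventBus
  {
      void Publish<TEvent>(TEvent @event);
      void Subscribe<TEvent>(Action<TEvent> handler);
  }

  // The site of the happening announces What happened...
  //   bus.Publish(new OrderReceived { OrderId = "PO-42" });
  // ...and any number of services independently decide How to deal with it:
  //   bus.Subscribe<OrderReceived>(e => fulfillment.Schedule(e.OrderId));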

Why, Not When

This maxim is both more subtle and more prosaic than What, Not How. It is probably pretty obvious that, given a requirement stated as "if the purchase order acknowledgement is not received within two hours, notify someone in fulfillment," we should model an Event "POAckIsLate" as opposed to "TwoHoursHaveElapsedWithoutReceivingPOAck". We will have different SLAs with different vendors; those SLAs will change, etc. So we can say that, when modeling Events in our domain, we should prefer specifying Why, Not When.

Perhaps more subtle are the implications for communication semantics between modules. If we model our communications with Why in mind, we don't get mired in the concurrency problems of specifying When. Consider a workflow. If we specify When to go to a particular step, the underlying reason may have changed unless we take some sort of explicit lock on shared state. If we instead specify Why a particular state transition takes place, we can avoid inconsistent states through continuous evaluation. If we make Why explicit, and consequently create semantics to evaluate Why independently of "current" state, it becomes possible to evaluate the state consistently without any shared state, i.e. without a notion of When.

As an example, if we had the requirement, "when a PO is received with a quantity for part X above twenty units, move the order to the top of the work queue," we should model a "BulkProductionRequest" Event and an "ExpediteProduction" Command; we should not implement a "Reprioritize Production Queue For Order of PartX Over Twenty Units". Begin with the end in mind and ask What we want to do (expedite production), not How (re-prioritize the production queue). Ask Why we are expediting this order: because it is Bulk. What is Bulk? Bulk is a quality determined by a CapacityPlanning service and implies that the quantity exceeds some production capacity threshold.
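
A sketch of that Why-driven model, with hypothetical names throughout:

Code Snippet
  // The Event captures Why the order matters, not When or How it was detected.
  public class BulkProductionRequested
  {
      public string OrderId { get; set; }
  }

  // Bulk is a quality owned by capacity planning; the threshold lives here,
  // not in the code that receives purchase orders.
  public interface ICapacityPlanningService
  {
      bool IsBulk(string partNumber, int quantity);
  }

  // On receiving a PO line:
  //   if (capacityPlanning.IsBulk(line.PartNumber, line.Quantity))
  //       bus.Publish(new BulkProductionRequested { OrderId = po.Id });
  // An ExpediteProduction handler subscribes and moves the order up the queue.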

Monday, May 25, 2009

Connascence: The Underlying Principle of OO Design?

This is a great talk at a Ruby conference by Jim Weirich about his attempt to frame all object-oriented design (OOD) principles as special cases of an underlying principle called “connascence”.  Connascence is a term co-opted by Meilir Page-Jones in the ‘90s for use in OOD; below is the definition from his book with Larry Constantine entitled, “Fundamentals of Object-Oriented Design in UML” (page 214):

Connascence between two software elements A and B means either

  1. that you can postulate some change to A that would require B to be changed (or at least carefully checked) in order to preserve overall correctness, or
  2. that you can postulate some change that would require both A and B to be changed together in order to preserve overall correctness.
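
A tiny, hypothetical example of the first kind is connascence of position between a method and its callers.

Code Snippet
  // Connascence of position: every call site must agree with the parameter
  // order. Swapping the two decimals requires changing (or at least carefully
  // checking) element B wherever element A is called.
  public static class Orders
  {
      public static void Create(string customer, decimal price, decimal discount)
      {
          // ...
      }
  }

  // Orders.Create("ACME", 100m, 5m); // quick: which decimal is the discount?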


UPDATE: Jim Weirich's talk can be found here.