Saturday, June 27, 2009

What, Not How & Why, Not When

It occurred to me this morning that many software development principles seem to emerge from the rigorous application of the following principle:

Your architecture and code should make What & Why explicit without specifying How & When.
What, Not How

It is well known that we should prefer declarative over imperative code. By observing the Law of Demeter, we are in fact forced to create declarative, What, semantics in our interfaces since we can't get at the How, the imperative constructs. The Single-Responsibility Principle tends to force us to factor out the How into other classes, leaving us to consume other classes with What semantics. Further, the Interface Segregation Principle requires us to explicity segregate the interfaces our clients consume based on What they are trying to accomplish; if our focus was How, such segregation would be less of a concern.

Event-Driven Architectures (EDA) are another example of What, Not How. Whereas the SOLID principles operate at the class design level of abstraction, EDA is concerned with system-level architecture. In an Event-Drive Architecture, we explicitly model happenings in the domain. Rather than coupling the site of the happening to a specific party designating for dealing with that happening, we create an Event and use reliabile messaging and subscription schemes to allow one or many services to handle it. In other words, instead of specifying How to deal with a happening at the site that generates it, we explicitly model What happened and let other parties worry about How to deal with it.

Why, Not When

This maxim is both more subtle and more prosaic than What, Not How. It is probably pretty obvious that when given a requirement stated, "if the purchase order acknowledgement is not received within two hours, notify someone in fulfillment," we should model an Event "POAckIsLate" as opposed to "TwoHoursHaveElapsedWithoutReceivingPOAck". We will have different SLAs with different vendors; those SLAs will change, etc. So we can say, when modeling Events in our domain, we should prefer specifying Why, Not When.

Perhaps more subtle is the implications for communication semantics between modules. If we model our communications with Why in mind, we don't get mired in the concurrency problems of specifying When. Consider a workflow. If we specify When to go to a particular step, the underlying reason may have changed unless we take some sort of explicit lock on shared state. If we instead specify Why a particular state transition takes place, we can avoid inconsistent states through continuous evaluation. If we make Why explicit and consequently create semantics to evaluate Why independently of "current" state, it becomes possible to evaluate the state consistently without any shared state, i.e. without a notion of When.

As an example, if we had the requirement, "when a PO is received with a quantity for part X above a twenty units, move the order to the top of the work queue," we should model a "BulkProductionRequest" Event and an "ExpediteProduction"; we should not implement a "Reprioritize Production Queue For Order of PartX Over Twenty Units". Begin with the end in mind and ask What do we want to do (expedite production) not How (re-prioritize production queue). Ask Why are we expediting this order? Because it is Bulk. What is Bulk? Bulk is a quality determined by a CapacityPlanning service and implies that the quantity exceeds some production capacity threshold.

Friday, June 26, 2009

Pareto, Zipf, Heap: The 80-20 Rule, Language, and Diminishing Returns

Please consider this a sort of layman’s disambiguation page.

The classic “80-20 rule” refers to a Pareto distribution.  The thin distribution of the 20% is the subject of Chris Anderson’s “The Long Tail”.  Originally, the Pareto distribution referred to the fact that 20% of the people control 80% of the wealth, but it has turned up in many other contexts.

A Zipf’s Law is about rankings and frequency.  The second item’s frequency will be half of the first; the third’s frequency will be half the second, and so on.  The “half” may be some other factor, but it remains constant in the distribution; the rank is inversely proportional to the frequency. The item in Zipf’s Law is a word and its frequency is its appearance in a corpus of English text.  However, Zipf’s Law holds for texts generated from a fixed alphabet by picking letters randomly with a uniform distribution.

Heaps’ Law is about diminishing returns.  It has an exact formula, but generally it says that the more you look into a text, the fewer new discoveries of words you’ll find.  So, as you read through the text it takes longer and longer to find new words in the text.  Heaps’ Law applied to the general case where the “words” are just classifiers of some collection of things.  So, it could be applied to the nationality of a collection of people; you’d have to gather more and more people from a random sampling to get a representative from all countries.

The implications of these laws in various contexts are the subject of much interesting study and postulation.

Tuesday, June 16, 2009

Value Objects and Value Domains

I’ve written relatively extensively on the topic of replacing enum constructs with classes, so I won’t rehash the topic. Instead, I’d like to introduce you to some code I’ve written that enables you to create finite domains of Value Objects in C#.  Please see my two previous posts on the subject to learn more about the benefits of this approach.  To see how the Value Object semantics and making a finite domain a first class concept in the pattern improves the approach, read on.

First, we need some definitions.  We are taking the concept of a Value Object from Eric Evans’ book “Domain Driven Design” p.99:

When you care only about the attributes of an element of the model, classify it as a Value Object. Make it express the meaning of the attributes it conveys and give it related functionality. Treat the Value Object as immutable.  Don’t give it any identity […] The attributes that make up a Value Object should form a conceptual whole.

An example here is useful.  Consider modeling the domain of a book publisher.  At some point you need to capture all the different formats of books: A4, B, Pinched Crown, etc.  The attributes of width, height, unit of measure, and name would go into your model.  But, any instance of your A4 Value Object should be completely interchangeable from any other A4.  And, it goes without saying that you can’t change the width or height or any other attribute of A4.

All of the different formats of books belong to a Value Domain. According to WordNet, a domain (we are using the mathematical notion) is:

the set of values of the independent variable for which a function is defined

The functional complement of a domain is the range of the function; i.e. domain is what you can put in, the range is what you can expect to get out.  I like calling this concept in my approach a domain instead of a set, because it neatly captures a key benefit.  When we are writing code.  We want to declare our parameters as being a proper member of domain of values, instead of just primitive types or strings.

Now we’re ready to dive into the implementation.  Let’s begin with the end in mind.  I want to write a function, say, that let’s me search for a document with a particular format about an arbitrary topic.

void SearchDocs(DocFormat docFormat, string topic)


Ok, so we could create a base class or an interface called DocFormat and create Word doc, PDF, etc. that inherit from or implement DocFormat.  Easy.  But, SearchDocs has to be able to handle all current and future implementations of DocFormat; it must not violate the Liskov substitution principle. What if the repository or search algorithm depends on the what the doc format actually is?  Also, we’d have a subclass of DocFormat to write for every document type, and we’d have to do a lot of work to remove object identity, since your instance of PDF is not the same as my instance of PDF.  And, don’t forget to make the whole thing immutable. [Note: I know this is a contrived example with well-established OOP solutions that don’t require LSP violation.  Work with me here. :)]



Clearly we have a lot of work to do to make our Value Objects a reality.  It’s not impossible, though, and a quick Bing Google turns up a couple of investigations and approaches.  Jimmy Bogard’s approach gave me a clue that I needed to get it working.  What I wanted was a base type, ValueObject, that I could inherit from and get the Value Object semantics described in Evans.



Jimmy’s approach used a self-referencing open constructed type to allow the ValueObject base class to do all the work of providing Value Object semantics.  This base class uses dynamic reflection to determine what properties to use in the derived class to do equality comparisons (an approach nominally improved upon here). He saw his approach as having a single fundamental flaw; it only worked for the first level of inheritance, i.e. the first descendent from ValueObject.



For my purposes—creating a bounded, finite domain of Value Objects to replace enum-type classes--this is not a flaw.  Substantively, all that remains to do is introduce the concept of the the Value Domain into Jimmy’s approach and put the Value Objects in it.  Because I wanted to use these Value Domains throughout the enterprise, I baked WCF support into my approach.  Further, because the Value Domain is defined a priori, I didn’t have to play with object identity; I could simply look the value up in the domain.  (It took a little out-of-the-box thinking to get that to work transparently.)  Finally, I wanted it to be trivially easy to create these Value Domains, so I created a snippet for use in Visual Studio.



Here’s an example of Value Domain and Value Objects implementation:



[DataContract]
public sealed class DocFormats : ValueObject<DocFormats.DocFormat, string, DocFormats>.Values<DocFormats>
{
private DocFormats()
{
Word = Add("Word");
PDF = Add("PDF");
}

[DataMember]
public readonly BindingType Word, PDF;

[Serializable]
public sealed class DocFormat : ValueObject<DocFormat, string, DocFormats>
{
private DocFormat(string v) : base(v) { }
}
}



DocFormats is the Value Domain.  DocFormat is the type of the Value Object.  String is the underlying type of the values, but the pattern supports anything that implements IEquatable and IComparable.  Methods can accept DocFormats.DocFormat as a parameter and only Word and PDF will be valid.  Code can specify those values through a Singleton accessor: DocFormats.Instance.PDF.  You’ll notice that the only constructor is private; the Singleton implementation is in the base value domain base class (ValueObject<…>.Values<…>).



// our new method interface
void SearchDocs(DocFormats.DocFormat docFormat, string topic)
{
// referencing an individual value
if(DocFormats.Instance.Word == docFormat)
{
//...
}
}



Above you’ll note that the Value Object type definition (subclass of ValueObject<…>) is nested inside of the Value Domain.  Doing that groups the two together syntactically in a very natural way (FooTypes.FooType); the entire domain of values is contained in one place.  That locality makes the snippet to produce them cohesive too.  [Note: I’ve included the snippet XML at the end of this post.]



Interestingly, the underlying implementation necessarily inverts this nesting; the Value Domain base class is nested in the Value Object base class.  That allows the Value Domain Singleton to create instances of Value Objects.  I only had to resort to reflection in order to allow the Value Domain base class to provide the Singleton implementation.  Inversion of Control containers like Unity and Castle Windsor do this kind of reflection all the time, and it’s cheap since .NET 2.0. 



Without further ado, here’s the base classes implementation.



using System;
using System.Collections.Generic;
using System.Reflection;

[Serializable]
public abstract class ValueObject<T, TValue, TValues> : IEquatable<T>, IComparable, IComparable<T>
where T : ValueObject<T, TValue, TValues>
where TValue : IEquatable<TValue>, IComparable<TValue>
where TValues : ValueObject<T, TValue, TValues>.Values<TValues>
{
/// <summary>
///
This is the encapsulated value.
/// </summary>
public readonly TValue Value;
protected ValueObject(TValue value)
{
Value = value;
}

#region equality
public override bool Equals(object other)
{
return other != null && other is T && Equals(other as T);
}

public override int GetHashCode()
{
// TODO provide an efficient implementation
// http://www.dotnetjunkies.com/weblog/tim.weaver/archive/2005/04/04/62285.aspx
return Value.GetHashCode();
}

public bool Equals(T other)
{
return other != null && Value.Equals(other.Value);
}

public static bool operator ==(ValueObject<T, TValue, TValues> x, ValueObject<T, TValue, TValues> y)
{
// pointing to same heap location
if (ReferenceEquals(x, y)) return true;

// both references are null
if (null == (object)(x ?? y)) return true;

// auto-boxed LHS is not null
if ((object)x != null)
return x.Equals(y);

return false;
}

public static bool operator !=(ValueObject<T, TValue, TValues> x, ValueObject<T, TValue, TValues> y)
{
return !(x == y);
}
#endregion

public static implicit operator
TValue(ValueObject<T, TValue, TValues> obj)
{
return obj.Value;
}

public static implicit operator ValueObject<T, TValue, TValues>(TValue val)
{
T valueObject;
if (Values<TValues>.Instance.TryGetValue(val, out valueObject))
return valueObject;

throw new InvalidCastException(String.Format("{0} cannot be converted", val));
}

public override string ToString()
{
return Value.ToString();
}

#region comparison

public int CompareTo(T other)
{
return Value.CompareTo(other);
}

public int CompareTo(object obj)
{
if (null == obj)
throw new ArgumentNullException();

if (obj is T)
return Value.CompareTo((obj as T).Value);

throw new ArgumentException(String.Format("Must be of type {0}", typeof(T)));
}

#endregion

[Serializable]
public abstract class Values<TDomain> where TDomain : Values<TDomain>
{
[NonSerialized]
protected readonly Dictionary<TValue, T> _values = new Dictionary<TValue, T>();

private void Add(T valueObject)
{
_values.Add(valueObject.Value, valueObject);
}

protected T Add(TValue val)
{
var valueObject = typeof(T).InvokeMember(typeof(T).Name,
BindingFlags.CreateInstance | BindingFlags.Instance |
BindingFlags.NonPublic, null, null, new object[] { val }) as T;
Add(valueObject);
return valueObject;
}

public bool TryGetValue(TValue value, out T valueObject)
{
return _values.TryGetValue(value, out valueObject);
}

public bool Contains(T valueObject)
{
return _values.ContainsValue(valueObject);
}

static volatile TDomain _instance;
static readonly object Lock = new object();
public static TDomain Instance
{
get
{
if (_instance == null)
lock (Lock)
{
if (_instance == null)
_instance = typeof(TDomain)
.InvokeMember(typeof(TDomain).Name,
BindingFlags.CreateInstance |
BindingFlags.Instance |
BindingFlags.NonPublic, null, null, null) as TDomain;
}
return _instance;
}
}
}
}



I’d love to hear your feedback.  This is approach is not intended to support unbounded Value Domains, such as the canonical example of Address.  It is meant for the enums-as-classes problem, i.e. finite, bounded Value Domains.



ValueObject.snippet



<?xml version="1.0" encoding="utf-8" ?>
<
CodeSnippets xmlns="http://schemas.microsoft.com/VisualStudio/2005/CodeSnippet">
<
CodeSnippet Format="1.0.0">
<
Header>
<
Title>ValueOjbect</Title>
<
Shortcut>vo</Shortcut>
<
Description>Code snippet for ValueObject and domain</Description>
<
Author>Christopher Atkins</Author>
<
SnippetTypes>
<
SnippetType>Expansion</SnippetType>
</
SnippetTypes>
</
Header>
<
Snippet>
<
Declarations>
<
Literal>
<
ID>domain</ID>
<
ToolTip>Value domain name</ToolTip>
<
Default>MyValues</Default>
</
Literal>
<
Literal>
<
ID>value</ID>
<
ToolTip>ValueObject Class name</ToolTip>
<
Default>MyValueObject</Default>
</
Literal>
<
Literal>
<
ID>type</ID>
<
ToolTip>Value Type</ToolTip>
<
Default>string</Default>
</
Literal>
</
Declarations>
<
Code Language="csharp">
<![CDATA[
[DataContract]
public sealed class $domain$ : ValueObject<$domain$.$value$, $type$, $domain$>.Values<$domain$>
{
private $domain$()
{
Value1 = Add(String.Empty);
}

[DataMember]
public readonly $value$ Value1;

[Serializable]
public sealed class $value$ : ValueObject<$value$, $type$, $domain$>
{
private $value$($type$ v) : base(v) { }
}
}
]]>
</
Code>
</
Snippet>
</
CodeSnippet>
</
CodeSnippets>