Tuesday, November 4, 2008

Dealing with Globalization Differences in Windows Versions

Recently I was charged with leading an effort to localize an ASP.NET application for a few different locales, including the Netherlands and India.  I was reminded again how difficult it is to get this right.  There are those who argue that using the same view across all the cultures supported in your application is a bad idea.  For example, a right-to-left (RTL) culture such as Hebrew, when done right, requires mirroring the entire view layout: not an easy thing to accomplish without changing the view.

Regardless, the .NET Framework and ASP.NET provide many facilities for localizing your application around a single view: App_LocalResources, App_GlobalResources, CultureInfo, etc.  ASP.NET has a particularly helpful feature; you can set the CultureInfo used by the application in the web.config using the globalization element.

There are lots of different ways of representing a culture in Windows: culture name, culture identifier, locale identifier (LCID), and others.  Generally, though, the culture name is used, e.g. en-US, en-GB, and es-MX for English (US), English (Great Britain), and Spanish (Mexico), respectively.  As the world's second most populous country with many ancient and highly varied cultures, India has no fewer than nine culture names defined in the .NET Framework.  This is, however, a narrow view of India's diversity with its 28 states, 7 union territories, and literally hundreds of spoken languages.  Obviously, the cultures supported in .NET don't cover 100% of this diversity.
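To see which India-related cultures a particular machine actually reports, something like the following could be run in PowerShell.  It is only a sketch: filtering on the "-IN" name suffix is my own shortcut rather than an official API, $indianCultures is an illustrative variable name, and the list you get back will vary by Windows version and .NET release.

```powershell
# Ask .NET for every culture it knows about on this machine.
$allCultures = [System.Globalization.CultureInfo]::GetCultures([System.Globalization.CultureTypes]::AllCultures)

# Keep only the cultures whose name ends in "-IN" (assumption: that suffix
# is a good-enough proxy for "specific culture for India").
$indianCultures = @($allCultures | Where-Object { $_.Name -like "*-IN" } | Sort-Object Name)

# Show the culture name, LCID, and English display name for each match.
$indianCultures | Format-Table Name, LCID, EnglishName -AutoSize
```

Running this on both the development machine and the server is a quick way to spot cultures that exist on one but not the other.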

One particularly useful culture name in use in Windows is en-IN.  In India, the de facto language of the law and by extension government and business is English, due to both the historical influence of British colonization and the need for a lingua franca in such an amazingly diverse country.  The en-IN culture name codifies this, making it possible for an application developer to effectively gloss over this diversity in the Indian market.  Unfortunately, en-IN is not available in the core .NET Framework release.  As you can see from this list, Windows Vista supports this culture, as does Windows Server 2008.

Obviously, this is problematic for developers doing development on Vista but deploying to Windows 2003 R2.  Fortunately, Microsoft developers faced with the incredible diversity of the world's cultures provided a way to customize the available cultures on any Windows installation: the CultureAndRegionInfoBuilder (CARIB) class.  There is a standard how-to for creating custom cultures with this class, but we will take a slightly different approach in this entry.  We will export the en-IN culture from our Windows Vista development machine and import it on our Windows 2003 R2 server.

We'll use PowerShell here, but the examples should be clear enough that you can write your own console or Windows application to execute these steps.  Let's begin with exporting the en-IN culture.  The CARIB class is in sysglobl.dll, so we need a "reference" to that assembly. We can load the assembly from the GAC in PowerShell.  To do so we use its strong name, obtained by using "gacutil /l sysglobl" from a Visual Studio command prompt. In PowerShell on the Vista machine,

PS> [System.Reflection.Assembly]::Load("sysglobl, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a, processorArchitecture=MSIL")

Now that we can reference the class, let's export the culture into an industry-standard format called Locale Data Markup Language (LDML) version 1.1 using the Save method.  We can import this later with the CreateFromLdml method.

PS> $enIN = New-Object System.Globalization.CultureAndRegionInfoBuilder("en-IN", "Replacement")
PS> $enIN.LoadDataFromCultureInfo((New-Object System.Globalization.CultureInfo("en-IN")))
PS> $enIN.LoadDataFromRegionInfo((New-Object System.Globalization.RegionInfo("IN")))
PS> $enIN.Save("enIN.ldml")

Now we can copy enIN.ldml to our server and import it.  (Save resolves a relative path against the process's working directory, which in PowerShell is not necessarily your current location, so look there if the file seems to be missing.)  There are three very important modifications we must make to enIN.ldml, however.  The en-IN culture is defined in terms of some other Windows locale information that won't be found on Windows 2003 R2, specifically "text info", "sort", and "fallback".  If you open enIN.ldml, you'll find the following three elements.

<msLocale:textInfoName type="en-IN" />
<msLocale:sortName type="en-IN" />
...
<msLocale:consoleFallbackName type="en-IN" />

For more information, see the section "Exporting Operating System-Specific Cultures" in this CodeProject on-line book.  If we try to load a CARIB from this file as-is, we'll receive the following error.

Culture name 'en-in' is not supported.

We've got to change these to a sensible alternative that is supported on Windows 2003 R2.  I chose "en-US", though "en-GB" would've also been appropriate.  For other cultures, this may prove more difficult.  My changed LDML file contains these lines.

<msLocale:textInfoName type="en-US" />
<msLocale:sortName type="en-US" />
...
<msLocale:consoleFallbackName type="en-US" />
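If you'd rather not hand-edit the file, the same three substitutions could be scripted.  The function below is a rough sketch, not part of any Microsoft tooling: the name Update-LdmlLocaleReferences and its parameters are made up for illustration, it assumes the exported file is UTF-8, and it treats the file as plain text rather than parsing the XML.  Only the three msLocale elements shown above are touched; every other occurrence of the culture name is left alone.

```powershell
# Hypothetical helper: rewrites the textInfoName, sortName, and
# consoleFallbackName references in an exported LDML file.
function Update-LdmlLocaleReferences {
    param(
        [string]$Path,
        [string]$OldName = "en-IN",
        [string]$NewName = "en-US"
    )

    # .NET file APIs ignore the PowerShell location, so resolve to a full path first.
    $fullPath = (Resolve-Path $Path).Path

    # Read the whole file as text (assumption: UTF-8 content).
    $text = [System.IO.File]::ReadAllText($fullPath)

    # Match only the three msLocale elements, capturing everything around the culture name.
    $pattern = '(<msLocale:(?:textInfoName|sortName|consoleFallbackName)\s+type=")' + [regex]::Escape($OldName) + '(")'
    $replacement = '${1}' + $NewName + '${2}'

    $updated = [regex]::Replace($text, $pattern, $replacement)

    # Write the result back over the original file (UTF-8, no byte-order mark).
    [System.IO.File]::WriteAllText($fullPath, $updated)
}
```

A call such as Update-LdmlLocaleReferences -Path enIN.ldml would then make the edit in place, so it is worth keeping a copy of the original export before running it.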

With that done, we can load and register the culture on the Windows 2003 server.

PS> [System.Reflection.Assembly]::Load("sysglobl, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a, processorArchitecture=MSIL")
PS> $enIN = [System.Globalization.CultureAndRegionInfoBuilder]::CreateFromLdml("enIN.ldml")
PS> $enIN.Register()
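Before touching the web application, a quick check in PowerShell can confirm that the registration took.  This is a sketch under a couple of assumptions: Register() completed without error (it needs administrative privileges, since the culture is installed machine-wide), and the variable names are purely illustrative.  I've deliberately left out the expected output, because the currency symbol and formatting depend on the culture data that was exported.

```powershell
# Constructing the CultureInfo fails with an error if en-IN is still unknown here.
$culture = New-Object System.Globalization.CultureInfo("en-IN")

# Inspect a few properties of the newly registered culture.
$culture.EnglishName
$culture.NumberFormat.CurrencySymbol

# Format a sample amount as currency using the culture explicitly.
$formatted = (10.5).ToString("c", $culture)
$formatted
```

If the culture ever needs to be removed again, the same class offers a static Unregister method that takes the culture name, e.g. [System.Globalization.CultureAndRegionInfoBuilder]::Unregister("en-IN").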

The culture en-IN is now available to all applications running on your server.  To see this in action, you could now augment the web.config of your web application with a globalization element:

...
<system.web>
  <globalization culture="en-IN" uiCulture="en-IN" />
...


Create a file called i18n.aspx and add the following code:

<%@ Page Language="C#"%>
<% = (10.5).ToString("c") %>

Navigate to i18n.aspx and you should see the following:

Rs. 10.50

I hope this helps!