Monday, May 12, 2008

Hpricot Ruby Script to Digest ISO Currency Codes

UPDATE: I fixed a couple of bugs in this script and changed the XML Schema datatype namespace alias to “xs”. You will likely want to remove some enumeration items because they aren’t particularly useful for e-commerce applications, e.g. palladium troy ounce.

require 'hpricot'
require 'open-uri'
doc = Hpricot(open("http://en.wikipedia.org/wiki/ISO_4217"))
codetable = doc.search("//table[@class='wikitable sortable']")[0]
rows = codetable.search("//tr")
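# Skip the header row (index 0); the exclusive range stops at the last row.
for i in 1...rows.length
    tds = rows[i].search("//td")
    # Skip rows without enough cells (e.g. rows that only contain <th>).
    unless tds.length < 4
        puts '<xs:enumeration id="' + tds[3].search("//a[@title]").inner_html.gsub(/\s/, '_') + '"  value="' + tds[0].inner_html + '" />'
    end
end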
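
As the update note says, some of the generated items (such as the palladium troy ounce) are probably not wanted in an e-commerce schema. Here is a rough sketch of one way to drop them after the fact. The helper name `filter_enumerations` and the exclusion list are my own assumptions, not part of the script above; the list covers the ISO 4217 precious-metal codes plus the testing and "no currency" codes, and you would adjust it to taste. Filtering on a leading "X" would be too broad, since codes like XAF are real currencies.

```ruby
# Assumed exclusion list: precious metals (gold, silver, platinum, palladium),
# the testing code, and the "no currency" code.
EXCLUDED_CODES = %w[XAU XAG XPT XPD XTS XXX].freeze

# Hypothetical helper: returns only the lines whose value="..." attribute
# is not in the exclusion list.
def filter_enumerations(lines, excluded = EXCLUDED_CODES)
  lines.reject do |line|
    code = line[/value="([^"]*)"/, 1]
    excluded.include?(code)
  end
end

# Illustrative input in the same shape the script prints.
sample = [
  '<xs:enumeration id="United_States_dollar"  value="USD" />',
  '<xs:enumeration id="Palladium"  value="XPD" />',
  '<xs:enumeration id="CFA_franc_BEAC"  value="XAF" />'
]
puts filter_enumerations(sample)
```

You could redirect the script's output to a file and feed the lines through this, or apply the same check inside the loop before calling puts.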

Also, here's a PowerShell script to process the ISO 3166 country code list (semicolon-delimited):

gc countrycodes.txt | ? {$_ -match ';'} | % { $s0, $s1 = $_.split(';')[0..1]; "<xs:enumeration id=`"$s0`" value=`"$s1`" />" }  | out-file codes.txt
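
For comparison, here is a rough Ruby sketch of the same country code conversion. It assumes each line looks like NAME;CODE, uses the “xs” alias, and replaces whitespace in the id with underscores the way the currency script does. The helper name `country_enumeration` is made up for illustration.

```ruby
# Hypothetical helper: turns one "NAME;CODE" line into an enumeration element.
# Returns nil for lines that have no semicolon-delimited code.
def country_enumeration(line)
  name, code = line.strip.split(';', 2)
  return nil if code.nil? || code.strip.empty?
  # Replace runs of whitespace so the id contains no spaces.
  id = name.strip.gsub(/\s+/, '_')
  %(<xs:enumeration id="#{id}" value="#{code.strip}" />)
end

# Illustrative input; the real list would be read from countrycodes.txt.
sample = ["AFGHANISTAN;AF", "", "UNITED STATES;US"]
sample.each do |line|
  element = country_enumeration(line)
  puts element if element
end
```

To run it over the real file, something like File.foreach("countrycodes.txt") in place of the sample array should work.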