Unicode and Ruby
I was quite depressed listening to Tim Bray talk about Unicode and Ruby at RubyConf. While Tim did a wonderful job of explaining the problems, he didn’t really provide much in the way of a solution. That is why I was quite happy to read Julian Tarkhanov’s slide deck from his Unicode presentation at the Rails Show and Tell meeting in Amsterdam, where he introduces his Unicode hacks libraries.
I really like his idea about using an accessor proxy on String:
name = 'Claus Müller'
puts name.reverse #=> rell??M sualC
name.length #=> 13
puts name.chars.reverse #=> rellüM sualC
name.chars.length #=> 12
Notice how accessing the same string via the chars accessor correctly reverses the German string and reports its length as 12 characters rather than 13 bytes.
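One way to picture such a proxy is a small wrapper that works on code points instead of bytes. The following is only a minimal sketch, not the Unicode hacks implementation: the CharsProxy class and the unicode_chars method are hypothetical names chosen for illustration, and it assumes the string holds valid UTF-8.

```ruby
# Hypothetical sketch of a character-aware proxy (not the real library code).
class CharsProxy
  def initialize(string)
    # unpack('U*') decodes the UTF-8 bytes into an array of code points.
    @codepoints = string.unpack('U*')
  end

  # Number of code points rather than number of bytes.
  def length
    @codepoints.length
  end

  # Reverse by code point, then re-encode as a UTF-8 string.
  def reverse
    @codepoints.reverse.pack('U*')
  end
end

class String
  # Hypothetical accessor name, chosen to avoid clashing with existing methods.
  def unicode_chars
    CharsProxy.new(self)
  end
end

name = "Claus M\u00FCller"
puts name.unicode_chars.reverse
puts name.unicode_chars.length
```

Reversing by code point is still a simplification, since combining marks would end up attached to the wrong base character, so a real library has more work to do than this.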


03. Nov, 2006 







This is actually available now as ActiveSupport::Multibyte, included in edge Rails. We’re hoping to provide native support for it in the next release of JRuby. I think it’s on the right track, and it’s certainly got a good chance of getting wide exposure since it will be included in Rails.
You can do this with scan in utf-8 mode:
$KCODE = 'u'
s = "foo\317\200bar"
puts s.scan(/./).reverse.join # => "rabπoof"
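The same scan trick can also be written without touching the global setting, by putting the u flag on the regular expression itself. This is a sketch under assumptions: reverse_utf8 is a hypothetical helper name, and the input is assumed to be valid UTF-8.

```ruby
# Hypothetical helper: reverse a UTF-8 string one character at a time.
def reverse_utf8(str)
  # The u flag makes the pattern UTF-8 aware; m lets the dot match newlines too.
  str.scan(/./mu).reverse.join
end

puts reverse_utf8("foo\317\200bar")
```

Keeping the flag on the pattern means the behaviour does not depend on what other code has set globally.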