Unicode and Ruby

I was quite depressed listening to Tim Bray talk about Unicode and Ruby at RubyConf. While Tim did a wonderful job of explaining the problems, he didn’t really provide much in the way of a solution. So I was quite happy to read Julian Tarkhanov’s slide deck from his Unicode presentation at the Rails Show and Tell meeting in Amsterdam, where he introduces his Unicode hacks libraries.

I really like his idea about using an accessor proxy on String:


name = 'Claus Müller'
puts name.reverse #=> rell??M sualC
name.length #=> 13
puts name.chars.reverse #=> rellüM sualC
name.chars.length #=> 12

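Notice how accessing the same string via the chars accessor works on characters rather than bytes: the reversal keeps the two-byte ü intact, and the length comes out as 12 characters instead of 13 bytes.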
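To make the byte-versus-character difference concrete, here is a minimal sketch of how such a proxy could work. It is not the code from the Unicode hacks library: the class name CodepointProxy and its methods are hypothetical, and the sketch assumes the string holds valid UTF-8 and that String#bytesize is available (Ruby 1.8.7 or later). It decodes the bytes into codepoints with unpack("U*"), operates on that array, and packs the result back into a string.

```ruby
# Hypothetical stand-in for a chars-style proxy; an illustration only,
# not the actual class from the Unicode hacks library.
class CodepointProxy
  def initialize(string)
    @string = string
  end

  # Decode the UTF-8 bytes into an array of integer codepoints.
  def codepoints
    @string.unpack("U*")
  end

  # Number of codepoints, as opposed to the number of bytes.
  def length
    codepoints.length
  end

  # Reverse codepoint by codepoint, then re-encode as UTF-8.
  def reverse
    codepoints.reverse.pack("U*")
  end
end

name = "Claus M\303\274ller"      # "Claus Müller"; ü is the two bytes \303\274
proxy = CodepointProxy.new(name)

puts name.bytesize                # 13 bytes
puts proxy.length                 # 12 codepoints
puts proxy.reverse                # rellüM sualC
```

Working on codepoints is enough for this example, but it would still split a letter from a separate combining accent, so treat it as an illustration of the idea rather than a complete answer.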


2 Responses to “Unicode and Ruby”

  1. This is actually available now as ActiveSupport::Multibyte, included in edge Rails. We’re hoping to provide native support for it in the next release of JRuby. I think it’s on the right track, and it’s certainly got a good chance of getting wide exposure since it will be included in Rails.

  2. You can do this with scan in utf-8 mode:
    $KCODE = 'u'
    s = "foo\317\200bar"
    puts s.scan(/./).reverse.join # => "rabπoof"