This blog is no longer updated. We have moved to trix.pl/blog. Please update your bookmarks.

Ruby on Rails and Unicode (Nothing is perfect)

Unfortunately, Ruby on Rails (and Ruby in general) does not support Unicode out of the box. This sucks. It is possible to use Unicode in Rails projects, as shown on the Rails Wiki. After 2 hours of testing I've found that almost everything works well. I even asked Julik (author of the unicode_hacks plugin) whether what the wiki says about bugs triggered by these hacks is true. He told me to be careful around ActionMailer. I immediately ran some tests and found that it works OK (although I had to explicitly set base64 encoding for the email body, but that's an ActionMailer problem).
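Setting base64 as the body's transfer encoding means the raw UTF-8 bytes of the message travel as plain 7-bit ASCII. A minimal sketch of that round trip using only core Ruby's `Array#pack('m')` and `String#unpack('m')`; this is not ActionMailer's API, and the sample text is made up:

```ruby
# Base64-encode a UTF-8 email body the way a "base64" transfer encoding
# would, using only core Ruby (no ActionMailer involved).
body = "Zażółć gęślą jaźń"   # sample Polish text (illustrative only)

# pack('m') produces RFC 2045 base64, wrapped at 60 encoded chars per line
encoded = [body].pack('m')

# unpack returns a binary (ASCII-8BIT) string, so relabel it as UTF-8
decoded = encoded.unpack('m').first.force_encoding('UTF-8')

puts encoded
puts decoded == body
```

The encoded form contains only ASCII characters, which is why it survives mail relays that are not 8-bit clean.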

One thing that annoyed me was the Google WSDL driver (from soap4r), which no longer works. The issue is connected to the $KCODE variable, which is set to 'UTF8' to make all this Unicode stuff work. When it's set, GoogleSearch doesn't work. I tried to set the proper SOAP envelope options, but that didn't help, so I came up with this ugly hack:

require 'soap/wsdlDriver'

# utf-8 workaround (UPDATED: saving/restoring $KCODE is no longer needed)
kcode = $KCODE
begin
  $KCODE = ""
  driver = SOAP::WSDLDriverFactory.new("http://api.google.com/GoogleSearch.wsdl").create_rpc_driver

  # UPDATED: the following line is no longer needed
  #driver.options["soap.envelope.use_numeric_character_reference"] = true
  #driver.wiredump_dev = STDOUT
rescue => e
  # interpolate: String#+ would raise TypeError on an exception object
  puts "Error creating rpc driver: #{e}"
  return # assumes this code runs inside a method
ensure
  $KCODE = kcode
end

I don't like it. If you know a better way, let me know.
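One way to make the save-and-restore part of the hack less ugly is to move it into a block helper, so the old value comes back even when creating the driver raises. A minimal sketch; `with_temporary` and the `settings` hash are hypothetical stand-ins, since `$KCODE` itself has no effect in Ruby 1.9 and later:

```ruby
# Hypothetical helper: temporarily set settings[key] to value while the
# block runs, then restore the previous value, even if the block raises
# (the same job the begin/ensure in the hack does).
def with_temporary(settings, key, value)
  previous = settings[key]
  settings[key] = value
  yield
ensure
  settings[key] = previous
end

# Stand-in for global state such as $KCODE (illustrative only)
settings = { kcode: "UTF8" }

inside = nil
with_temporary(settings, :kcode, "") do
  inside = settings[:kcode]   # the driver would be created here
end

puts inside.inspect           # value seen inside the block
puts settings[:kcode]         # restored afterwards
```

The `ensure` clause on the method body is what guarantees the restore, so callers no longer need their own begin/ensure.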


UPDATE:

I turned off unicode_hacks and now everything works. Soap4r was broken because of the overridden versions of String methods. So now I have SOAP, ActionMailer and Rails in general working. The only things left (but essential ones!) are the broken, Unicode-unaware String methods (I get by with jcode).

Everything was broken before because of the Unicode versions of String methods provided by Julik's unicode_hacks (among other tweaks). Soap4r uses String methods deep in its code and, since it handles Unicode on its own anyway, it relies on those methods knowing nothing about multibyte characters.
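The clash is between byte semantics and character semantics. In Ruby 1.8, `String#length` counted bytes; in Ruby 1.9 and later, `length` counts characters and `bytesize` counts bytes, which makes the difference easy to see. A minimal sketch assuming a modern Ruby rather than the 1.8 of this post; `content_length` is a hypothetical helper standing in for library code that needs a byte count, as anything building an HTTP Content-Length header does:

```ruby
# A word with Polish diacritics: 6 characters, four of which take
# two bytes each in UTF-8.
word = "zażółć"

puts word.length     # characters
puts word.bytesize   # bytes

# Hypothetical helper: a Content-Length value must count bytes. Code
# written against byte-counting String methods under-reports sizes
# once a patch swaps in character-aware versions of those methods.
def content_length(body)
  body.bytesize
end

# Character-wise operations, which jcode provided for Ruby 1.8,
# are built in here.
puts word.chars.inspect
puts word.reverse
puts content_length(word)
```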

Don't get me wrong: unicode_hacks is the only way to go if you want to do serious work with multibyte character strings. You can also take a look at the Unicode Library for Ruby by Yoshida Masato.

I have to do more tests, but my recommendation for today is:
  • read HowToUseUnicodeStrings
  • set $KCODE = 'UTF8' (shorthand version: 'u')
  • add encoding: utf8 to your database.yml
  • set your database of choice to store data in UTF-8
  • use jcode / the Unicode library if you need them
Note: I assume you're livin' on Edge Rails or at least Rails 1.1.0.
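For the database.yml step, the `encoding` key sits next to the other connection settings of each environment. A minimal sketch that parses such a fragment with Ruby's standard YAML library to show where the key goes; the adapter, database name and username are made-up placeholders:

```ruby
require 'yaml'

# Example database.yml fragment; everything except `encoding: utf8`
# is a placeholder (hypothetical adapter and names).
DATABASE_YML = <<~YAML
  development:
    adapter: mysql
    database: myapp_development
    username: root
    password:
    encoding: utf8
YAML

config = YAML.safe_load(DATABASE_YML)

# Each environment's hash holds its connection settings.
puts config['development']['encoding']
```

The same `encoding: utf8` line would go under `test` and `production` as well.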

2 Comments:

At 1:56 AM CEST, Anonymous said...

Thanks for the tips - much appreciated.

At 7:00 PM CEST, Anonymous said...

I've now changed unicode_hacks so that they are used explicitly, without overrides. You might want to try them.
