Tip: recovering Latin chars serialized by to_xml

I am generating an XML file to dump some data from one application to another. Yes, I know it's nicer to build a RESTful API, but this is a very custom scenario and the shortest way is to write an XML-based interface. So I simply export some contents:

def show_some_content
  @content = Content.find(:all)

  respond_to do |format|
    format.xml  # render the XML view
  end
end

For my view:

<%= @content.to_xml %>

The problem is that the resulting XML escapes not only the HTML entities but also Latin characters (i.e. accents and tildes), which come out as numeric character references. After googling for an hour I found some people blaming the to_xs method (the HTML-escaped version of to_s, as its definition says), which is used for XML serialization.
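To make the symptom concrete, here is a rough sketch of the kind of escaping involved. The helper name `numeric_escape` is made up for illustration and this is not Builder's actual to_xs code (it ignores `&`, `<` and `>`, for instance); it only shows how a non-ASCII character ends up as a decimal character reference.

```ruby
# Hypothetical helper, for illustration only: replace every non-ASCII
# code point with a decimal character reference such as &#243;.
def numeric_escape(text)
  text.unpack('U*').map { |cp| cp < 128 ? cp.chr : "&##{cp};" }.join
end

escaped = numeric_escape("ramón")
puts escaped
# prints: ram&#243;n
```

The "ó" is code point 243 (U+00F3), which is exactly the `&#243;` that shows up in the exported XML.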

Well, in the other app I need to parse this XML and get the original text back. So how the hell do I get the Latin chars unescaped again?

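require 'cgi'
require 'iconv'

encoded_text = "ram&#243;n"
# escaped text coming from the XML

# On Ruby 1.8 (without a UTF-8 $KCODE), CGI.unescapeHTML turns &#243; into a single ISO-8859-1 byte.
# Iconv.conv re-encodes it as UTF-8 and, unlike Iconv.iconv, returns a String instead of an Array.
puts Iconv.conv('UTF-8', 'ISO-8859-1', CGI.unescapeHTML(encoded_text))
# prints: ramón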
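A note for newer Rubies: iconv was dropped from the standard library in Ruby 2.0, and since 1.9 strings carry their own encoding. A minimal sketch, assuming the escaped text arrives as a UTF-8 string (the default for string literals in current Ruby), suggests that the unescape step alone should be enough there:

```ruby
require 'cgi'

# Assumption: encoded_text is a UTF-8 string, as read from the XML.
encoded_text = "ram&#243;n"

# With a UTF-8 input, numeric references are decoded straight to UTF-8
# characters, so no separate ISO-8859-1 -> UTF-8 conversion is needed.
decoded = CGI.unescapeHTML(encoded_text)

puts decoded
puts decoded.encoding
```

If the text comes in under a different encoding, it would probably need a `String#encode` call first; that case is not covered here.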

That’s it :-)
