How do I get Nokogiri to add the right XML encoding?

Solution 1

Are you using Nokogiri's XML Builder? If so, you can pass an :encoding option to the new() method. From the documentation:

new(options = {})

Create a new Builder object. options are sent to the top level Document that is being built.

Building a document with a particular encoding for example:

  Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml|
    ...
  end

Also, this page says you can do the following (when not using Builder):

doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')

Presumably you could change 'EUC-JP' to 'UTF-8'.

Solution 2

When parsing the doc you can set the encoding like this:

doc = Nokogiri::XML::Document.parse(xml_input, nil, "UTF-8")

For me, the resulting document serializes with the header <?xml version="1.0" encoding="UTF-8"?>

Author by

Luc

Updated on June 14, 2022

Comments

  • Luc
    Luc about 2 years

    I have created an XML doc with Nokogiri: Nokogiri::XML::Document

    The header of my file is <?xml version="1.0"?> but I'd expect to have <?xml version="1.0" encoding="UTF-8"?>. Are there any options I could use so that the encoding appears?

  • Luc
    Luc over 13 years
    In fact, I am not parsing an existing file but creating a new one using Nokogiri::XML::Document.new
  • LarsH
    LarsH over 2 years
    It's funny that this has been one of my most highly upvoted answers. I have never used Nokogiri or Ruby, just XML and Google search.