How can I output UTF-8 from Perl?


Solution 1

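use utf8; does not enable Unicode output. It tells Perl that your source code is encoded as UTF-8, so the Unicode characters you type in string literals are read as characters. To get UTF-8 output, add this to the program before your print() statement: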

binmode(STDOUT, ":utf8");

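That makes Perl encode everything printed to STDOUT as UTF-8. Without it, characters below 256 (such as Ç, ç and ö) are written as single Latin-1 bytes, which a UTF-8 terminal displays as �.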
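Putting that into the script from the question gives something like the following minimal sketch. It assumes the file is saved as UTF-8 and that the terminal expects UTF-8; use warnings stands in for the -w on the shebang line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                    # the source file itself is saved as UTF-8

binmode(STDOUT, ":utf8");    # encode every character printed to STDOUT as UTF-8

my $str = 'Çirçös';          # 6 characters, because of "use utf8"
print "$str\n";
```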

Solution 2

You can use the open pragma.

For example, the following sets STDIN, STDOUT and STDERR to use UTF-8, and also makes :utf8 the default layer for files opened in the same lexical scope:

use open qw/:std :utf8/;
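A sketch of both effects of that line, assuming the file is saved as UTF-8. File::Temp and the file name out.txt are only there to give the demo somewhere to write; the byte count shows that the plain open() call picked up the :utf8 layer without any binmode.

```perl
use strict;
use warnings;
use utf8;                    # the literal below is saved as UTF-8
use open qw/:std :utf8/;     # standard handles now, and the default for open()
use File::Temp qw(tempdir);

my $str = 'Çirçös';
print "$str\n";              # STDOUT already has the :utf8 layer

# A handle opened in this lexical scope picks up :utf8 without any binmode.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/out.txt";
open my $fh, '>', $file or die "could not open $file: $!\n";
print {$fh} $str;
close $fh or die "could not close $file: $!\n";

# 6 characters become 9 bytes on disk: Ç, ç and ö take two bytes each in UTF-8.
printf "%d characters, %d bytes\n", length($str), -s $file;
```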

Solution 3

TMTOWTDI (there's more than one way to do it), so choose the method that best fits how you work. I use the environment method so I don't have to think about it.

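In the environment (S applies UTF-8 to STDIN, STDOUT and STDERR, D makes it the default layer for other input and output streams, and L applies both only if LC_ALL, LC_CTYPE or LANG indicates a UTF-8 locale):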

export PERL_UNICODE=SDL

on the command line:

perl -CSDL -le 'print "\x{1815}"';
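One way to check what -C or PERL_UNICODE actually did is to ask PerlIO which layers STDOUT has; the -C letters are also reflected in the special variable ${^UNICODE}. A small sketch: run it once plainly and once with -CSDL (or with PERL_UNICODE=SDL exported) in a UTF-8 locale to compare.

```perl
use strict;
use warnings;

# List the layers on STDOUT and look for the utf8 flag, which PerlIO
# reports as a pseudo-layer named "utf8".
my @layers   = PerlIO::get_layers(*STDOUT);
my $has_utf8 = grep { $_ eq 'utf8' } @layers;

print "STDOUT layers: @layers\n";
print $has_utf8 ? "STDOUT encodes output as UTF-8\n"
                : "STDOUT writes raw bytes\n";

# The numeric form of the -C / PERL_UNICODE settings in effect.
printf "\${^UNICODE} is %d\n", ${^UNICODE};
```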

or with binmode:

binmode(STDOUT, ":utf8");             # write characters as UTF-8 (no validation)
binmode(STDIN, ":encoding(UTF-8)");   # decode input, checking that it is valid UTF-8
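To see the difference between bytes and characters on the input side, here is a sketch that reads UTF-8 bytes through the strict :encoding(UTF-8) layer. An in-memory filehandle stands in for STDIN so it runs without any input; the byte string is just "Çirçös" spelled out with \x escapes.

```perl
use strict;
use warnings;

# UTF-8 bytes for "Çirçös", written as \x escapes so this file stays ASCII.
my $bytes = "\xC3\x87ir\xC3\xA7\xC3\xB6s\n";

# Read them through a decoding layer; an in-memory handle stands in for STDIN.
open my $in, '<', \$bytes or die "could not open in-memory handle: $!\n";
binmode($in, ':encoding(UTF-8)') or die "binmode failed: $!\n";
my $line = <$in>;
close $in;
chomp $line;

binmode(STDOUT, ':utf8');    # encode characters again on the way out
printf "%s: %d bytes in, %d characters decoded\n",
    $line, length($bytes) - 1, length($line);
```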

or with PerlIO:

open my $out_fh, ">:utf8", $filename
    or die "could not open $filename: $!\n";

open my $in_fh, "<:encoding(UTF-8)", $filename
    or die "could not open $filename: $!\n";
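A round trip through a real file with the layers given in the open() mode. Here $filename comes from File::Temp purely for the demo; in your own code it is whatever file you are working with.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $filename = tempdir(CLEANUP => 1) . '/demo.txt';   # scratch file for the demo
my $text     = "\x{C7}ir\x{E7}\x{F6}s";                # "Çirçös" as code points

open my $out_fh, '>:encoding(UTF-8)', $filename
    or die "could not open $filename: $!\n";
print {$out_fh} $text, "\n";
close $out_fh or die "could not close $filename: $!\n";

open my $in_fh, '<:encoding(UTF-8)', $filename
    or die "could not open $filename: $!\n";
chomp(my $read_back = <$in_fh>);
close $in_fh;

# 9 bytes of UTF-8 text plus the newline
printf "%d bytes on disk, round trip %s\n", -s $filename,
    $read_back eq $text ? 'ok' : 'failed';
```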

or with the open pragma:

use open ":encoding(UTF-8)";                          # same layer for input and output
use open IN => ":encoding(UTF-8)", OUT => ":utf8";    # or separate layers per direction
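The open pragma is lexically scoped: its defaults apply only to open() calls compiled inside the enclosing block or file. A sketch that shows this; File::Temp and the test character U+1815 (three bytes in UTF-8) are just choices for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $filename = tempdir(CLEANUP => 1) . '/demo.txt';

{
    # Defaults from the open pragma apply only inside this block.
    use open IN => ':encoding(UTF-8)', OUT => ':encoding(UTF-8)';
    open my $out_fh, '>', $filename or die "could not open $filename: $!\n";
    print {$out_fh} "\x{1815}\n";    # one character, three bytes in UTF-8
    close $out_fh or die "could not close $filename: $!\n";
}

# Outside the block, open() uses the default byte-oriented layers again.
open my $raw_fh, '<', $filename or die "could not open $filename: $!\n";
chomp(my $raw = <$raw_fh>);
close $raw_fh;
printf "read back %d bytes without the pragma\n", length $raw;
```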

Solution 4

You also want to say that the strings in your code are UTF-8. See "Why does modern Perl avoid UTF-8 by default?". So set not only PERL_UNICODE=SDAL (the A also decodes @ARGV as UTF-8) but also PERL5OPT=-Mutf8.
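-Mutf8 has the same effect as writing use utf8; at the top of the script. This sketch, assuming the file is saved as UTF-8, shows what that changes for a single literal by parsing it once without and once with the pragma.

```perl
use strict;
use warnings;

# Same literal, parsed without and with the utf8 pragma (file saved as UTF-8).
my $without = do { no utf8; 'ö' };    # the two source bytes C3 B6 stay two characters
my $with    = do { use utf8; 'ö' };   # decoded to the single character U+00F6

printf "without use utf8: length %d\n", length $without;
printf "with use utf8:    length %d\n", length $with;
```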

Author: Admin

Updated on December 04, 2020

Comments

  • Admin (over 3 years ago)

    I am trying to write a Perl script using the utf8 pragma, and I'm getting unexpected results. I'm using Mac OS X 10.5 (Leopard), and I'm editing with TextMate. All of my settings for both my editor and operating system default to writing files in UTF-8 format.

    However, when I enter the following into a text file, save it as a ".pl", and execute it, I get the friendly "diamond with a question mark" in place of the non-ASCII characters.

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    use utf8;
    
    my $str = 'Çirçös';
    print( "$str\n" );
    

    Any idea what I'm doing wrong? I expect to get 'Çirçös' in the output, but I get '�ir��s' instead.

  • Paul Tomblin (about 15 years ago)
    I didn't know about this (I've only been putting UTF8 in a database, never printing it). +1.
  • Admin (about 15 years ago)
    Actually, it was set to UTF-8. The problem was that I was writing to STDOUT without setting binmode to UTF-8.
  • visual_learner (about 15 years ago)
    You're welcome. See also another correct answer: stackoverflow.com/questions/627661/writing-perl-code-in-utf8/… and remember, TMTOWTDI. And @Paul - if you're writing UTF-8 to a file, you should probably use binmode() on that filehandle and make it "proper" UTF-8, but if it works...
  • sger (about 15 years ago)
    BTW, I gave you +1. I think binmode(STDOUT, ':utf8') is probably more correct in this situation. "use open" has other good uses, but how can you set it to encode only STDOUT?
  • jrockway (about 15 years ago)
    This would be an orthogonal concern. You need your Perl script to output correct data before you can worry about how your terminal emulator interprets it.
  • ysth (about 15 years ago)
    other ways: the open pragma ( search.cpan.org/perldoc/open ), the -C switch ( perldoc.perl.org/perlrun.html#-C )
  • visual_learner (about 15 years ago)
    I shy away from the -C switch because, to my knowledge, not all Perls (e.g. ActivePerl) process command-line switches well.
  • Shailesh (about 15 years ago)
    FWIW, here is the reason: strings that contain only Latin-1 (ISO-8859-1) characters, despite being stored more or less in UTF-8, are output as Latin-1 by default. This way, scripts from the pre-Unicode era still work the same, even with a Unicode-aware perl.
  • Chas. Owens (about 15 years ago)
    The utf8 pragma does not let you write your source in Unicode; it forces Perl to interpret your source in the UTF-8 (or UTF-EBCDIC) encoding of Unicode, an important distinction.
  • Peter V. Mørch (almost 11 years ago)
    Slightly off-topic: In my opinion UTF-8 in perl is a mess! Sometimes it helps to know that one is not alone. See my post: Why am I having so much trouble and pain with UTF-8 in perl?
  • galactica (about 10 years ago)
    With Perl 5.8.8, I can't get this working when writing Unicode strings to a file (I get warnings like "Wide character in print ..."), although others say it works for them. The open pragma approach works.
  • mklement0 (almost 10 years ago)
    +1 for a comprehensive answer; note that SDL is implied both by a bare -C and by an empty PERL_UNICODE. The use open ':locale' pragma is also worth mentioning, because it is the in-script equivalent of -C and export PERL_UNICODE=. Any of these 3 will give you UTF-8 support for all input and output streams (whether files or stdin/stdout/stderr), assuming your environment's locale is UTF-8-based. Finally, to also treat source code as UTF-8, use the use utf8; pragma.
  • vladr (almost 6 years ago)
    perl -Mutf8 -CSDL -e '...' lets you consume and output UTF-8 as well as use UTF-8 literals inside -e, e.g. for a poor man's accent folder: perl -Mutf8 -CASDL -pe 'y/āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜĀÁǍÀĒÉĚÈĪÍǏÌŌÓǑÒŪÚǓÙǕǗǙǛ/aaaaeeeeiiiioooouuuuüüüüAAAAEEEEIIIIOOOOUUUUÜÜÜÜ/'
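The Latin-1 behaviour Shailesh describes can be checked with a small sketch (File::Temp and the file name latin1.txt are just for the demo). With no encoding layer, a character below 256 goes out as a single byte, and that lone byte is not valid UTF-8, which is why a UTF-8 terminal shows it as �.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $filename = tempdir(CLEANUP => 1) . '/latin1.txt';

# No encoding layer: a character below 256 is written as a single byte.
open my $fh, '>', $filename or die "could not open $filename: $!\n";
print {$fh} "\x{F6}";                      # o with diaeresis, code point 246
close $fh or die "could not close $filename: $!\n";

printf "%d byte written\n", -s $filename;  # the Latin-1 byte 0xF6, not UTF-8
```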