How can I output UTF-8 from Perl?
Solution 1
use utf8;
does not enable Unicode output; it tells Perl that your source code is encoded in UTF-8, so non-ASCII literals in the program are read as characters. Add this to the program, before your print() statement:
binmode(STDOUT, ":utf8");
See if that helps. That should make STDOUT encode its output as UTF-8 instead of the default byte-oriented (Latin-1) output.
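To make the distinction between the two lines concrete, here is a minimal sketch. It assumes the script file itself is saved as UTF-8; the variable names and the use of the core Encode module to count bytes are illustrative choices, not part of the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                    # the source file is UTF-8, so the literal below is read as characters
use Encode qw(encode);

my $str = 'Çirçös';

# Encode by hand only to show what the :utf8 layer will write:
# each non-ASCII character here becomes two bytes.
my $bytes = encode('UTF-8', $str);
printf STDERR "characters: %d, UTF-8 bytes: %d\n", length($str), length($bytes);

binmode(STDOUT, ':utf8');    # without this, print() falls back to byte-oriented output
print "$str\n";
```

If the binmode line is removed, the same string is printed as single bytes, which a UTF-8 terminal typically renders as replacement characters.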
Solution 2
You can use the open pragma.
For example, the line below sets STDIN, STDOUT, and STDERR to use UTF-8:
use open qw/:std :utf8/;
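One way to check what the pragma did is to list the layers on the standard handles. This is only a sketch: it relies on the core PerlIO::get_layers function, and the variable names are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use open qw(:std :utf8);   # default layers for open(), plus STDIN/STDOUT/STDERR

# PerlIO::get_layers reports the layers currently active on a handle.
my @stdout_layers = PerlIO::get_layers(\*STDOUT);
my @stderr_layers = PerlIO::get_layers(\*STDERR);

print "STDOUT layers: @stdout_layers\n";
print "STDERR layers: @stderr_layers\n";
```

A "utf8" entry in the list indicates that the handle encodes characters as UTF-8 on output.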
Solution 3
TMTOWTDI; choose the method that best fits how you work. I use the environment method so I don't have to think about it.
In the environment:
export PERL_UNICODE=SDL
on the command line:
perl -CSDL -le 'print "\x{1815}"';
or with binmode:
binmode(STDOUT, ":utf8"); #treat as if it is UTF-8
binmode(STDIN, ":encoding(utf8)"); #actually check if it is UTF-8
or with PerlIO:
open my $fh, ">:utf8", $filename
    or die "could not open $filename: $!\n";
open my $fh, "<:encoding(utf-8)", $filename
    or die "could not open $filename: $!\n";
or with the open pragma:
use open ":encoding(utf8)";
use open IN => ":encoding(utf8)", OUT => ":utf8";
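As a rough illustration of the PerlIO variant, the sketch below writes one character to a file and reads it back. The temporary file (via the core File::Temp module) and the variable names are assumptions made for the example; the code point is the one used in the command-line example above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A throwaway file, removed automatically when the program exits.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh;

my $text = "\x{1815}";   # one character, three bytes in UTF-8

# Write: characters are encoded to UTF-8 by the layer.
open my $out, '>:encoding(UTF-8)', $filename
    or die "could not open $filename: $!\n";
print {$out} $text;
close $out or die "could not close $filename: $!\n";

# Read: bytes are decoded (and validated) back into characters.
open my $in, '<:encoding(UTF-8)', $filename
    or die "could not open $filename: $!\n";
my $read_back = do { local $/; <$in> };
close $in;

my $size = -s $filename;   # size on disk, in bytes
printf STDERR "characters read: %d, bytes on disk: %d\n", length($read_back), $size;
```

Using the same encoding layer on both ends means the program only ever handles characters, and the byte representation stays in the file.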
Solution 4
You also want to declare that the strings in your code are UTF-8. See "Why does modern Perl avoid UTF-8 by default?". So set not only PERL_UNICODE=SDAL but also PERL5OPT=-Mutf8.
Admin
Updated on December 04, 2020
Comments
-
Admin over 3 years
I am trying to write a Perl script using the utf8 pragma, and I'm getting unexpected results. I'm using Mac OS X 10.5 (Leopard), and I'm editing with TextMate. All of my settings for both my editor and operating system default to writing files in UTF-8 format. However, when I enter the following into a text file, save it as a ".pl", and execute it, I get the friendly "diamond with a question mark" in place of the non-ASCII characters.
#!/usr/bin/env perl -w
use strict;
use utf8;
my $str = 'Çirçös';
print( "$str\n" );
Any idea what I'm doing wrong? I expect to get 'Çirçös' in the output, but I get '�ir��s' instead.
-
Paul Tomblin about 15 years
I didn't know about this (I've only been putting UTF8 in a database, never printing it). +1.
-
Admin about 15 years
Actually, it was set to utf-8. The problem was that I was outputting to STDOUT without setting binmode to utf-8.
-
visual_learner about 15 years
You're welcome. See also another correct answer: stackoverflow.com/questions/627661/writing-perl-code-in-utf8/… and remember, TMTOWTDI. And @Paul - if you're writing UTF-8 to a file, you should probably use binmode() on that filehandle and make it "proper" UTF-8, but if it works..
-
sger about 15 years
BTW... I gave u +1. I think binmode(STDOUT, ':utf8') is probably more correct in this situation. "use open" has other good uses but I can't seem to find how u can set it to just encode STDOUT only?
-
jrockway about 15 years
This would be an orthogonal concern. You need your Perl script to output correct data before you can worry about how your terminal emulator interprets it.
-
ysth about 15 years
Other ways: the open pragma ( search.cpan.org/perldoc/open ), the -C switch ( perldoc.perl.org/perlrun.html#-C )
-
visual_learner about 15 years
I shy away from the -C switch because not all Perls (e.g. ActivePerl) can process command-line switches well (to my knowledge).
-
Shailesh about 15 years
FWIW here is the reason: strings that contain only latin1 (ISO-8859-1) characters, despite being stored more or less in utf8, will be output as latin1 by default. This way scripts from a pre-unicode era still work the same, even with a unicode-aware perl.
-
Chas. Owens about 15 years
The utf8 pragma does not let you write your source in Unicode; it forces Perl to interpret your source as the UTF-8 (or UTF-EBCDIC) encoding of Unicode, an important distinction.
-
Peter V. Mørch almost 11 years
Slightly off-topic: In my opinion UTF-8 in perl is a mess! Sometimes it helps to know that one is not alone. See my post: Why am I having so much trouble and pain with UTF-8 in perl?
-
galactica about 10 years
With Perl 5.8.8, I can't get this working when trying to write Unicode strings into a file (I get a warning like "Wide character in print ..."), although others say they did. The open pragma one works.
-
mklement0 almost 10 years
+1 for a comprehensive answer; note that SDL is implied both with -C and PERL_UNICODE. The use open ':locale' pragma is also worth mentioning, because it is the in-script equivalent of -C and export PERL_UNICODE=. Any of these 3 will give you UTF8 support for all input and output streams (whether files or stdin/stdout/stderr), assuming your environment's locale is UTF8-based. Finally, to also treat source code as UTF8, use the use utf8; pragma.
-
vladr almost 6 years
perl -Mutf8 -CSDL -e '...' lets you consume and output UTF-8 as well as use UTF-8 literals inside -e, e.g. for a poor man's case folder:
perl -Mutf8 -CASDL -pe 'y/āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜĀÁǍÀĒÉĚÈĪÍǏÌŌÓǑÒŪÚǓÙǕǗǙǛ/aaaaeeeeiiiioooouuuuüüüüAAAAEEEEIIIIOOOOUUUUÜÜÜÜ/'