Character encodings supported by more, cat and less


Solution 1

Your terminal can display accents because it is probably using UTF-8. Since the file in question is in a different encoding, the bytes that less, more and cat output are read as UTF-8, and decoding fails. You can check your current encoding with

echo $LANG
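
Since LANG can be overridden by LC_CTYPE or LC_ALL, it may help to also ask for the character map the locale actually resolves to. A minimal sketch, assuming a system that provides the POSIX `locale` utility; the variable name `charmap` is only illustrative:

```shell
# LC_ALL overrides LC_CTYPE, which overrides LANG; print all three to see which one applies.
printf 'LANG=%s LC_CTYPE=%s LC_ALL=%s\n' "${LANG:-}" "${LC_CTYPE:-}" "${LC_ALL:-}"

# Ask the C library which character map the current locale resolves to,
# for example UTF-8 or ISO-8859-1.
charmap=$(locale charmap)
echo "effective charset: $charmap"
```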

You have two choices: either change your default encoding, or convert the file to UTF-8. To change your encoding, open a terminal and type

export LANG="fr_FR.ISO-8859-1"

For example:

$ echo $LANG 
en_US.UTF-8
$ cat foo.txt 
J'ai mal � la t�te, c'est chiant!
$ export LANG="fr_FR.ISO-8859-1"
$ xterm <-- open a new terminal 
$ cat foo.txt 
J'ai mal à la tête, c'est chiant!

If you are using gnome-terminal or similar, you may also need to select the matching encoding in the terminal emulator itself, for example from the right-click menu in terminator:

[Screenshot: encoding selection in terminator]

For gnome-terminal:

[Screenshot: encoding selection in gnome-terminal]

Your other (better) option is to change the file's encoding:

$ cat foo.txt 
J'ai mal � la t�te, c'est chiant!
$ iconv -f ISO-8859-1 -t UTF-8 foo.txt > bar.txt
$ cat bar.txt 
J'ai mal à la tête, c'est chiant!
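
The conversion can be checked at the byte level rather than by eye. The following is a sketch only: the temporary directory and the one-word sample are made up for illustration, and the sample is written with an octal escape so that it really is ISO-8859-1 regardless of the terminal's encoding.

```shell
# Build a small ISO-8859-1 sample: "tête" with ê stored as the single byte 0xEA (octal 352).
tmpdir=$(mktemp -d)
printf 't\352te\n' > "$tmpdir/foo.txt"

# Convert to UTF-8; iconv exits non-zero if the input is not valid in the source encoding.
iconv -f ISO-8859-1 -t UTF-8 "$tmpdir/foo.txt" > "$tmpdir/bar.txt"

# Dump the result as hex: the single byte ea has become the two-byte sequence c3 aa.
od -An -tx1 "$tmpdir/bar.txt"
```

Writing to a second file, as in the answer above, keeps the original intact in case the source encoding was guessed wrong.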

Solution 2

ISO-8859 character encodings are a bit outdated for Linux systems. Your whole Linux system, including your terminal emulator and your shell, is likely using UTF-8 throughout.

However, cat, grep and less do not perform any encoding conversion. The bytes of your ISO-8859/latin1 file are passed through unchanged and end up being interpreted as UTF-8, which does not work.
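
Both halves of that claim can be observed directly. This is a rough sketch with a made-up sample file; the UTF-8 to UTF-8 round trip through iconv is a common way to test validity, not something the tools above do themselves.

```shell
# "tête" in ISO-8859-1: ê is the single byte 0xEA (octal 352).
sample=$(mktemp)
printf 't\352te\n' > "$sample"

# cat copies bytes verbatim: the hex dump is the same before and after.
before=$(od -An -tx1 < "$sample")
after=$(cat "$sample" | od -An -tx1)
[ "$before" = "$after" ] && echo "cat left the bytes untouched:$after"

# Those bytes are not valid UTF-8, so a UTF-8 terminal cannot decode them.
# iconv fails on the stray 0xEA when asked to read the file as UTF-8.
if iconv -f UTF-8 -t UTF-8 "$sample" > /dev/null 2>&1; then
    is_utf8=yes
else
    is_utf8=no
fi
echo "valid UTF-8: $is_utf8"
```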

If emacs is able to display them, it's because it tries to autodetect the encoding used and apparently succeeds. Tell emacs to save the file as UTF-8 and you will be able to use cat/grep/whatever on it.

If you know the exact character encoding (ISO-8859 is a collection of them, you have to know the exact one: ISO-8859-1 or ISO-8859-15 or worse), you can also convert your files from the command line:

iconv --from-code ISO-8859-15 --to-code UTF-8 your_file -o your_file_as_utf8
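
To see why the exact variant matters, here is a small sketch that decodes the same byte under both encodings. Byte 0xA4 is one of the few positions where ISO-8859-1 and ISO-8859-15 differ; the variable names are illustrative.

```shell
# Decode the single byte 0xA4 (octal 244) under each encoding and dump the UTF-8 result as hex.
as_latin1=$(printf '\244' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')
as_latin9=$(printf '\244' | iconv -f ISO-8859-15 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "ISO-8859-1:  $as_latin1"   # c2a4   = U+00A4 CURRENCY SIGN
echo "ISO-8859-15: $as_latin9"   # e282ac = U+20AC EURO SIGN
```

Picking the wrong variant does not make iconv fail; it silently produces a different character, so the output is worth a quick look.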

Solution 3

cat, more and less are just doing their job of displaying the file; translating between encodings isn't in their job description. The encoding of newlines isn't a problem, as CRLF is displayed just like the normal line ending LF, but your terminal is probably expecting UTF-8-encoded text, which is the de facto standard nowadays.

Luit translates between supported encodings and UTF-8. You tell Luit which encoding to translate by setting the LC_CTYPE environment variable or with the -encoding option. For example, to display a latin-1 (a.k.a. ISO 8859-1) file:

LC_CTYPE=en_US luit less somefile
luit -encoding ISO8859-1 less somefile

If the file is in some exotic encoding that Luit doesn't support, you can pipe it through a translator program. Iconv supports many encodings.

iconv -f latin1 somefile
iconv -f latin1 somefile | less
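
If this comes up often, the pipeline can be wrapped in a small shell function. The name `view_as` is hypothetical, and the sketch assumes a UTF-8 terminal and falls back to less when PAGER is unset:

```shell
# Hypothetical helper: view_as ENCODING FILE
# Converts FILE from ENCODING to UTF-8 and sends the result to the pager.
view_as() {
    iconv -f "$1" -t UTF-8 "$2" | ${PAGER:-less}
}

# Usage (not run here because less is interactive):
#   view_as latin1 somefile
```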

Author: Didaxis

Updated on September 18, 2022

Comments

  • Didaxis
    Didaxis almost 2 years

    I'm trying to use an existing, old, ASP.Net web application that will serve as the main UI for our project. The web application employs MANY user controls, with a lot of:

    <element attribute='<%=Page.ResolveUrl("~/path/to/resource")' ...
    

    I have a problem, however. If the application is deployed to IIS, these calls to Page.ResolveUrl() work fine. But if I try to run this application in Visual Studio Development Server, it does not resolve any URLs (i.e., no styles applied, no images rendered, etc.)

    // If deployed in IIS, the style tag renders like this (and works):
    <link href="/adminconsole/styles/styles.css" ...
    
    // On the VS Dev Server:
    <link href="/styles/styles.css" ...
    

    What I want, is for these calls to "just work" no matter if the application is deployed to IIS, or is running on the Visual Studio Development Server.

    Some pertinent info:

    The web app does not use a master page. It includes the header as a user control :(

    • Ben Robinson
      Ben Robinson over 12 years
      When you say they do not work can you explain what they do instead. A good idea would be to compare the output of ResolveUrl on the dev server and on IIS for a given resource.
    • Didaxis
      Didaxis over 12 years
      @Ben -In IIS: styles applied, images rendered. VS Dev Server: no styles applied, no images rendered. I thought it was pretty clear in my post above...
    • Ben Robinson
      Ben Robinson over 12 years
      No that is very unclear, there are many reasons that a style may not be applied, what is the VALUE output by the method, i.e. in your example of <element attribute='<%=Page.ResolveUrl("~/path/to/resource")' what value is the attribute set to in iis versus what is the value set in the dev server.
    • Didaxis
      Didaxis over 12 years
      The stylesheet, for instance, if deployed to IIS renders like this: <link href="/adminconsole/styles/styles.css" ... />, on the VS Dev Server, like this: <link href="/styles/styles.css" ... /> (I've added this info to the OP)
    • mclark1129
      mclark1129 over 12 years
      How is your application deployed in IIS? I'm assuming that it is under the virtual directory adminconsole, but is it configured as its own application? It would seem like to me that "~/path/to/resource" should resolve as "/path/to/resource" regardless of what virtual directory you are in, where "/" is the root of the application, not necessarily the root of the web server.
  • Davide Piras
    Davide Piras over 12 years
    this would probably break if the app is installed in a different way in the future or at present time on a web server where the app is available at: domain/site1/App ... urls should never be hardcoded as in IIS you can configure the app as web site top level or in any unpredictable level of folder nesting...
  • Icarus
    Icarus over 12 years
    I am not suggesting hard coding anything. ResolveUrl accepts relative paths. In fact, if you specify an absolute path ResolveUrl will return it untouched. I fail to see how a relative path, as I suggested, will break under a different virtual directory.
  • Cyrille
    Cyrille almost 4 years
    Depending on your data, use ISO-8859-15 or ISO-8859-1. See the differences: just 8 characters are changed. The Euro symbol is in ISO-8859-15.