Use `less` pager on file with non-standard encoding


Solution 1

Hm, apparently less cannot do this. The part of less's source code that implements "following" seems to be:

    case A_F_FOREVER:
            /*
             * Forward forever, ignoring EOF.
             */
            if (ch_getflags() & CH_HELPFILE)
                    break;
            cmd_exec();
            jump_forw();
            ignore_eoi = 1;
            while (!sigs)
            {
                    make_display();
                    forward(1, 0, 0);
            }
            ignore_eoi = 0;

As far as my (limited) knowledge of C goes, this means that if "follow" is activated, less will:

  1. seek to the end of input
  2. read and update the display in a loop, until Ctrl-C is pressed

If the input is a pipe, step 1 will not return until the pipe signals EOF. If I use tail -f xx | less, the pipe will never signal EOF, so less hangs :-(.

I did however find a way to get what I want:

 tail -f inputfile | recode latin1.. > /tmp/tmpfile

then

less +F /tmp/tmpfile

This works because it lets less +F operate on a real file. It's still somewhat awkward, because recode apparently only processes data in blocks of 4096 bytes, so new output arrives in chunks, but it works...
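
The two commands above can be wrapped in a small helper so that the background conversion is stopped and the temporary file removed once less exits. This is only a sketch under some assumptions: the names follow_recoded, CONVERTER and PAGER_CMD are made up for this example (they are not part of less or recode), mktemp -d and mkfifo are assumed to be available, and signal handling is left out. A FIFO is used instead of a plain pipeline only so that tail's PID is known and tail can be stopped explicitly.

```shell
#!/bin/sh
# Sketch only: follow_recoded, CONVERTER and PAGER_CMD are hypothetical names
# introduced for this example; they are not part of less or recode.

follow_recoded() {
    logfile=$1
    converter=${CONVERTER:-recode latin1..}   # same conversion as in the command above
    pager=${PAGER_CMD:-less +F}               # follow mode, like tail -f

    workdir=$(mktemp -d) || return 1
    fifo=$workdir/raw
    converted=$workdir/converted
    mkfifo "$fifo" || { rm -rf "$workdir"; return 1; }
    : > "$converted"        # make sure the file exists before the pager opens it

    # Run tail and the converter as separate background jobs connected by a
    # FIFO, so that tail's PID is known and it can be stopped explicitly.
    tail -f "$logfile" > "$fifo" &
    tail_pid=$!
    # Word splitting of $converter is intentional (command plus arguments).
    $converter < "$fifo" > "$converted" &
    converter_pid=$!

    # The pager works on a regular file, so it can seek to the end and follow it.
    status=0
    $pager "$converted" || status=$?

    # Stopping tail gives the converter EOF; afterwards remove the temporary files.
    kill "$tail_pid" 2>/dev/null || true
    wait "$tail_pid" 2>/dev/null || true
    wait "$converter_pid" 2>/dev/null || true
    rm -rf "$workdir"
    return "$status"
}
```

After sourcing this, something like follow_recoded inputfile would open the converted log in follow mode. The 4096-byte chunking is unchanged, since that comes from the converter itself.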

Solution 2

It's possible that recode is buffering its output, so that data only comes through the pipe once the buffer (probably 4K) is full. You can try using the unbuffer script that comes with expect.


Author: sleske

Software developer, mathematician SOreadytohelp

Updated on September 17, 2022

Comments

  • sleske
    sleske over 1 year

    I often use the less pager to view logfiles. Usually I use less -F to follow the progress of the log à la tail.

    However, some logfiles use national characters in a non-standard encoding (Latin-1, while the system uses UTF-8). Obviously, these will not be displayed correctly.

    How can I view such files with less?

    The only solutions I found:

    • Correct the encoding of the file (recode or iconv). This does not work while the file is still being written, so does not let me use less -F. Plus it destroys the logfile's original timestamp, which is bad from an auditing perspective.
    • Use a pipe (recode latin1... |less). Works for files in progress, but unfortunately then less -F does not appear to work (it just does not update; I believe the recode process exits once it's done).

    Any solution that lets me "tail" a logfile and still shows national characters correctly?

    • isomorphismes
      isomorphismes almost 6 years
      It looks from man less like there is a preprocessor which you could possibly set to fix your encoding.
    • sleske
      sleske almost 6 years
      @isomorphismes: Yes, less does support calling a preprocessor. However, as far as I can tell, the preprocessor reads the input file and creates a new file for less, so this would not work for less -F.
  • akira
    akira almost 14 years
    that or 'env LC_ALL=en_US.LATIN1 less -F file'
  • sleske
    sleske almost 14 years
    That does not solve my problem. This will cause less to accept Latin-1 characters as regular characters (meaning it does not highlight them), but they will still show up incorrectly in a terminal program that expects UTF-8 (as that's the system default). I want to actually convert the Latin-1 characters to valid UTF-8, not just have them show up as junk/box characters.
  • harrymc
    harrymc almost 14 years
    @sleske: I don't know of a way to convert and do less at the same time on dynamic files. One can define macros per akira's comment for the several possible encodings that you have. This is assuming that your problem is only the display and not pure conversion.
  • sleske
    sleske almost 14 years
    No, that is not the problem. The recode process simply exits after it detects EOF for the file (after all, it has no way of knowing that the file is still growing); I can confirm this using ps. So unbuffer does not help.
  • Dennis Williamson
    Dennis Williamson almost 14 years
    @sleske: Have you tried tail -f | recode ... | less -F?
  • sleske
    sleske almost 14 years
    @Dennis: Actually yes, I tried it, but it didn't help either. It seems less -F just plain does not work on pipes. Even tail -f myfile | less -F does not work, though in this case both processes remain alive.
  • sleske
    sleske almost 14 years
    Anyway, +1 for good hints. Even if they didn't work, it's good to know that :-).
  • Dennis Williamson
    Dennis Williamson almost 14 years
    @sleske: By the way, it's less +F that follows files like tail -f (rather than less -F). After some testing, it looks like recode is doing some buffering that can't be controlled. This works, but the output is in chunks: tail -f inputfile | recode ... | less +F
  • sleske
    sleske almost 14 years
    @Dennis: Interesting. Your example does not work for me: less just hangs with an empty screen until I press Ctrl-C; then it shows its prompt, but no text.
  • sleske
    sleske almost 14 years
    To me it seems rather that less +F waits for EOF in its input before even showing a prompt. Since that never comes, it appears to hang. Just tail -f inputfile | less works, but it still hangs once I invoke Shift-F (or Shift-G). So it seems what I want just isn't possible with less...
  • Dennis Williamson
    Dennis Williamson almost 14 years
    @sleske: Try less in that pipeline without any options: tail -f inputfile | recode ... | less. Note: if your logfile is not getting much traffic, it could take a while before the buffer is full and anything is output.
  • sleske
    sleske almost 14 years
    @Dennis: Yes, I tried that, and it does work, but it's not practical. It will show output, and gives me the less prompt once the first screenful of text has been printed, but scrolling to the end of text still makes less hang until enough fresh text has arrived; and Shift-F or Shift-G still hangs less permanently. So it seems less just can't do what I'd like to do...