Use `less` pager on file with non-standard encoding
Solution 1
Hm, apparently `less` cannot do this. The part of less's source code that implements the "following" seems to be:
    case A_F_FOREVER:
        /*
         * Forward forever, ignoring EOF.
         */
        if (ch_getflags() & CH_HELPFILE)
            break;
        cmd_exec();
        jump_forw();
        ignore_eoi = 1;
        while (!sigs)
        {
            make_display();
            forward(1, 0, 0);
        }
        ignore_eoi = 0;
As far as my (limited) knowledge of C goes, this means that if "follow" is activated, less will:
1. seek to the end of input
2. read and update the display in a loop, until Ctrl-C is pressed
If the input is a pipe, step 1 will not return until the pipe signals EOF. If I use `tail -f xx | less`, the pipe will never signal EOF, so less hangs :-(.
I did however find a way to get what I want:
tail -f inputfile | recode latin1.. > /tmp/tmpfile
then
less +F /tmp/tmpfile
This works because it lets `less +F` operate on a real file. It's still somewhat awkward, because `recode` apparently only processes data in blocks of 4096 bytes, but it works...
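If the 4096-byte chunking is a nuisance, one possible variation is to convert the stream one line at a time, so each log line reaches the temporary file as soon as it is written. The following is only a sketch, not something from the original answer: `latin1_to_utf8_lines` is a hypothetical helper name, and it assumes `iconv` is installed and that the log is line-oriented.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of the original answer):
# convert Latin-1 input on stdin to UTF-8 on stdout, one line at a time.
# Each line is passed through its own short-lived iconv process, so it is
# written out immediately instead of waiting for a block to fill up.
latin1_to_utf8_lines() (
    # Run in a subshell with the C locale so the shell treats the input
    # as raw bytes rather than as (invalid) UTF-8.
    LC_ALL=C
    export LC_ALL
    while IFS= read -r line; do
        printf '%s\n' "$line" | iconv -f ISO-8859-1 -t UTF-8
    done
)
```

It would be used in place of `recode` in the pipeline above: `tail -f inputfile | latin1_to_utf8_lines > /tmp/tmpfile`, followed by `less +F /tmp/tmpfile` as before. Starting one `iconv` per line is wasteful for very busy logs, but it avoids the block-wise delay.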
Solution 2
It's possible that `recode` is buffering its output in the pipe, so output only comes through when the buffer (probably 4K) is full. You can try using the `unbuffer` script that comes with `expect`.
Comments
-
sleske over 1 year
I often use the `less` pager to view logfiles. Usually I use `less -F` to follow the progress of the log à la `tail`.
However, some logfiles use national characters in a non-standard encoding (Latin-1, while the system uses UTF-8). Obviously, these will not be displayed correctly.
How can I view such files with `less`?
The only solutions I found:
- Correct the encoding of the file (`recode` or `iconv`). This does not work while the file is still being written, so it does not let me use `less -F`. Plus it destroys the logfile's original timestamp, which is bad from an auditing perspective.
- Use a pipe (`recode latin1... | less`). Works for files in progress, but unfortunately then `less -F` does not appear to work (it just does not update; I believe the `recode` process exits once it's done).
Any solution that lets me "tail" a logfile and still shows national characters correctly?
-
isomorphismes almost 6 years
It looks from `man less` like there is a preprocessor which you could possibly set to fix your encoding.
-
sleske almost 6 years
@isomorphismes: Yes, `less` does support calling a preprocessor. However, as far as I can tell, the preprocessor reads the input file and creates a new file for `less`, so this would not work for `less -F`.
-
akira almost 14 years
That, or `env LC_ALL=en_US.LATIN1 less -F file`
-
sleske almost 14 years
That does not solve my problem. This will cause `less` to accept Latin-1 characters as regular characters (meaning it does not highlight them), but they will still show up incorrectly in a terminal program that expects UTF-8 (as that's the system default). I want to actually convert the Latin-1 characters to valid UTF-8, not just have them show up as junk/box characters.
-
harrymc almost 14 years
@sleske: I don't know of a way to convert and do less at the same time on dynamic files. One can define macros per akira's comment for the several possible encodings that you have. This is assuming that your problem is only the display and not pure conversion.
-
sleske almost 14 years
No, that is not the problem. The `recode` process simply exits after it detects EOF for the file (after all, it has no way of knowing that the file is still growing); I can confirm this using `ps`. So `unbuffer` does not help.
-
Dennis Williamson almost 14 years
@sleske: Have you tried `tail -f | recode ... | less -F`?
? -
sleske almost 14 years
@Dennis: Actually yes, I tried it, but it didn't help either. It seems `less -F` just plain does not work on pipes. Even `tail -f myfile | less -F` does not work, though in this case both processes remain alive.
-
sleske almost 14 years
Anyway, +1 for good hints. Even if they didn't work, it's good to know that :-).
-
Dennis Williamson almost 14 years
@sleske: By the way, it's `less +F` that follows files like `tail -f` (rather than `less -F`). After some testing, it looks like `recode` is doing some buffering that can't be controlled. This works, but the output is in chunks: `tail -f inputfile | recode ... | less +F`
-
sleske almost 14 years
@Dennis: Interesting. Your example does not work for me: less just hangs with an empty screen until I press Ctrl-C; then it shows its prompt, but no text.
-
sleske almost 14 years
To me it seems rather that `less +F` waits for EOF in its input before even showing a prompt. Since that never comes, it appears to hang. Just `tail -f inputfile | less` works, but it still hangs once I invoke Shift-F (or Shift-G). So it seems what I want just isn't possible with less...
-
Dennis Williamson almost 14 years
@sleske: Try `less` in that pipeline without any options: `tail -f inputfile | recode ... | less`. Note: if your logfile is not getting much traffic, it could take a while before the buffer is full and anything is output.
-
sleske almost 14 years
@Dennis: Yes, I tried that, and it does work, but it's not practical. It will show output, and gives me the less prompt once the first screenful of text has been printed, but scrolling to the end of the text still makes less hang until enough fresh text has arrived; and Shift-F or Shift-G still hangs less permanently. So it seems less just can't do what I'd like to do...