Logrotate Successful, original file goes back to original size

Solution 1

This is probably because even though you truncate the file, the process writing to it continues writing at whatever offset it was at last. So what's happening is that logrotate truncates the file, the size drops to zero, the process writes to the file again at the offset where it left off, and you now have a file consisting of null bytes up to the point where you truncated it, plus the new entries written to the log.

Running od -c on the file after the truncate + sudden growth generated output along the lines of:

0000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
33255657600  \0   C   K   B   -   s   e   r   v   e   r       [   h   t   t
33255657620 <more log output>

What this says is that from offset 0 to 33255657600 your file consists of null bytes, followed by some legible data. Getting into this state doesn't take the same amount of time it would take to actually write all those null bytes. The ext{2,3,4} filesystems support something called sparse files: if you seek past a region of a file that doesn't contain anything, that region is assumed to contain null bytes and doesn't take up space on disk. Those null bytes aren't actually written, just assumed to be there, so going from 0 to 3.5GB takes hardly any time. (You can test how long it takes by doing something like dd if=${HOME}/.bashrc of=largefile.bin seek=3432343264 bs=1; this should generate a file of over 3GB in a few milliseconds.)
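
As a quick check along those lines (largefile.bin is just an illustrative name), you can create a sparse file with dd and compare the apparent size with the blocks actually allocated:

dd if=${HOME}/.bashrc of=largefile.bin seek=3432343264 bs=1   # returns almost immediately
ls -lh largefile.bin   # apparent size: a bit over 3GB
ls -ls largefile.bin   # first column: blocks actually allocated on disk, only a handful
du -h largefile.bin    # same story: a tiny on-disk footprint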

If you run ls -ls on your logfiles after they've been truncated and had a sudden growth again, it should report a number at the beginning of the line which represents the actual size (in blocks occupied on disk), and that number will probably be orders of magnitude smaller than the size reported by plain ls -l.
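
You can reproduce the whole sequence without logrotate. A minimal sketch (the file name demo.log is made up) that keeps a file descriptor open without O_APPEND, truncates the file behind its back, and writes again:

exec 3> demo.log                  # open for writing without O_APPEND; fd 3 keeps its own offset
seq 1 100000 >&3                  # write some data, advancing the offset
: > demo.log                      # truncate the file externally, like copytruncate does
echo "new entry" >&3              # this write lands at the old offset, leaving a hole of null bytes
ls -l demo.log; ls -ls demo.log   # apparent size jumps back up, allocated blocks stay tiny
od -c demo.log | head             # null bytes, an asterisk, then the new entry
exec 3>&-                         # close fd 3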

Solution 2

I am extremely confident that Kjetil has hit it. Drew, you may not yet be convinced by his explanation but I urge you to read carefully what he's said.

If you accept it, the fix is either to stop and restart your application when the logs are rotated, or to use a tool like apache's "rotatelogs", where you feed the log output to the tool via a pipe and the tool takes care of rotating the logfile every so often. For example, one of my apache instances logs with

ErrorLog "|/usr/sbin/rotatelogs /www/logs/error_log 604800"

which causes a lot of logfiles with names like

-rw-r--r--    1 root     root         4078 Dec 21 01:04 error_log.1292457600
-rw-r--r--    1 root     root         4472 Dec 29 08:41 error_log.1293062400
-rw-r--r--    1 root     root        78630 Jan  4 12:57 error_log.1293667200
-rw-r--r--    1 root     root        15753 Jan 12 01:10 error_log.1294272000

to appear without restarting apache; I can then compress them manually after the fact. Note how the rotation is done every week, which is every 604800 seconds, that being the argument passed to rotatelogs.
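
If your daemon isn't apache but can be started with its log output on stdout/stderr, the same pipe trick should work. A rough sketch (the daemon name and paths are made up):

/usr/local/bin/mydaemon 2>&1 | /usr/sbin/rotatelogs /var/log/mydaemon/log.%Y-%m-%d 86400

rotatelogs accepts either a plain filename (it then appends the rotation timestamp, as in the listing above) or a strftime pattern, and the last argument is the rotation interval in seconds (86400 for daily).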

If you can't stop and restart the app, and it can't log via a pipe, then I think you've got a real problem. Perhaps others will have suggestions.

Comments

  • drewrockshard
    drewrockshard over 1 year

    Has anyone had any issues with logrotate before that causes a log file to get rotated and then go back to the same size it originally was? Here are my findings:

    Logrotate Script:

    /var/log/mylogfile.log {
        rotate 7
        daily
        compress
        olddir /log_archives
        missingok
        notifempty
        copytruncate
    }
    

    Verbose Output of Logrotate:

    copying /var/log/mylogfile.log to /log_archives/mylogfile.log.1
    truncating /var/log/mylogfile.log
    compressing log with: /bin/gzip
    removing old log /log_archives/mylogfile.log.8.gz
    

    Log file after truncate happens

    [root@server ~]# ls -lh /var/log/mylogfile.log
    -rw-rw-r--  1 part1 part1 0 Jan 11 17:32 /var/log/mylogfile.log
    

    Literally Seconds Later:

    [root@server ~]# ls -lh /var/log/mylogfile.log
    -rw-rw-r--  1 part1 part1 3.5G Jan 11 17:32 /var/log/mylogfile.log
    

    RHEL Version:

    [root@server ~]# cat /etc/redhat-release 
    Red Hat Enterprise Linux ES release 4 (Nahant Update 4)
    

    Logrotate Version:

    [root@DAA21529WWW370 ~]# rpm -qa | grep logrotate
    logrotate-3.7.1-10.RHEL4
    

    Few Notes:

    • Service can't be restarted on the fly, so that's why I'm using copytruncate
    • Logs are rotating every night, judging by the olddir directory containing log files from each night.
  • Kjetil Joergensen
    Kjetil Joergensen over 13 years
    There should be an easy way of verifying this, inspect the file and see if there's actually anything in it. Start out by running od -c filename | head -n 100. Chances are it'll tell you in a few lines that there's a truckload of null-bytes, plus the latest log-entries.
  • Zubair
    Zubair over 13 years
    If this is the case, then can we test whether cat /dev/null > /var/log/mylogfile.log truncates the file in a way that solves this issue, rather than using logrotate's built-in copytruncate?
  • Kjetil Joergensen
    Kjetil Joergensen over 13 years
    The problem doesn't lie with the way the file is truncated; it lies with how the program writes the logfile. For truncating the logfile to be a viable approach, you'd have to somehow get the program writing to the logfile to seek back to position 0. It's possible that a filesystem that doesn't support sparse files wouldn't have this problem, although I don't have any way of testing that right now.
  • drewrockshard
    drewrockshard over 13 years
    These were my findings, and I'm still not sure why this is having issues. sprunge.us/LAXY - The first column goes from 33255530660 -> 0000000 -> 17362070000 -> 0000000 -> 33255657600. The command I ended up running was while true; do od -c /home/part1/kbServer/ckbServer/CKBServer.log | head -n 100;done; so that I could keep reading the file.
  • Kjetil Joergensen
    Kjetil Joergensen over 13 years
    The left column is the offset. It would be nice to have some delimiter between each run. 33255530660 looks to be "dangling", that is, the beginning of that od -c run is gone. However, if you look at the 0000000 -> 33255657600 run, what it tells you is: the first 16 bytes consist of null bytes; then there's an asterisk in the left column saying that the entries from the last offset are repeated; then you get to 33255657600, which actually contains legible data. So, yes, the case is that whatever writes to the file continues at the offset it was at.
  • drewrockshard
    drewrockshard over 13 years
    Correct, but that still does not explain why it went from 0 bytes to 3.5GB in less than 5 minutes. Additionally, it does not explain why it goes back to the same original size. It's ALWAYS the same original size.
  • drewrockshard
    drewrockshard over 13 years
    Can't use kill -HUP as this application cannot be touched in any way - it's a sensitive application that I do not own (I don't even manage it - I only manage the OS side) so I need to be able to do the logrotations this way.
  • Kjetil Joergensen
    Kjetil Joergensen over 13 years
    The 0 -> 3.5GB in less than 5 minutes is now incorporated into the answer, as it's a bit too large for a comment. When you say "the same original size", is that as reported by ls -l or ls -lh? The latter does an awful lot of rounding, which would hide a few lines appended to a file within the rounding error.
  • drewrockshard
    drewrockshard over 13 years
    This final answer actually made more sense and the fix will probably end up being to restart the application.
  • kasperd
    kasperd about 8 years
    There is no indication that writing to the log file is done using > from a script. Moreover, this answer is confusing because there is a major difference between > and >( ) in scripts. Finally, if the code doing the writing is updated, it would be much better to have it simply start writing to the new logfile after logrotate has done its thing.
  • user578558
    user578558 about 8 years
    I edited my answer so it's clearer.
  • turkenh
    turkenh over 7 years
    Please also note that opening the file in append mode is the solution here, if you have that kind of control over the application, as detailed in serverfault.com/a/522687/387313. O_APPEND seeks to the end of the file before each write, see man7.org/linux/man-pages/man2/open.2.html. A minimal contrast is sketched below.
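
    To see the difference, here is a minimal shell sketch (append.log is a made-up name) contrasting append mode with the plain-offset behaviour described in the accepted answer:

    exec 4>> append.log   # >> opens with O_APPEND; every write seeks to end-of-file first
    echo one >&4
    : > append.log        # truncate the file externally, like copytruncate does
    echo two >&4          # lands at offset 0: no hole, no sudden growth back to the old size
    ls -ls append.log     # small apparent size and small allocation
    exec 4>&-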