tar uses too much memory for its buffer - workaround?
Your image shows quite the contrary, actually. As you can see under the RES column, `tar`'s memory consumption is quite low. Your RAM usage appears to increase simply because Linux is actively caching the data read by the `tar` command. This, in turn, causes memory pressure and dirty page writeback (basically, the system flushes its write cache to accommodate the greater read-caching required) and, possibly, useful data is evicted from the I/O cache.

Unfortunately, it seems that `tar` itself cannot be instructed to use O_DIRECT or POSIX_FADVISE (both of which can be used to "bypass" the cache). So, using `tar`, there is no real solution here...
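For a single large file, GNU dd can approximate the POSIX_FADVISE behaviour mentioned above: its `nocache` flag calls posix_fadvise(POSIX_FADV_DONTNEED) on the data it transfers, asking the kernel not to keep those pages cached. A minimal sketch (file names are illustrative; `iflag=nocache` requires GNU coreutils):

```shell
# Create an 8 MiB demo file, then read it back while advising the
# kernel (via posix_fadvise) not to keep the pages in the cache.
dd if=/dev/zero of=/tmp/cache-demo.bin bs=1M count=8 status=none
dd if=/tmp/cache-demo.bin of=/dev/null bs=1M iflag=nocache status=none
```

This only helps for tools that support such flags, which is exactly the limitation described above.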
Updated on September 18, 2022

Comments
-
starbeamrainbowlabs almost 2 years

I am `tar`ing and then compressing a bunch of files & directories on my Ubuntu Server VPS for a backup. It only has 1GB of RAM and 128MB of swap (I can't add more - OVH use OpenVZ as their virtualisation software), and every time `tar` runs it uses a ton of memory for its buffer, causing everything else to get swapped out - even when using `nice -n 10`.

Is there any way to force `tar` to use a small buffer and reduce its memory usage? I am worried that once the backup gets to be a certain size, my server will go down because `tar` won't have enough memory for its buffer.

I am using `bzip2` to compress, and I have already limited its memory usage with the `-4` option.

Edit: Here is what `htop` looks like when I have had `tar` running for a while:

Here is a link to the full gif

Edit 2: Here is the tar command I am using:

`nice -n 20 tar --exclude "*node_modules*" --exclude "*.git/*" --exclude "/srv/www-mail/rainloop/v*" -cf archive.tar /home /var/log /var/mail /srv /etc`
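For reference, the question's two steps (tar, then bzip2 with `-4`) can be combined into one pipeline so that no uncompressed archive.tar is ever written to disk; this doesn't reduce page-cache pressure, but it halves the disk I/O. A sketch using an illustrative throwaway directory in place of the real paths:

```shell
# Build a tiny example tree (stands in for /home, /var/log, etc.).
SRC=$(mktemp -d)
echo "example" > "$SRC/data.txt"

# Stream tar straight into bzip2 -4 (the memory-limited level from
# the question) instead of writing archive.tar first.
nice -n 19 tar -C "$SRC" -cf - . | bzip2 -4 > /tmp/backup-demo.tar.bz2
```

The `--exclude` options from the original command would slot in unchanged before `-cf -`.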
-
Marki555 almost 9 years

How do you see that `tar` is using much memory? I guess it just causes Linux to remove useful "hot" data from its cache and replace it with useless "cold" data which is being backed up (and not needed in the cache)
-
starbeamrainbowlabs almost 9 years

@Marki555 I used `htop` to observe my memory and swap usage. I used this tutorial to view which processes were using the most swap before and after, and I noticed that `tar`ing a large amount of stuff causes almost everything else to get swapped out :/
-
Marki555 almost 9 years

Can you include the output of `htop` in your question?
-
starbeamrainbowlabs almost 9 years

@Marki555 Sure, I will update the question as soon as I get the chance.
-
starbeamrainbowlabs almost 9 years

@Marki555 Done - I've edited the question. I ran the `tar` command in a separate SSH terminal. It's the yellow part of the "Mem" bar that is the problem. I think that stands for the cache? The other problem is now how to clear the buffer....
-
starbeamrainbowlabs almost 9 years

Hold on. Does this have something to do with the fact that I was using `/tmp` to store the archive?
-
Fox almost 9 years

If your `/tmp` is mounted as `tmpfs`, then yes, it does. tar itself doesn't seem to use much memory in the screenshot.
-
Michael Hampton almost 9 years

I don't see a `tar` command here. Exactly what are you running?
-
starbeamrainbowlabs almost 9 years

@MichaelHampton Sorry, I meant to include that in the question. Question updated.
-
Michael Hampton almost 9 years

Are you putting `archive.tar` in `/tmp` then?
-
starbeamrainbowlabs almost 9 years

@MichaelHampton Yes I was. I have changed it to a different folder now and I still get the same problem.
-
starbeamrainbowlabs almost 9 years

Thanks for your explanation. Is there a different tool I can use, then, that doesn't fill up the read cache?
-
shodanshok almost 9 years

Unfortunately, only some tools support direct I/O operations. The most common is `dd`, and you can use it to compress a file with something like `dd if=srcfile bs=1M iflag=direct | bzip2 > newfile.bz2`. However, this is clearly no match for tarring a full directory tree.
-
starbeamrainbowlabs almost 9 years

Thanks for the help. Perhaps I need more RAM then...?
-
shodanshok almost 9 years

You probably need more RAM and a faster disk subsystem. As a workaround, you can try to totally disable filesystem caching during the tar/bzip2 process, then re-enable it. To disable caching, remount your filesystem with the `sync` option. For example, to use your `/` filesystem for the tar/bzip2 process, you would issue `mount / -o remount,sync`. Then, after completion, you can remount it with caching enabled using `mount / -o remount,async`
-
starbeamrainbowlabs almost 9 years

Unfortunately I get `mount: permission denied` if I try the remount sync command. The `async` one works though. I think this must be because OpenVZ doesn't support it on their VPS Classic? Apparently I am running on an SSD. As for my kernel, I am using (and can't change from) `Linux 2.6.32-042stab108.5`.
-
starbeamrainbowlabs almost 9 years

Update: I have found a tool called nocache which prevents files that are read from being cached - this seems to solve the problem :D
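As a sketch of how nocache is typically used (assuming the `nocache` package is installed, e.g. via `apt install nocache` on Ubuntu; the paths here are illustrative, with a fallback to plain tar when the wrapper is missing):

```shell
#!/bin/sh
set -e

# Small demo tree standing in for the real backup sources.
SRC=$(mktemp -d)
echo "hello" > "$SRC/file.txt"

# Use the nocache wrapper when available; it preloads a library that
# fadvise()s away the pages tar reads, keeping the page cache clean.
if command -v nocache >/dev/null 2>&1; then
    RUN="nocache"
else
    RUN=""      # fall back to plain tar if nocache is not installed
fi

$RUN tar -cjf /tmp/nocache-demo.tar.bz2 -C "$SRC" file.txt
```

The same prefix works for the full command from the question: `nocache nice -n 10 tar ...`.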
-
shodanshok almost 9 years

Interesting utility... I wrote something similar some time ago, just for testing. Anyway, if my reply helped you, please mark it as the accepted answer.
-
starbeamrainbowlabs almost 9 years

Done - thanks for reminding me! Your answer was helpful in working out what the problem actually was, so I could go about finding a solution :)