tar uses too much memory for its buffer - workaround?


Your image shows quite the contrary, actually.

As you can see under the RES column, tar's memory consumption is quite low. Your RAM usage appears to increase simply because Linux is actively caching the data read by the tar command. This, in turn, causes memory pressure and dirty page writeback (basically, the system flushes its write cache to make room for the larger read cache) and, possibly, evicts useful data from the I/O cache.

Unfortunately, it seems that tar itself cannot be instructed to use O_DIRECT or posix_fadvise (both of which can be used to "bypass" the cache). So, with tar alone there is no real solution here...
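
To check the distinction described above on your own machine, here is a minimal sketch that compares the kernel's page cache with the memory actually held by tar. The helper names meminfo_mib and proc_rss_mib are made up for this example, and it assumes a Linux /proc/meminfo and a procps-style ps that accepts -C.

```shell
#!/bin/sh
# Print one field of a meminfo-style file, converted from kB to MiB.
# Usage: meminfo_mib FIELD [FILE]   (FILE defaults to /proc/meminfo)
meminfo_mib() {
    awk -v key="$1:" '$1 == key { printf "%d\n", $2 / 1024 }' "${2:-/proc/meminfo}"
}

# Sum the resident set size (the RES column in htop) of every process
# with the given name, in MiB. Prints 0 if no such process is running.
proc_rss_mib() {
    ps -o rss= -C "$1" 2>/dev/null |
        awk '{ sum += $1 } END { printf "%d\n", sum / 1024 }'
}

if [ -r /proc/meminfo ]; then
    echo "page cache:  $(meminfo_mib Cached) MiB"
    echo "dirty pages: $(meminfo_mib Dirty) MiB"
    echo "tar RSS:     $(proc_rss_mib tar) MiB"
fi
```

If you run this while the backup is in progress, you would expect the page cache figure to grow while the tar RSS figure stays small, which matches what the htop screenshot shows.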


Author: starbeamrainbowlabs


Updated on September 18, 2022

Comments

  • starbeamrainbowlabs
    starbeamrainbowlabs almost 2 years

    I am tarring and then compressing a bunch of files and directories on my Ubuntu Server VPS for a backup. It only has 1GB of RAM and 128MB of swap (I can't add more - OVH use OpenVZ as their virtualisation software), and every time tar runs it uses a ton of memory for its buffer, causing everything else to get swapped out - even when using nice -n 10.

    Is there any way to force tar to use a small buffer and reduce its memory usage? I am worried that once the backup gets to be a certain size, my server will go down because tar won't have enough memory for its buffer.

    I am using bzip2 to compress, and I have already limited its memory usage with the -4 option.

    Edit: Here is what htop looks like when I have had tar running for a while:

    [Image: htop screenshot]

    Here is a link to the full gif

    Edit 2: Here is the tar command I am using:

    nice -n 20 tar --exclude "*node_modules*" --exclude "*.git/*" --exclude "/srv/www-mail/rainloop/v*"  -cf archive.tar /home /var/log /var/mail /srv /etc
    
    • Marki555
      Marki555 almost 9 years
      How do you see that tar is using much memory? I guess it just causes Linux to remove useful "hot" data from its cache and replace it with useless "cold" data that is being backed up (and is not needed in the cache).
    • starbeamrainbowlabs
      starbeamrainbowlabs almost 9 years
      @Marki555 I used htop to observe my memory and swap usage. I used this tutorial to view which processes were using the most swap before and after, and I noticed that tarring a large amount of stuff causes almost everything else to get swapped out :/
    • Marki555
      Marki555 almost 9 years
      Can you include the output of htop into your question?
    • starbeamrainbowlabs
      starbeamrainbowlabs almost 9 years
      @Marki555 Sure, I will update the question as soon as I get the chance.
    • starbeamrainbowlabs
      starbeamrainbowlabs almost 9 years
      @Marki555 Done - I've edited the question. I ran the tar command in a separate SSH terminal. It's the yellow part of the "Mem" bar that is the problem. I think that stands for the cache? The other problem is now how to clear the buffer....
    • starbeamrainbowlabs
      starbeamrainbowlabs almost 9 years
      Hold on. Does this have something to do with the fact that I was using /tmp to store the archive?
    • Fox
      Fox almost 9 years
      If your /tmp is mounted as tmpfs, then yes, it does. tar itself doesn't seem to use much memory in the screenshot.
    • Michael Hampton
      Michael Hampton almost 9 years
      I don't see a tar command here. Exactly what are you running?
    • starbeamrainbowlabs
      starbeamrainbowlabs almost 9 years
      @MichaelHampton Sorry, I meant to include that in the question. Question updated.
    • Michael Hampton
      Michael Hampton almost 9 years
      Are you putting archive.tar in /tmp then?
    • starbeamrainbowlabs
      starbeamrainbowlabs almost 9 years
      @MichaelHampton Yes I was. I have changed it to a different folder now and I still get the same problem.
  • starbeamrainbowlabs
    starbeamrainbowlabs almost 9 years
    Thanks for your explanation. Is there a different tool I can use then that doesn't fill up the read cache?
  • shodanshok
    shodanshok almost 9 years
    Unfortunately, only some tools support direct I/O operations. The most common tool is dd, and you can use it to compress a file with something like dd if=srcfile bs=1M iflag=direct | bzip2 > newfile.bz2. However, this is clearly no match for tarring a full directory tree.
  • starbeamrainbowlabs
    starbeamrainbowlabs almost 9 years
    Thanks for the help. Perhaps I need more ram then...?
  • shodanshok
    shodanshok almost 9 years
    You probably need more RAM and a faster disk subsystem. As a workaround, you can try to totally disable filesystem caching during the tar/bz2 process, then reenable it. To disable caching, remount your filesystem with the sync option. For example, using your / filesystem for the tar/bz2 process, you should issue mount / -o remount,sync. Then, after completion, you can remount it with caching enabled using mount / -o remount,async
  • starbeamrainbowlabs
    starbeamrainbowlabs almost 9 years
    Unfortunately I get mount: permission denied if I try the remount sync command. The async one works though. I think this must be because OpenVZ doesn't support it on their VPS classic? Apparently I am running on an SSD. As for my kernel, I am using (and can't change from) Linux 2.6.32-042stab108.5.
  • starbeamrainbowlabs
    starbeamrainbowlabs almost 9 years
    Update: I have found a tool called nocache which prevents read files from being cached - this seems to solve the problem :D
  • shodanshok
    shodanshok almost 9 years
    Interesting utility... I wrote something similar some time ago, just for testing. Anyway, if my reply helped you, please mark it as the accepted answer.
  • starbeamrainbowlabs
    starbeamrainbowlabs almost 9 years
    Done - thanks for reminding me! Your answer was helpful in working out what the problem actually was so I could go about finding a solution :)
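
For anyone landing here later, this is a rough sketch of how the nocache approach mentioned in the comments might be wired into the backup. The thread does not show the exact invocation that was used, so treat this as an assumption: run_uncached and backup_to are hypothetical helper names, and streaming tar straight into bzip2 (instead of writing archive.tar first) is a substitution made for this example, not the original command.

```shell
#!/bin/sh
# Run a command under nocache when it is installed; otherwise run it unchanged.
run_uncached() {
    if command -v nocache >/dev/null 2>&1; then
        nocache "$@"
    else
        "$@"
    fi
}

# Hypothetical backup helper: archive the given paths to stdout and compress
# the stream with bzip2 -4, so no uncompressed archive.tar is written to disk.
# Note: in plain sh the pipeline's exit status is bzip2's, not tar's.
# Usage: backup_to OUTPUT.tar.bz2 PATH...
backup_to() {
    out=$1
    shift
    run_uncached tar -cf - "$@" | bzip2 -4 > "$out"
}

# Example call (paths taken from the question, exclusions omitted for brevity):
#   backup_to archive.tar.bz2 /home /var/log /var/mail /srv /etc
```

Only the tar side is wrapped here, since that is where the bulk of the reads happen. Whether bzip2's output should also go through nocache probably depends on how large the compressed archive gets.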