tarring in parallel

1,537

Solution 1

Just tar to stdout and pipe it to pigz. You most likely don't want to parallelize disk access, just the compression part:

$ tar cf - myDirectory/ | pigz > myDirectory.tar.gz

A plain tar invocation like the one above only serializes a directory tree into a single stream in a reversible way; it does not compress anything. Compression can therefore be a separate step, as it is in this example.

pigz does multithreaded compression. The number of threads it uses can be adjusted with -p and it'll default to the number of cores available.
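As a rough sketch of the -p option in use, the script below builds a small throwaway directory, compresses it with 4 threads, and lists the result. The thread count, the temporary directory, and the fallback to plain gzip when pigz is not installed are all assumptions made for the example, not part of the original answer.

```shell
set -eu

# Hypothetical demo data; in practice point tar at an existing directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/myDirectory"
printf 'hello\n' > "$workdir/myDirectory/a.txt"

# Use pigz with 4 threads when it is installed; otherwise fall back to
# single-threaded gzip so the pipeline still runs (assumption for the demo).
if command -v pigz >/dev/null 2>&1; then
    compress() { pigz -p 4; }
else
    compress() { gzip; }
fi

# tar writes the archive to stdout ("-f -"); the compressor reads stdin.
tar -cf - -C "$workdir" myDirectory | compress > "$workdir/myDirectory.tar.gz"

# List the archive contents to confirm it is a readable .tar.gz.
listing=$(tar -tzf "$workdir/myDirectory.tar.gz")
printf '%s\n' "$listing"
```

pigz writes standard gzip format, so the archive can be unpacked with an ordinary tar xzf on machines that do not have pigz.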

Solution 2

With GNU Parallel it looks like this:

parallel tar jcvf /tmp/{= s:/$:: =}.tar.bz2 {} ::: */

or:

parallel tar jcvf /tmp/{}.tar.bz2 {} ::: *

For better compression try:

parallel tar -I pxz -cvf /tmp/{= s:/$:: =}.tar.xz {} ::: */

s:/$:: is a Perl expression. It removes the trailing / that the */ glob leaves on each directory name.
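To see the effect of that replacement without running GNU Parallel, the same trailing-slash strip can be sketched in plain shell with the ${dir%/} parameter expansion. This is only an illustration of the naming step; the directory names alpha and beta are made up, and nothing is archived.

```shell
set -eu

# Hypothetical directories, created only for the demonstration.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/alpha" "$workdir/beta"
cd "$workdir"

names=""
for dir in */; do
    # ${dir%/} drops one trailing slash, like the Perl s:/$:: expression.
    archive="/tmp/${dir%/}.tar.bz2"
    echo "$dir -> $archive"
    names="$names $archive"
done
# prints:
#   alpha/ -> /tmp/alpha.tar.bz2
#   beta/ -> /tmp/beta.tar.bz2
```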

Solution 3

pbzip2 works quite well. As in Solution 1, tar to stdout and pipe to pbzip2:

$ tar -cf - mydir/ | pbzip2 > mydir.tar.bz2

pbzip2 accepts options for the number of processors (-p), the maximum amount of memory used (-m), and the compression block size (-1 to -9), among others.

http://compression.ca/pbzip2/

Or, for one archive per directory (all jobs start at once, one bzip2 process per directory; names are quoted so that spaces are handled):

for dir in * ; do
     [[ ! -d ${dir} ]] && continue
     tar cf - "${dir}" | bzip2 > "${dir}.tar.bz2" &
done
wait
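The loop above starts one background job per directory with no upper limit. Where that is too many, one possible variant is to let xargs cap the number of concurrent jobs with -P. The sketch below is a swapped-in technique, not the original answer's method: it uses gzip through tar's -z flag instead of bzip2, a job limit of 2, and made-up directory names; -print0, -0 and -P are extensions found in GNU and BSD find/xargs rather than POSIX.

```shell
set -eu

# Hypothetical directories, one with a space in its name.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/one" "$workdir/two dirs"
printf 'x\n' > "$workdir/one/f.txt"
printf 'y\n' > "$workdir/two dirs/g.txt"
cd "$workdir"

# Run at most 2 tar jobs at a time; NUL-separated names survive spaces.
find . -mindepth 1 -maxdepth 1 -type d -print0 |
    xargs -0 -P 2 -I{} tar -czf {}.tar.gz {}

# Show the archives that were produced, one per directory.
ls -- *.tar.gz
```

xargs returns only after every job has finished, so no separate wait is needed.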
Author: VaidAbhishek

Updated on September 18, 2022

Comments

  • VaidAbhishek
    VaidAbhishek over 1 year

    I have a class A.

    class A():
        .... 
    

    It has a method B with a positional argument c and a keyword argument d.

        def B(self, c, d=None):
            ....
    

    Now I want to run method B of an object o = A() with arguments c1 and d1. I tried the following:

    t = thread.Thread(target=o.B, args=(c1,), kwargs={'d':d1})
    t.start()
    t.join()
    

    but it doesn't work. So I also tried:

    t = thread.Thread(target=A.B, args=(o, c1), kwargs={'d':d1})
    t.start()
    t.join()
    

    This doesn't work either. Execution just falls through after t.join(). I set a breakpoint at the first instruction of the Thread class in the threading module, but execution never reaches it.

    • user2357112
      user2357112 about 10 years
      What do you mean by "doesn't work"? What actually happens? Can you show runnable code that demonstrates the error?
    • Jayanth Koushik
      Jayanth Koushik about 10 years
      Shouldn't 'self' be the first argument of B?
    • Stan Prokop
      Stan Prokop about 10 years
      Besides other issues, you don't have self as the first parameter of method B, so you can't use it as an instance method. And yes, "doesn't work" without error messages is not a useful description.
    • VaidAbhishek
      VaidAbhishek about 10 years
      Sorry, yes, self is indeed the first argument of method B. I just edited it. I also gave an account of "doesn't work". Sorry for the confusion.
    • sloth
      sloth about 10 years
      kwargs={d:d1} should probably be kwargs={"d":d1}
    • VaidAbhishek
      VaidAbhishek about 10 years
      yes it is 'd' .. again apologies for not writing that.
    • Bibhas Debnath
      Bibhas Debnath about 10 years
      @VaidAbhishek post an answer and accept it yourself.
    • PSkocik
      PSkocik almost 9 years
      Just tar to stdout and pipe it to pigz. (You most likely don't want to parallelize disk access, just the compression part.)
    • ctrl-alt-delor
      ctrl-alt-delor almost 9 years
      @PSkocik pigz is an answer. Could you add a one-liner in an answer?
    • maxschlepzig
      maxschlepzig almost 9 years
      Consider using xz compression, it is usually better than bzip2.