What is the fastest way to create a list of directories specified in a file?

Solution 1

With GNU xargs:

xargs -d '\n' mkdir -p -- < foo.txt

xargs will run as few mkdir commands as possible.
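
To see the batching, here is a minimal sketch. It assumes GNU xargs and a throwaway directory from mktemp. The first three paths come from the question; the path with a space is hypothetical. The sh -c wrapper is only there to report how many paths each mkdir invocation receives.

```shell
# Work in a scratch directory so nothing real is touched.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

# Sample list: three paths from the question plus a hypothetical one with a space.
printf '%s\n' data/bar/foo data/bar/foo/chum data/bar/chum/foo 'data/with space' > foo.txt

# The wrapper prints the size of each batch, then creates the directories.
# "$@" holds the paths that xargs appended to this particular invocation.
report=$(xargs -d '\n' sh -c 'echo "batch of $# paths"; mkdir -p -- "$@"' sh < foo.txt)
printf '%s\n' "$report"
```

With a list this short, everything fits into one invocation. With millions of lines, the same command would print one line per mkdir invocation.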

With standard syntax:

(export LC_ALL=C
 sed 's/[[:blank:]"\'\'']/\\&/g' < foo.txt | xargs mkdir -p --)
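
As a rough illustration of why the sed step is there: without -d, xargs treats blanks, quotes and backslashes specially, so each of them is escaped with a backslash first. The names below are hypothetical, and printf stands in for mkdir so that the arguments xargs passes on can be inspected.

```shell
# Hypothetical names containing a space, a single quote and double quotes.
names=$(printf '%s\n' 'a b' "it's" 'say "hi"')

# Escape blanks, quotes and backslashes, then let xargs undo the escaping.
# printf replaces mkdir here so that each received argument is shown in <>.
out=$(
  export LC_ALL=C
  printf '%s\n' "$names" |
    sed 's/[[:blank:]"\'\'']/\\&/g' |
    xargs printf '<%s>\n'
)
printf '%s\n' "$out"
```

Each name should come out as a single argument, unchanged, which is what mkdir -p -- would then receive.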

The inefficiency is that mkdir -p a/b/c will attempt mkdir("a"), and possibly stat("a") and chdir("a"), and the same for "a/b", even if "a/b" already existed.

If your foo.txt has:

a
a/b
a/b/c

in that order, that is, if every path component has its own line earlier in the file, then you can omit the -p and it will be significantly more efficient. Or alternatively:
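
Whether a given list satisfies that ordering condition can be checked before dropping -p. The following is only a sketch under several assumptions: GNU xargs, a scratch directory, no trailing slashes or "../" components in the paths, and enough memory for awk to remember every path it has seen (which may not hold for a very large list).

```shell
# Hypothetical demo in a scratch directory.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
printf '%s\n' a a/b a/b/c > foo.txt

# Print every path whose parent was not listed on an earlier line.
# Note: awk keeps all paths seen so far in memory.
missing=$(awk '
  {
    parent = $0
    # sub() returns 1 only when the path had a "/" to strip.
    if (sub("/[^/]*$", "", parent) && parent != "" && !(parent in seen))
      print
  }
  { seen[$0] = 1 }
' foo.txt)

if [ -z "$missing" ]; then
  # Parents always come first, so plain mkdir is enough.
  xargs -d '\n' mkdir -- < foo.txt
else
  # Fall back to -p when some parent is not listed beforehand.
  xargs -d '\n' mkdir -p -- < foo.txt
fi
```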

perl -lne 'mkdir $_ or warn "$_: $!\n"' < foo.txt

This avoids invoking any mkdir command at all, since perl makes the mkdir system call itself.

Solution 2

I know we will get a lot of answers to this question, but you can still try this :) :D

while IFS= read -r line; do mkdir -p -- "$line"; done < foo.txt
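
When reading paths line by line, whether IFS is emptied for read makes a visible difference. A small sketch, using a hypothetical input line with leading and trailing blanks:

```shell
# A hypothetical line with two leading and two trailing spaces.
input='  data/bar/foo  '

# With the default IFS, read strips leading and trailing blanks.
plain=$(printf '%s\n' "$input" | { read -r line; printf '[%s]' "$line"; })

# With IFS emptied for the read command, the line is kept exactly as written.
exact=$(printf '%s\n' "$input" | { IFS= read -r line; printf '[%s]' "$line"; })

printf 'read -r:      %s\n' "$plain"
printf 'IFS= read -r: %s\n' "$exact"
```

For the alphanumeric paths described in the question the two forms behave the same; the difference only matters for names that begin or end with blanks.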

Author by

Kaizer Sozay

Updated on September 18, 2022

Comments

  • Kaizer Sozay
    Kaizer Sozay almost 2 years

    I have a text file, "foo.txt", that specifies a directory in each line:

    data/bar/foo
    data/bar/foo/chum
    data/bar/chum/foo
    ...
    

    There could be millions of directories and subdirectories. What is the quickest way to create all the directories in bulk, using a terminal command?

    By quickest, I mean quickest to create all the directories. Since there are millions of directories there are many write operations.

    I am using Ubuntu 12.04.

    EDIT: Keep in mind, the list may not fit in memory, since there are MILLIONS of lines, each representing a directory.

    EDIT: My file has 4.5 million lines, each representing a directory, composed of alphanumeric characters, the path separator "/", and possibly "../".

    When I ran xargs -d '\n' mkdir -p < foo.txt, after a while it kept printing errors until I pressed Ctrl+C:

    mkdir: cannot create directory `../myData/data/a/m/e/d': No space left on device

    But running df -h gives the following output:

    Filesystem      Size  Used Avail Use% Mounted on
    /dev/xvda        48G   20G   28G  42% /
    devtmpfs        2.0G  4.0K  2.0G   1% /dev
    none            401M  164K  401M   1% /run
    none            5.0M     0  5.0M   0% /run/lock
    none            2.0G     0  2.0G   0% /run/shm
    

    free -m

                     total       used       free     shared    buffers     cached
    Mem:          4002       3743        258          0       2870         13
    -/+ buffers/cache:        859       3143
    Swap:          255         26        229
    

    EDIT: df -i

    Filesystem      Inodes   IUsed  IFree IUse% Mounted on
    /dev/xvda      2872640 1878464 994176   66% /
    devtmpfs        512053    1388 510665    1% /dev
    none            512347     775 511572    1% /run
    none            512347       1 512346    1% /run/lock
    none            512347       1 512346    1% /run/shm
    

    df -T

    Filesystem     Type     1K-blocks     Used Available Use% Mounted on
    /dev/xvda      ext4      49315312 11447636  37350680  24% /
    devtmpfs       devtmpfs   2048212        4   2048208   1% /dev
    none           tmpfs       409880      164    409716   1% /run
    none           tmpfs         5120        0      5120   0% /run/lock
    none           tmpfs      2049388        0   2049388   0% /run/shm
    

    EDIT: I increased the number of inodes and reduced the depth of my directories, and it seemed to work. It took 2m16s this time round.

    • Sreeraj
      Sreeraj over 9 years
      Is this a virtual machine? Does the main node have enough space?
    • Kaizer Sozay
      Kaizer Sozay over 9 years
      @Sree It is a Linode VPS. How can I tell if it has enough space? The directory I am running it in is in /home/myuser/, which should have a lot of free space.
    • Sreeraj
      Sreeraj over 9 years
      Yes. You seem to have enough space in all the partitions, and there are free inodes, but if it still says you don't have enough space, the hypervisor hosting your VPS has probably run out of space. You might have to contact your VPS provider to check that.
    • PM 2Ring
      PM 2Ring over 9 years
      Is that output from df -i from before or after you tried to run xargs -d '\n' mkdir -p < foo.txt?
    • Stéphane Chazelas
      Stéphane Chazelas over 9 years
      What FS type (df -T /)?
    • Kaizer Sozay
      Kaizer Sozay over 9 years
      @StéphaneChazelas updated question.
    • Kaizer Sozay
      Kaizer Sozay over 9 years
      @StéphaneChazelas I ignored the problem, and just increased the size of the disk image so that there are more inodes. I also reduced the depth of the directory structure and it seems to work. So now I could run your command without problem :)
  • Stéphane Chazelas
    Stéphane Chazelas over 9 years
    That's running one mkdir per directory and is flawed because of that wrong usage of the split+glob operator. That also means storing that whole huge list in memory.
  • Stéphane Chazelas
    Stéphane Chazelas over 9 years
    That's running one mkdir per directory and is flawed because of that wrong usage of read and the split+glob operator.
  • Sreeraj
    Sreeraj over 9 years
    Cool. Going through the man page of xargs now after looking at your comment in the question. Always something new to learn every time I open SE :)
  • Thushi
    Thushi over 9 years
    Yes, it is, because of the dependency. To create the folder bar we must first have data, and the same goes for the others. But I didn't find any flaws in read. Can you execute my command and check it once? I did, and it's working for me.
  • Stéphane Chazelas
    Stéphane Chazelas over 9 years
    To read a line, it's IFS= read -r line, read line does extra processing. Leaving $line unquoted means invoking the split+glob operator. mkdir can take several arguments.
  • Thushi
    Thushi over 9 years
    Oh, OK. Thank you. I will improve my answer. I just took the above example ;)
  • Thushi
    Thushi over 9 years
    What about $i? Unquoted?? :P
  • Sreeraj
    Sreeraj over 9 years
    But how is it holding a huge list in memory, since there is only one iteration variable? Wouldn't it hold only that one variable during each iteration?
  • Stéphane Chazelas
    Stéphane Chazelas over 9 years
    @Sree, expanding $(cat...) means reading the output of cat into memory, applying split+glob to it, and iterating over the resulting huge list.
  • cuonglm
    cuonglm over 9 years
    By "standard syntax", do you mean POSIX?
  • yorkshiredev
    yorkshiredev over 9 years
    To be honest, since we don't know the entire list of directories, we cannot assume that this answer is correct. A single space in a name will cause the wrong directory structure to be built.
  • Stéphane Chazelas
    Stéphane Chazelas over 9 years
    @John, xargs runs as many instances of the command as needed so as to avoid the limit on the maximum number of arguments. So it will probably invoke many mkdir commands, each of them passed a few thousand directories to create.
  • Kaizer Sozay
    Kaizer Sozay over 9 years
    There are no spaces. The directory paths contain only alphanumeric characters, "../" and the path separator "/".
  • Kaizer Sozay
    Kaizer Sozay over 9 years
    It repeats the error "mkdir: cannot create directory `../myData/data/a/m/e/d': No space left on device" many times for each file. Could there be a bug in your command? My file seems to have only unique entries. Or is this just how the error is displayed?
  • Pryftan
    Pryftan almost 6 years
    @KaizerSozay I know this is old, but the point is that the file could have spaces; and if you're saying that files can't have spaces in them, you're wrong (so can directories, but directories are files in the end). They can also have newlines (etc.).
  • Stéphane Chazelas
    Stéphane Chazelas about 5 years
    @KaizerSozay, you're running out of space or inodes, the errors are probably about creating a directory component leading to the files.
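
The inode explanation is consistent with the numbers in the question: df -i reports 994176 free inodes on /, while the file has 4.5 million lines, so there would not be one inode per directory (assuming that df -i output reflects the state before the run, which the thread does not confirm). A rough pre-flight check could look like the sketch below. It assumes a df that accepts -P and -i together, as GNU df does, and a filesystem that reports inode counts at all. The sample foo.txt is hypothetical.

```shell
# Hypothetical sample list in a scratch directory.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
printf '%s\n' data/bar/foo data/bar/foo/chum data/bar/chum/foo > foo.txt

# Upper bound on the inodes needed: one new directory per line.
needed=$(wc -l < foo.txt)
needed=$((needed))   # normalize any padding that wc may add

# IFree is the fourth column of df -Pi; line 2 is the target filesystem.
free_inodes=$(df -Pi . | awk 'NR == 2 { print $4 }')

case $free_inodes in
  ''|*[!0-9]*)
    verdict="free inode count not reported for this filesystem" ;;
  *)
    if [ "$free_inodes" -lt "$needed" ]; then
      verdict="not enough inodes: need up to $needed, have $free_inodes"
    else
      verdict="enough inodes: need up to $needed, have $free_inodes"
    fi ;;
esac
printf '%s\n' "$verdict"
```

The line count is only an upper bound: directories that already exist consume no new inode, while parent directories that are not listed themselves each need one more.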