Why are hard links to directories not allowed in UNIX/Linux?


Solution 1

This is just a bad idea, as there is no way to tell the difference between a hard link and an original name.

Allowing hard links to directories would break the directed acyclic graph structure of the filesystem, possibly creating directory loops and dangling directory subtrees, which would make fsck and any other file tree walkers error-prone.

First, to understand this, let's talk about inodes. The data in the filesystem is held in blocks on the disk, and those blocks are collected together by an inode. You can think of the inode as THE file.  Inodes lack filenames, though. That's where links come in.

A link is just a pointer to an inode. A directory is an inode that holds links. Each filename in a directory is just a link to an inode. Opening a file in Unix also creates a link, but it's a different type of link (it's not a named link).

A hard link is just an extra directory entry pointing to that inode. When you ls -l, the number after the permissions is the named link count. Most regular files will have one link. Creating a new hard link to a file will make both filenames point to the same inode. Note:

% ls -l test
ls: test: No such file or directory
% touch test
% ls -l test
-rw-r--r--  1 danny  staff  0 Oct 13 17:58 test
% ln test test2
% ls -l test*
-rw-r--r--  2 danny  staff  0 Oct 13 17:58 test
-rw-r--r--  2 danny  staff  0 Oct 13 17:58 test2
% touch test3
% ls -l test*
-rw-r--r--  2 danny  staff  0 Oct 13 17:58 test
-rw-r--r--  2 danny  staff  0 Oct 13 17:58 test2
-rw-r--r--  1 danny  staff  0 Oct 13 17:59 test3
            ^
            ^ this is the link count

Now you can clearly see that there is no such thing as a hard link as a distinct kind of object. A hard link is the same as a regular name. In the above example, which of test and test2 is the original file and which is the hard link? By the end you can't really tell (even by timestamps), because both names point to the same contents, the same inode:

% ls -li test*  
14445750 -rw-r--r--  2 danny  staff  0 Oct 13 17:58 test
14445750 -rw-r--r--  2 danny  staff  0 Oct 13 17:58 test2
14445892 -rw-r--r--  1 danny  staff  0 Oct 13 17:59 test3

The -i flag to ls shows you inode numbers at the beginning of the line. Note how test and test2 have the same inode number, but test3 has a different one.
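The same experiment can be scripted. Here is a minimal Python sketch (the filenames just mirror the session above; os.link() is the same system call that ln uses):

import os

# Create an empty file, then a second name for the same inode.
open("test", "w").close()
os.link("test", "test2")          # same operation as: ln test test2

st1, st2 = os.stat("test"), os.stat("test2")
print(st1.st_ino == st2.st_ino)   # True: one inode, two names
print(st1.st_nlink)               # 2: the inode's named-link count

os.unlink("test")                 # drop one name...
print(os.stat("test2").st_nlink)  # ...and the count falls back to 1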

Now, if you were allowed to do this for directories, two directory entries at different points in the filesystem could point to the same inode. In fact, a subdirectory could point back to its grandparent, creating a loop.

Why is this loop a concern? Because when you are traversing, there is no way to detect you are looping (without keeping track of inode numbers as you traverse). Imagine you are writing the du command, which needs to recurse through subdirectories to find out about disk usage. How would du know when it hit a loop? It is error-prone, and a lot of bookkeeping that du would have to do just to pull off this simple task.
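To make that bookkeeping concrete, here is a Python sketch of a du-like walker (a toy, not how the real du is implemented) that survives loops only by remembering the (device, inode) pair of every directory it has already entered:

import os

def du(path, seen=None):
    """Return bytes used under path, refusing to enter any directory twice."""
    if seen is None:
        seen = set()
    st = os.lstat(path)
    total = st.st_blocks * 512            # allocated size, as du reports it
    if os.path.isdir(path) and not os.path.islink(path):
        key = (st.st_dev, st.st_ino)      # inode numbers are only unique per device
        if key in seen:
            return 0                      # been here before: a loop
        seen.add(key)
        for name in os.listdir(path):
            total += du(os.path.join(path, name), seen)
    return total

print(du("."))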

Symlinks are a whole different beast, in that they are a special type of "file" that many filesystem APIs tend to follow automatically. Note that a symlink can point to a nonexistent destination, because it points by name, not directly to an inode. That concept doesn't make sense with hard links, because the mere existence of a "hard link" means the file exists.

So why can du deal with symlinks easily and not hard links? We saw above that hard links are indistinguishable from normal directory entries. Symlinks, however, are special, detectable, and skippable: du notices that a symlink is a symlink and doesn't descend into it, counting only the link's own few bytes!

% ls -l 
total 4
drwxr-xr-x  3 danny  staff  102 Oct 13 18:14 test1/
lrwxr-xr-x  1 danny  staff    5 Oct 13 18:13 test2@ -> test1
% du -ah
242M    ./test1/bigfile
242M    ./test1
4.0K    ./test2
242M    .
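The "detectable" part is a single lstat() call, which examines the link itself rather than following it. A minimal Python sketch, assuming the test2 symlink from the listing above:

import os, stat

st = os.lstat("test2")            # lstat() never follows the link
if stat.S_ISLNK(st.st_mode):
    # Count the link's own few bytes (the target path), but don't descend.
    print(f"symlink to {os.readlink('test2')}, {st.st_size} bytes")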

Solution 2

With the exception of mount points, each directory has one and only one parent: ...

One way to do pwd is to check the device:inode for '.' and '..'. If they are the same, you have reached the root of the file system. Otherwise, find the name of the current directory in the parent, push that on a stack, and start comparing '../.' with '../..', then '../../.' with '../../..', etc. Once you've hit the root, start popping and printing the names from the stack. This algorithm relies on the fact that each directory has one and only one parent.
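Here is that algorithm as a minimal Python sketch (a toy, not how any real pwd is implemented), using nothing but device:inode comparisons:

import os

def pwd():
    """Climb '..' until '.' and '..' coincide (the root), matching
    each directory's name in its parent by device and inode."""
    names = []
    here = "."
    while True:
        cur = os.stat(here)
        up = os.path.join(here, "..")
        parent = os.stat(up)
        if (cur.st_dev, cur.st_ino) == (parent.st_dev, parent.st_ino):
            break                  # '.' == '..' only at the root
        for name in os.listdir(up):
            st = os.lstat(os.path.join(up, name))
            if (st.st_dev, st.st_ino) == (cur.st_dev, cur.st_ino):
                names.append(name) # found our own name in the parent
                break
        here = up
    return "/" + "/".join(reversed(names))

print(pwd())                       # matches /bin/pwd (the physical path)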

If hard links to directories were allowed, which one of the multiple parents should .. point to? That is one compelling reason why hard links to directories are not allowed.

Symlinks to directories don't cause that problem. If a program wants to, it could do an lstat() on each part of the pathname and detect when a symlink is encountered. The pwd algorithm will return the true absolute pathname for a target directory. The fact that there is a piece of text somewhere (the symlink) that points to the target directory is pretty much irrelevant. The existence of such a symlink does not create a loop in the graph.

Solution 3

I'd like to add a few more points about this question. Hard links for directories are allowed in Linux, but only in a restricted way.

One way we can see this is by listing the contents of a directory: we find two special entries, "." and "..". As we know, "." points to the directory itself and ".." points to its parent directory.

So let's create a directory tree where "a" is the parent directory and "b" is its child.

 a
 `-- b

Note down the inode of directory "a". When we do an ls -la from directory "a", we can see that the "." entry also points to the same inode.

797358 drwxr-xr-x 3 mkannan mkannan 4096 Sep 17 19:13 a

Here we find that directory "a" has three hard links. This is because inode 797358 is linked three times: as "." inside directory "a", as ".." inside directory "b", and as the name "a" itself.

$ ls -ali a/
797358 drwxr-xr-x 3 mkannan mkannan 4096 Sep 17 19:13 .

$ ls -ali a/b/
797358 drwxr-xr-x 3 mkannan mkannan 4096 Sep 17 19:13 ..

So here we can understand that hard links for directories exist only to connect them with their parent and child directories. A directory without children therefore has only 2 hard links, and so directory "b" has only two.
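That observation gives the classic rule of thumb: on traditional filesystems, a directory's link count is 2 (its own name plus its ".") plus one ".." per immediate subdirectory. A Python sketch to check it (note that some filesystems, e.g. btrfs, simply report 1 for all directories):

import os

def expected_nlink(path):
    """2 for the directory's own name and its '.',
    plus one '..' per immediate subdirectory."""
    subdirs = [e for e in os.listdir(path)
               if os.path.isdir(os.path.join(path, e))
               and not os.path.islink(os.path.join(path, e))]
    return 2 + len(subdirs)

print(os.stat("a").st_nlink)      # 3 for the tree above: "a", "a/.", "a/b/.."
print(expected_nlink("a"))        # 3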

One reason free hard linking of directories was prevented is to avoid infinite reference loops, which would confuse programs that traverse the filesystem.

Since the filesystem is organised as a tree, and a tree cannot have cyclic references, this had to be avoided.

Solution 4

None of the following are the real reason for disallowing hard links to directories; each problem is fairly easy to solve:

  • cycles in the tree structure cause difficult traversal
  • multiple parents, so which is the "real" one?
  • filesystem garbage collection

The real reason (as hinted by @Thorbjørn Ravn Andersen) comes when you delete a directory that has multiple parents from the directory pointed to by ..:

What should .. now point to?

If the directory is deleted from its parent but its link count is still greater than 0, then there must be something, somewhere, still pointing to it. You can't leave .. pointing to nothing; lots of programs rely on .., so the system would have to traverse the entire file system until it found the first thing that points to the deleted directory, just to update its .. entry. Either that, or the file system would have to maintain a list of all directories pointing to a hard-linked directory.

Either way, this would be a performance overhead and an extra complication for the file system meta data and/or code, so the designers decided not to allow it.

Solution 5

Hard-link creation on directories would be irreversible. Suppose we have:

/dir1
├──this.txt
├──directory
│  └──subfiles
└──etc

I hardlink it to /dir2.

So /dir2 now also contains all these files and directories.

What if I change my mind? I can't just rmdir /dir2 (because it is non-empty).

And if I recursively delete everything in /dir2... it will be deleted from /dir1 too!

IMHO that is more than sufficient reason to avoid this!

Edit:

Comments suggest removing the directory by doing rm on it. But rm on a non-empty directory fails, and this behaviour must remain whether the directory is hard-linked or not. So you can't just rm it to unlink it. It would require a new argument to rm, just to say "if the directory inode has a reference count > 1, then only unlink the directory".

Which, in turn, breaks the principle of least surprise: it means that removing a directory hard link I just created is not the same as removing a normal file hard link...

I will rephrase my sentence: without further development, hard-link creation would be irreversible (as no current command could handle the removal without being incoherent with current behaviour).

If we allow further development to handle the case, the number of pitfalls such a development implies, and the risk of data loss for anyone not sufficiently aware of how the system works, are IMHO sufficient reason to restrict hard linking on directories.


Comments

  • user3539
    user3539 almost 2 years

    I read in textbooks that Unix/Linux doesn't allow hard links to directories but does allow soft links. Is it because, when we have cycles, if we create hard links and after some time delete the original file, the link will point to some garbage value?

    If cycles were the sole reason behind not allowing hard links, then why are soft links to directories allowed?

    • Thorbjørn Ravn Andersen
      Thorbjørn Ravn Andersen about 11 years
      Where should .. point to? Especially after removing the hard link to this directory, in the directory pointed to by ..? It needs to point somewhere.
    • Trevor Boyd Smith
      Trevor Boyd Smith over 7 years
      I like this explanation. Concise and easy to read and/or skim.
  • user3539
    user3539 over 12 years
    Allowing hard links to directories would break the directed acyclic graph structure of the filesystem. Can you please explain more about the problem with cycles using hard links? Why is it OK with symlinks?
  • Virendra
    Virendra over 12 years
    @user3539, I have updated the answer with more explanation.
  • psusi
    psusi over 12 years
    They seem to have allowed it on Macs by adding cycle detection to the link() system call, and refusing to allow you to create a directory hard link if it would create a cycle. Seems to be a reasonable solution.
  • Virendra
    Virendra over 12 years
    @psusi mkdir -p a/b; nocheckln c a; mv c a/b; -- the nocheckln there is a theoretical ln that doesn't check for directory args and just passes them to link(). Because no cycle is made, we are all good in creating 'c'. Then we move 'c' into 'a/b', and a cycle is created from a/b/c -> a/ -- so checking in link() is not good enough
  • psusi
    psusi over 12 years
    @DannyDulai, yea, I guess rename() needs to check as well...
  • imz -- Ivan Zakharyaschev
    imz -- Ivan Zakharyaschev over 12 years
    @psusi: for example, here is some discussion of the need to implement cycle detection for rename() (w.r.t. the planned features of reiserfs): "Cycle may consists of more graph nodes than fits into memory. Cycle detection is crucial for rename semantics, and if cycle-just-about-to-be-formed doesn't fit into memory it's not clear how to detect it, because tree has to be locked while checked for cycles, and one definitely doesn't want to keep such a lock over IO."
  • RMuesi
    RMuesi over 11 years
    +1. A corollary is that the kernel would have to deal with loops of directories that are not reachable from the root. stackoverflow.com/a/7720649/778990
  • doug65536
    doug65536 about 11 years
    Cycles are very bad. Windows has this problem with "junctions" which are hard link directories. If you accidentally apply permissions to your whole profile, it uncovers a series of junctions that create an infinite cycle. Recursing through the directories recurses until path length limitations stop it.
  • Behrooz
    Behrooz over 10 years
    +1 for the good explanation, but you left me with a hole in my entire knowledge of computer programming. does it take more than one stack per mount-point to track cycles?
  • vonbrand
    vonbrand over 10 years
    Please clarify the answer; hard link names haven't got timestamps (yes, you can infer from the directory timestamps when the links were created, if they were the last operation on the directory, but that is quite outlandish...)
  • Benubird
    Benubird over 10 years
    Not so sure about this. If we think of .. as being a sort of virtual hardlink to the parent, there is no technical reason that the target of the link can only have one other link to it. pwd would just have to use a different algorithm to resolve the path.
  • Kannan Mohan
    Kannan Mohan almost 10 years
    That should not be a problem. In your case, when we create a hardlink to dir2 we have to make hardlinks to all the contents of dir1, and so if we rename or delete dir2, only an extra link to the inode gets deleted. That should not affect dir1 and its contents, as there is at least one link (dir1) to the inode.
  • Kannan Mohan
    Kannan Mohan over 9 years
    As we have only a limited way of allowing hard links for directories, i.e. ".." and ".", we will not reach an infinite loop, and so we do not need any special ways to avoid one, as it will not happen :)
  • Lqueryvg
    Lqueryvg over 9 years
    du is not a good reason as to why hard links to dirs are not allowed. du already has to do lots of "bookkeeping" by keeping track of inodes, to make sure it doesn't count blocks twice.
  • Virendra
    Virendra over 9 years
    du was just an example, I'm sure you can use your imagination to extrapolate.
  • jathd
    jathd over 9 years
    That's easy to solve as well: keep a list of parents of a child directory, which you update when you add or remove a link to the child. When you delete the canonical parent (the target of the child's ..), update .. to point to one of the other parents in the list.
  • Lqueryvg
    Lqueryvg over 9 years
    I agree. Not rocket science to solve. But nonetheless a performance overhead, and it would take up a little bit of extra space in the file system metadata and add complication. And so the designers went for the simple, fast approach: don't allow hard links to directories.
  • psusi
    psusi over 8 years
    @WhiteWinterWolf, I've never used macs myself but read they have the ability somewhere. Years later I now have no idea where. You might need a force switch or something.
  • psusi
    psusi over 8 years
    @WhiteWinterWolf, according to this link, they specifically added support for it for time machine, but only root is allowed to do it: superuser.com/questions/360926/…
  • WhiteWinterWolf
    WhiteWinterWolf over 8 years
    @psusi: Funny, the quoted manpage does not come from OSX but from Linux. This flag forces ln to pass the parameters to the linkat() system function in the hope that the request will not be rejected by lower layers (as shown by strace). Interesting, despite what the manpage states this flag works the same way for root and unprivileged users (tested on Fedora and Debian).
  • WhiteWinterWolf
    WhiteWinterWolf over 8 years
    And, while not an authoritative source, the last paragraph of this page is relevant for OSX: a change seems to have been made at the file system level specifically for Time Machine starting from OSX Leopard. Chances are that it uses some different call than the standard link() call and, in any case, this feature is not meant to be provided to the end-user and is not supported by ln. Thanks @psusi :) !
  • psusi
    psusi over 8 years
    @WhiteWinterWolf, of course the utility works the same way for root as for other users: it is the kernel that checks permissions, not the utility. Also you seem to have linked a rather old Ubuntu man page that says nothing about OSX.
  • WhiteWinterWolf
    WhiteWinterWolf over 8 years
    @psusi: Wrong URL, sorry (too sad we cannot correct the comments). Here is the link relevant regarding OSX Time Machine.
  • ctrl-alt-delor
    ctrl-alt-delor over 8 years
    I think, though I may be wrong, that . and .. are not hard links in the file-system, for modern file-systems. However, the file-system driver fakes them. It is these file-systems that stop us hard-linking directories. For old file-systems it was possible (but dangerous). To do what you are trying, look at mount --bind; see also mount --make… and maybe containers.
  • Lqueryvg
    Lqueryvg about 8 years
    Sym links to dirs "violate settled semantics and behaviours", yet they are still allowed. Some commands therefore need options to control whether sym links are followed (e.g. -L in find and cp). When a program follows '..' there is further confusion, hence the difference in output from pwd and /bin/pwd after traversing a sym link. There are no "Unix answers"; just design decisions. This one revolves around what becomes of ".." as I stated in my answer. Unfortunately, '..' isn't even mentioned in the answer that everyone else is so sheepishly voting for.
  • Lqueryvg
    Lqueryvg about 8 years
    BTW, I'm not saying I'm in favour of hard links to dirs. Not at all. I don't want my day job to be harder than it is already.
  • Lqueryvg
    Lqueryvg about 8 years
    @DannyDulai, you write as though du cannot cope with hard links and of the "bookkeeping" it "would have to do, just to pull off this simple task". The truth is that du already copes with hard links and has done so for many years quite easily by keeping track of the inodes it's already visited. Also, du does not skip symlinks "completely" as you say; they have an inode and a size and have to be counted. The onus should be on you to come up with a better example than du, not for the reader to "imagine" one, because what you've written misrepresents how du really works.
  • Virendra
    Virendra about 8 years
    @Lqueryvg -- "imagine you are writing the du command" does not mean you are tasked with writing the modern full featured du. It is a thought exercise meant to make you think about writing a command that iterates through directories and files. As for skipping, the skipping is in the context of drilling deeper into the directory hierarchy, not absolute skipping. You are picking at details of English, and not taking the text in context.
  • Lqueryvg
    Lqueryvg about 8 years
    @DannyDulai, rest assured I understand the context. But, "the du command" versus "a command like du"? And "skips it completely" versus "doesn't drill deeper"? Is that just picking at English? I don't think so. I can suggest some fairly minimal edits which would clear all of this up if you want.
  • Virendra
    Virendra about 8 years
    @Lqueryvg : go for it.. I welcome the edits :-)
  • Lqueryvg
    Lqueryvg about 8 years
    @DannyDulai, well I tried but my edits were rejected; not by you. Considering the nature of my edits (minor clarification only), I'm amazed.
  • LtWorf
    LtWorf about 7 years
    Your argument is incorrect. You would just unlink it, not do rm -rf. And if the link count reaches 0, then the system would know it can delete all the contents too.
  • ybungalobill
    ybungalobill over 6 years
    It's not what POSIX says, but IMO '..' should never have been a filesystem concept, rather resolved syntactically on the paths, so that a/.. would always mean '.'. This is how URLs work, btw. It's the browser that resolves '..' before it even hits the server. And it works great.
  • gagarine
    gagarine over 5 years
    @psusi hard links on directories are not allowed on MacOS's new filesystem APFS.
  • BryKKan
    BryKKan almost 5 years
    That's more or less all rm does underneath anyway (unlink). See: unix.stackexchange.com/questions/151951/… This really isn't an issue, any more than it is with hardlinked files. Unlinking just removes the named reference and decrements the link count. The fact that rmdir won't delete non-empty directories is irrelevant - it wouldn't do that for dir1 either. Hardlinks aren't copies of data, they are the same actual file, hence actually "deleting" the dir2 file would erase the directory listing for dir1. You would always need to unlink.
  • Pierre-Olivier Vares
    Pierre-Olivier Vares almost 5 years
    You can't just unlink it like a normal file, because rm on a directory doesn't unlink it if it's non-empty. See Edit.
  • Matt
    Matt over 4 years
    Actually, the "file tree walkers" could still easily avoid loops when traversing the tree: Only recurse if the subdirectory's ".." points back to the parent. @Lqueryvg's answer seems to get more directly to the real problem: The very idea of "parent directory" would have to be redesigned from the ground up. Much easier to restrict directory hardlinks.
  • Skaperen
    Skaperen almost 3 years
    .. only needs to refer to the inode of the parent. if the parent has hard links, those are just paths (names in other directories) that refer to the same inode. if .. was a path to the parent, then it would be like a symlink.
  • Jim Balter
    Jim Balter over 2 years
    "Because when you are traversing, there is no way to detect you are looping (without keeping track of inode numbers as you traverse). " -- You must keep track of inode numbers because of symlinks, and commands like du and find do exactly that.
  • Jim Balter
    Jim Balter over 2 years
    .. links aren't needed at all and some filesystems don't create them. You can always figure out the parent from the path (prefix the cwd for relative paths).
  • Jim Balter
    Jim Balter over 2 years
    This is just wrong. If hard links to directories were allowed, then of course the rmdir system call would not remove the inode if the link count indicated that there were other links, just as with the unlink system call. "without being incoherent with current behaviour" -- current behavior is that you can't hardlink to directories. Changing that of course implies that rmdir changes accordingly. "IMHO a sufficient reason to restrict hardlinking on directories" --- I have opinions too, but that's not what the question asks for.
  • Jim Balter
    Jim Balter over 2 years
    @KannanMohan "when we create hardlink to dir2 we have to make hardlink to all the contents in dir1" -- no ... where would these hardlinks to the contents of dir1 go? Hardlinking to a directory gives you another path to its contents; there's no reason to hardlink the contents as well.
  • Jim Balter
    Jim Balter over 2 years
    @LtWorf "You would just unlink it, not do rm -rf" -- yes, except that you would use rmdir, not unlink, and the rmdir system call would check the link count and not delete the inode if the directory remained in use, so this answer poses a non-problem.
  • Jim Balter
    Jim Balter over 2 years
    "Hard-linking directories used to be freely allowed in Bell Labs UNIX, at least V6 and V7" -- only for superusers, who could also unlink directories. This was extremely dangerous and was fatal when UNIX started supporting foreign filesystems. I'm pretty sure both of these were disallowed in PWB. "some of the elegance of UNIX was in the implementation" -- much of it was necessitated by the tiny amount of memory on PDP-11s. Branches in shell scripts were done via a seek--"elegant" but bog slow.