Why are hard links to directories not allowed in UNIX/Linux?
Solution 1
This is just a bad idea, as there is no way to tell the difference between a hard link and an original name. Allowing hard links to directories would break the directed acyclic graph structure of the filesystem, possibly creating directory loops and dangling directory subtrees, which would make fsck and any other file tree walkers error-prone.
First, to understand this, let's talk about inodes. The data in the filesystem is held in blocks on the disk, and those blocks are collected together by an inode. You can think of the inode as THE file. Inodes lack filenames, though. That's where links come in.
A link is just a pointer to an inode. A directory is an inode that holds links. Each filename in a directory is just a link to an inode. Opening a file in Unix also creates a link, but it's a different type of link (it's not a named link).
A hard link is just an extra directory entry pointing to that inode. When you run ls -l, the number after the permissions is the named link count. Most regular files will have one link. Creating a new hard link to a file makes both filenames point to the same inode. Note:
% ls -l test
ls: test: No such file or directory
% touch test
% ls -l test
-rw-r--r-- 1 danny staff 0 Oct 13 17:58 test
% ln test test2
% ls -l test*
-rw-r--r-- 2 danny staff 0 Oct 13 17:58 test
-rw-r--r-- 2 danny staff 0 Oct 13 17:58 test2
% touch test3
% ls -l test*
-rw-r--r-- 2 danny staff 0 Oct 13 17:58 test
-rw-r--r-- 2 danny staff 0 Oct 13 17:58 test2
-rw-r--r-- 1 danny staff 0 Oct 13 17:59 test3
           ^
           ^ this is the link count
Now you can clearly see that there is no such thing as a hard link. A hard link is the same as a regular name. In the above example, which is the original file and which is the hard link, test or test2? By the end, you can't really tell (even by timestamps) because both names point to the same contents, the same inode:
% ls -li test*
14445750 -rw-r--r-- 2 danny staff 0 Oct 13 17:58 test
14445750 -rw-r--r-- 2 danny staff 0 Oct 13 17:58 test2
14445892 -rw-r--r-- 1 danny staff 0 Oct 13 17:59 test3
The -i flag to ls shows inode numbers at the beginning of the line. Note how test and test2 have the same inode number, but test3 has a different one.
Now, if you were allowed to do this for directories, two different directories in different points in the filesystem could point to the same thing. In fact, a subdir could point back to its grandparent, creating a loop.
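In practice you cannot even get started: the kernel refuses the operation at the link(2) level, which returns EPERM for directories. A quick sketch of what happens if you try (assuming GNU coreutils on Linux; the exact error text varies):

```shell
#!/bin/sh
# Try to hard link a directory into its own subtree.
# link(2) refuses directories outright (EPERM), so no loop can ever form.
cd "$(mktemp -d)" || exit 1
mkdir -p a/b

if ln a a/b/loop 2>/dev/null; then
    echo "hard link created (unexpected on Linux)"
else
    echo "ln refused: directories cannot be hard linked"
fi
```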
Why is this loop a concern? Because when you are traversing, there is no way to detect you are looping (without keeping track of inode numbers as you traverse). Imagine you are writing the du command, which needs to recurse through subdirectories to find out about disk usage. How would du know when it hit a loop? It is error-prone and a lot of bookkeeping that du would have to do, just to pull off this simple task.
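To make that bookkeeping concrete, here is a toy sketch (not real du; the walk function and its argument convention are mine, and it assumes GNU stat) of what any tree walker would need if directory cycles were possible: remember every device:inode pair already entered on the current path, and bail out on a repeat.

```shell
#!/bin/sh
# Toy recursive walker with loop protection.
# $1 = directory to walk, $2 = space-separated list of dev:inode
# pairs already entered on this path. Each level runs in a subshell
# so variables don't clobber each other across the recursion.
walk() (
    id=$(stat -c '%d:%i' "$1") || return
    case " $2 " in
        *" $id "*)
            printf 'loop detected at %s\n' "$1"
            return ;;
    esac
    printf '%s\n' "$1"
    for entry in "$1"/*; do
        # recurse into real subdirectories only; skip symlinks
        [ -d "$entry" ] && [ ! -L "$entry" ] && walk "$entry" "$2 $id"
    done
)

walk . ''
```

Real directory loops cannot be created on Linux, so the protection never fires here; the point is the extra state every recursion level must carry around.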
Symlinks are a whole different beast, in that they are a special type of "file" that many filesystem APIs tend to automatically follow. Note that a symlink can point to a nonexistent destination, because it points by name, not directly to an inode. That concept doesn't make sense with hard links, because the mere existence of a "hard link" means the file exists.
So why can du deal with symlinks easily and not hard links? We were able to see above that hard links are indistinguishable from normal directory entries. Symlinks, however, are special, detectable, and skippable! du notices that the symlink is a symlink, and skips it completely!
% ls -l
total 4
drwxr-xr-x 3 danny staff 102 Oct 13 18:14 test1/
lrwxr-xr-x 1 danny staff 5 Oct 13 18:13 test2@ -> test1
% du -ah
242M ./test1/bigfile
242M ./test1
4.0K ./test2
242M .
Solution 2
With the exception of mount points, each directory has one and only one parent: ..
One way to implement pwd is to check the device:inode pair for '.' and '..'. If they are the same, you have reached the root of the filesystem. Otherwise, find the name of the current directory in the parent, push that on a stack, and start comparing '../.' with '../..', then '../../.' with '../../..', and so on. Once you've hit the root, start popping and printing the names from the stack. This algorithm relies on the fact that each directory has one and only one parent.
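Here is a rough shell rendering of that algorithm (my own sketch, using GNU stat for the device:inode comparison; the real pwd is a shell builtin and far more careful):

```shell
#!/bin/sh
# Minimal pwd built on the "every directory has exactly one parent" rule:
# climb via .., matching our own dev:inode against the parent's entries.
my_pwd() (
    path=''
    while :; do
        here=$(stat -c '%d:%i' .)
        # at the filesystem root, '.' and '..' are the same dev:inode
        [ "$here" = "$(stat -c '%d:%i' ..)" ] && break
        for entry in ../* ../.[!.]*; do
            if [ "$(stat -c '%d:%i' "$entry" 2>/dev/null)" = "$here" ]; then
                path="/${entry##*/}$path"   # push our name onto the front
                break
            fi
        done
        cd -P .. || return
    done
    printf '%s\n' "${path:-/}"
)
```

Because the climb is purely physical (cd -P plus inode comparisons), the result matches pwd -P. If a directory could have several parents, the stat of '..' would be ambiguous and the whole loop would fall apart.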
If hard links to directories were allowed, which one of the multiple parents should .. point to? That is one compelling reason why hard links to directories are not allowed.
Symlinks to directories don't cause that problem. If a program wants to, it can do an lstat() on each part of the pathname and detect when a symlink is encountered. The pwd algorithm will return the true absolute pathname for a target directory. The fact that there is a piece of text somewhere (the symlink) that points to the target directory is pretty much irrelevant. The existence of such a symlink does not create a loop in the graph.
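The detectability is easy to see from the shell: GNU stat uses lstat() by default and only follows the link with -L (the names target and alias below are throwaway examples):

```shell
#!/bin/sh
# A symlink is visible as its own file type; following it is opt-in.
cd "$(mktemp -d)" || exit 1
mkdir target
ln -s target alias

stat -c '%F' alias    # lstat()-style: reports "symbolic link"
stat -Lc '%F' alias   # stat()-style: follows the link, reports "directory"
[ -L alias ] && echo "test -L sees it too"
```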
Solution 3
I would like to add a few more points about this question. Hard links to directories are allowed in Linux, but only in a restricted way.
We can see this when we list the contents of a directory: we find two special entries, "." and "..". As we know, "." points to the directory itself and ".." points to the parent directory.
So let's create a directory tree where "a" is the parent directory and "b" is its child.
a
`-- b
Note down the inode of directory "a". When we do an ls -la from inside directory "a", we can see that the "." entry points to the same inode.
797358 drwxr-xr-x 3 mkannan mkannan 4096 Sep 17 19:13 a
Here we can see that directory "a" has three hard links. This is because inode 797358 has three links: one named "." inside directory "a", one named ".." inside directory "b", and one named "a" itself.
$ ls -ali a/
797358 drwxr-xr-x 3 mkannan mkannan 4096 Sep 17 19:13 .
$ ls -ali a/b/
797358 drwxr-xr-x 3 mkannan mkannan 4096 Sep 17 19:13 ..
So hard links to directories exist only to connect a directory with its parent and its children. A directory without children therefore has only two hard links, which is why directory "b" has a link count of two.
One reason free hard linking of directories is prevented is to avoid infinite reference loops, which would confuse programs that traverse the filesystem.
Since the filesystem is organised as a tree, and a tree cannot have cyclic references, this had to be avoided.
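You can verify this inode sharing directly. On traditional filesystems such as ext4, the link count of a directory works out to 2 plus the number of subdirectories (note that some filesystems, e.g. btrfs, always report a link count of 1 for directories):

```shell
#!/bin/sh
# "a", "a/." and "a/b/.." are three names for the same inode.
cd "$(mktemp -d)" || exit 1
mkdir -p a/b

stat -c '%i' a        # some inode number N
stat -c '%i' a/.      # the same N
stat -c '%i' a/b/..   # the same N again

stat -c '%h' a        # link count: 3 on ext4-style filesystems (a, a/., a/b/..)
stat -c '%h' a/b      # 2 for a leaf directory (a/b and a/b/.)
```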
Solution 4
None of the following are the real reason for disallowing hard links to directories; each problem is fairly easy to solve:
- cycles in the tree structure cause difficult traversal
- multiple parents, so which is the "real" one ?
- filesystem garbage collection
The real reason (as hinted by @Thorbjørn Ravn Andersen) comes when you delete a directory which has multiple parents, from the directory pointed to by ..: what should .. now point to?
If the directory is deleted from its parent but its link count is still greater than 0, then there must be something, somewhere, still pointing to it. You can't leave .. pointing to nothing; lots of programs rely on .., so the system would have to traverse the entire file system until it found the first thing that points to the deleted directory, just to update the .. entry. Either that, or the file system would have to maintain a list of all directories pointing to a hard-linked directory.
Either way, this would be a performance overhead and an extra complication for the file system meta data and/or code, so the designers decided not to allow it.
Solution 5
Hard-link creation on directories would be irreversible. Suppose we have:
/dir1
├──this.txt
├──directory
│ └──subfiles
└──etc
I hard link it to /dir2. So /dir2 now also contains all these files and directories.
What if I change my mind? I can't just rmdir /dir2 (because it is non-empty).
And if I recursively delete inside /dir2... it will be deleted from /dir1 too!
IMHO that is a largely sufficient reason to avoid this!
Edit:
Comments suggest removing the directory by doing rm on it. But rm on a non-empty directory fails, and this behaviour must remain whether the directory is hard linked or not. So you can't just rm it to unlink. It would require a new argument to rm, just to say "if the directory inode has a reference count > 1, then only unlink the directory".
Which, in turn, breaks the principle of least surprise: it means that removing a directory hard link I just created is not the same as removing a normal file hard link.
I will rephrase my sentence: without further development, hard-link creation would be irreversible (no current command could handle the removal without being incoherent with current behaviour).
If we allow more development to handle the case, then the number of pitfalls such a development implies, and the risk of data loss if you are not sufficiently aware of how the system works, are IMHO a sufficient reason to restrict hard linking on directories.
Updated on September 18, 2022
Comments
-
user3539 almost 2 years: I read in textbooks that Unix/Linux doesn't allow hard links to directories but does allow soft links. Is it because, when we have cycles and we create hard links, and after some time we delete the original file, it will point to some garbage value?
If cycles were the sole reason behind not allowing hard links, then why are soft links to directories allowed?
-
Thorbjørn Ravn Andersen about 11 years: Where should .. point to? Especially after removing the hard link to this directory, in the directory pointed to by ..? It needs to point somewhere.
-
Trevor Boyd Smith over 7 years: I like this explanation. Concise and easy to read and/or skim.
-
user3539 over 12 years
"Allowing hard links to directories would break the directed acyclic graph structure of the filesystem." Can you please explain more about the problem with cycles when using hard links? And why is it OK with symlinks?
-
Virendra over 12 years: @user3539, I have updated the answer with more explanation.
-
psusi over 12 years: They seem to have allowed it on Macs by adding cycle detection to the link() system call, and refusing to allow you to create a directory hard link if it would create a cycle. Seems to be a reasonable solution.
-
Virendra over 12 years: @psusi mkdir -p a/b; nocheckln c a; mv c a/b; -- the nocheckln there is a theoretical ln that doesn't check for directory args and just passes to link(). Because no cycle is made, we are all good in creating 'c'. Then we move 'c' into 'a/b', and a cycle is created from a/b/c -> a/ -- so checking in link() is not good enough.
-
psusi over 12 years: @DannyDulai, yea, I guess rename() needs to check as well...
-
imz -- Ivan Zakharyaschev over 12 years: @psusi: for example, here is some discussion of the need to implement cycle detection for rename() (w.r.t. the planned features of reiserfs): "Cycle may consists of more graph nodes than fits into memory. Cycle detection is crucial for rename semantics, and if cycle-just-about-to-be-formed doesn't fit into memory it's not clear how to detect it, because tree has to be locked while checked for cycles, and one definitely doesn't want to keep such a lock over IO."
-
RMuesi over 11 years: +1. A corollary is that the kernel would have to deal with loops of directories that are not linked by the root. stackoverflow.com/a/7720649/778990
-
doug65536 about 11 years: Cycles are very bad. Windows has this problem with "junctions", which are hard-linked directories. If you accidentally apply permissions to your whole profile, it uncovers a series of junctions that create an infinite cycle. Recursing through the directories recurses until path length limitations stop it.
-
Behrooz over 10 years: +1 for the good explanation, but you left me with a hole in my knowledge of computer programming. Does it take more than one stack per mount point to track cycles?
-
vonbrand over 10 years: Please clarify the answer; hard link names haven't got timestamps (yes, you can infer from the directory timestamps when the links were created, if they were the last operation on the directory, but that is quite outlandish...)
-
Benubird over 10 years: Not so sure about this. If we think of .. as being a sort of virtual hard link to the parent, there is no technical reason that the target of the link can only have one other link to it. pwd would just have to use a different algorithm to resolve the path.
-
Kannan Mohan almost 10 years: That should not be a problem. In your case, when we create a hard link to dir2 we have to make hard links to all the contents of dir1, and so if we rename or delete dir2 only an extra link to the inode gets deleted. That should not affect dir1 and its content, as there is at least one link (dir1) to the inode.
-
Kannan Mohan over 9 years: As we have only a limited way of allowing hard links for directories, i.e. ".." and ".", we will not reach an infinite loop, and so we do not require any special ways to avoid those, as they will not happen :)
-
Lqueryvg over 9 years: du is not a good reason as to why hard links to dirs are not allowed. du already has to do lots of "book keeping" by keeping track of inodes, to make sure it doesn't count blocks twice.
-
Virendra over 9 years: du was just an example; I'm sure you can use your imagination to extrapolate.
-
jathd over 9 years: That's easy to solve as well: keep a list of parents of a child directory, which you update when you add or remove a link to the child. When you delete the canonical parent (the target of the child's ..), update .. to point to one of the other parents in the list.
-
Lqueryvg over 9 years: I agree. Not rocket science to solve. But nonetheless a performance overhead, and it would take up a little bit of extra space in the filesystem metadata and add complication. And so the designers went for the simple, fast approach: don't allow hard links to directories.
-
psusi over 8 years: @WhiteWinterWolf, I've never used Macs myself but read they have the ability somewhere. Years later I now have no idea where. You might need a force switch or something.
-
psusi over 8 years: @WhiteWinterWolf, according to this link, they specifically added support for it for Time Machine, but only root is allowed to do it: superuser.com/questions/360926/…
-
WhiteWinterWolf over 8 years: @psusi: Funny, the quoted manpage does not come from OSX but from Linux. This flag forces ln to pass the parameters to the linkat() system function in the hope that the request will not be rejected by lower layers (as shown by strace). Interestingly, despite what the manpage states, this flag works the same way for root and unprivileged users (tested on Fedora and Debian).
-
WhiteWinterWolf over 8 years: And, while not an authoritative source, the last paragraph of this page is relevant for OSX: a change seems to have been made at the filesystem level specifically for Time Machine starting from OSX Leopard. Chances are that it uses some different call than the standard link() call and, in any case, this feature is not meant to be provided to the end user and is not supported by ln. Thanks @psusi :)!
-
psusi over 8 years: @WhiteWinterWolf, of course the utility works the same way for root as for other users: it is the kernel that checks permissions, not the utility. Also you seem to have linked a rather old Ubuntu man page that says nothing about OSX.
-
WhiteWinterWolf over 8 years: @psusi: Wrong URL, sorry (too sad we cannot correct the comments). Here is the link relevant regarding OSX Time Machine.
-
ctrl-alt-delor over 8 years: I think, though I may be wrong, that . and .. are not hard links in the filesystem, for modern filesystems. However, the filesystem driver fakes them. It is these filesystems that stop us hard linking directories. For old filesystems it was possible (but dangerous). To do what you are trying, look at mount --bind; see also mount --make… and maybe containers.
-
Lqueryvg about 8 years: Symlinks to dirs "violate settled semantics and behaviours", yet they are still allowed. Some commands therefore need options to control whether symlinks are followed (e.g. -L in find and cp). When a program follows '..' there is further confusion, hence the difference in output from pwd and /bin/pwd after traversing a symlink. There are no "Unix answers", just design decisions. This one revolves around what becomes of "..", as I stated in my answer. Unfortunately, '..' isn't even mentioned in the answer that everyone else is so sheepishly voting for.
-
Lqueryvg about 8 years: BTW, I'm not saying I'm in favour of hard links to dirs. Not at all. I don't want my day job to be harder than it is already.
-
Lqueryvg about 8 years: @DannyDulai, you write as though du cannot cope with hard links, and of the "bookkeeping" it "would have to do, just to pull off this simple task". The truth is that du already copes with hard links, and has done so for many years, quite easily, by keeping track of the inodes it has already visited. Also, du does not skip symlinks "completely" as you say; they have an inode and a size and have to be counted. The onus should be on you to come up with a better example than du, not for the reader to "imagine" one, because what you've written misrepresents how du really works.
-
Virendra about 8 years: @Lqueryvg -- "imagine you are writing the du command" does not mean you are tasked with writing the modern full-featured du. It is a thought exercise meant to make you think about writing a command that iterates through directories and files. As for skipping, the skipping is in the context of drilling deeper into the directory hierarchy, not absolute skipping. You are picking at details of English, and not taking the text in context.
-
Lqueryvg about 8 years: @DannyDulai, rest assured I understand the context. But "the du command" versus "a command like du"? And "skips it completely" versus "doesn't drill deeper"? Is that just picking at English? I don't think so. I can suggest some fairly minimal edits which would clear all of this up if you want.
-
Virendra about 8 years: @Lqueryvg: go for it. I welcome the edits :-)
-
Lqueryvg about 8 years: @DannyDulai, well I tried, but my edits were rejected; not by you. Considering the nature of my edits (minor clarification only), I'm amazed.
-
LtWorf about 7 years: Your argument is incorrect. You would just unlink it, not do rm -rf. And if the link count reaches 0, then the system would know it can delete all the contents too.
-
ybungalobill over 6 years: It's not what POSIX says, but IMO '..' should never have been a filesystem concept; it should be resolved syntactically on the paths, so that a/.. would always mean '.'. This is how URLs work, btw. It's the browser that resolves '..' before it even hits the server. And it works great.
-
gagarine over 5 years: @psusi hard links on directories are not allowed on macOS's new filesystem, APFS.
-
BryKKan almost 5 years: That's more or less all rm does underneath anyway (unlink). See: unix.stackexchange.com/questions/151951/… This really isn't an issue, any more than it is with hard-linked files. Unlinking just removes the named reference and decrements the link count. The fact that rmdir won't delete non-empty directories is irrelevant - it wouldn't do that for dir1 either. Hard links aren't copies of data; they are the same actual file, hence actually "deleting" the dir2 file would erase the directory listing for dir1. You would always need to unlink.
-
Pierre-Olivier Vares almost 5 years: You can't just unlink it like a normal file, because rm on a directory doesn't unlink it if it's non-empty. See Edit.
-
Matt over 4 years: Actually, the "file tree walkers" could still easily avoid loops when traversing the tree: only recurse if the subdirectory's ".." points back to the parent. @Lqueryvg's answer seems to get more directly to the real problem: the very idea of "parent directory" would have to be redesigned from the ground up. Much easier to restrict directory hard links.
-
Skaperen almost 3 years: .. only needs to refer to the inode of the parent. If the parent has hard links, those are just paths (names in other directories) that refer to the same inode. If .. were a path to the parent, then it would be like a symlink.
-
Jim Balter over 2 years: "Because when you are traversing, there is no way to detect you are looping (without keeping track of inode numbers as you traverse)." -- You must keep track of inode numbers because of symlinks, and commands like du and find do exactly that.
Jim Balter over 2 years: .. links aren't needed at all, and some filesystems don't create them. You can always figure out the parent from the path (prefix the cwd for relative paths).
-
Jim Balter over 2 years: This is just wrong. If hard links to directories were allowed, then of course the rmdir system call would not remove the inode if the link count indicated that there were other links, just as with the unlink system call. "without being incoherent with current behaviour" -- current behavior is that you can't hard link to directories. Changing that of course implies that rmdir changes accordingly. "IMHO a sufficient reason to restrict hardlinking on directories" -- I have opinions too, but that's not what the question asks for.
-
Jim Balter over 2 years: @KannanMohan "when we create hardlink to dir2 we have to make hardlink to all the contents in dir1" -- no ... where would these hard links to the contents of dir1 go? Hard linking to a directory gives you another path to its contents; there's no reason to hard link the contents as well.
-
Jim Balter over 2 years: @LtWorf "You would just unlink it, not do rm -rf" -- yes, except that you would use rmdir, not unlink, and the rmdir system call would check the link count and not delete the inode if the directory remained in use, so this answer poses a non-problem.
-
Jim Balter over 2 years: "Hard-linking directories used to be freely allowed in Bell Labs UNIX, at least V6 and V7" -- only for superusers, who could also unlink directories. This was extremely dangerous and was fatal when UNIX started supporting foreign filesystems. I'm pretty sure both of these were disallowed in PWB. "some of the elegance of UNIX was in the implementation" -- much of it was necessitated by the tiny amount of memory on PDP-11s. Branches in shell scripts were done via a seek -- "elegant" but bog slow.