Why do hard links seem to take the same space as the originals?

Solution 1

A file is an inode: a record of metadata that includes, among other things, a list of pointers to where the data is stored.

To be able to access a file, you have to link it to a directory (think of directories as phone directories, not folders). That is, you add one or more entries to one or more directories to associate a name with that file.

All those links, those file names, point to the same file. There isn't one that is the original while the others are links. They are all access points to the same file (same inode) in the directory tree. When you get the size of the file (lstat system call), you're retrieving information (the metadata referred to above) stored in the inode, so it doesn't matter which file name, which link, you use to refer to that file.

By contrast, a symlink is another file (another inode) whose content is a path to the target file. Like any other file, a symlink has to be linked to a directory (must have a name) so you can access it. You can also have several links to a symlink, or in other words, a symlink can be given several names (in one or more directories).

$ touch a
$ ln a b
$ ln -s a c
$ ln c d
$ ls -li [a-d]
10486707 -rw-r--r-- 2 stephane stephane 0 Aug 27 17:05 a
10486707 -rw-r--r-- 2 stephane stephane 0 Aug 27 17:05 b
10502404 lrwxrwxrwx 2 stephane stephane 1 Aug 27 17:05 c -> a
10502404 lrwxrwxrwx 2 stephane stephane 1 Aug 27 17:05 d -> a

Above, file number 10486707 is a regular file. Two entries in the current directory (one named a, one named b) link to it. Because the link count is 2, we know that file has no other name in the current directory or any other directory. File number 10502404 is another file, this time of type symlink, linked twice to the current directory. Its content (target) is the relative path "a".

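Note that if a symlink with a relative target, like 10502404, were also linked to a directory other than the current one, it would typically point to a different file depending on which name was used to access it, because the relative target is resolved from the directory containing that name.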

$ mkdir 1 2
$ echo foo > 1/a
$ echo bar > 2/a
$ ln -s a 1/b
$ ln 1/b 2/b
$ ls -lia 1 2
1:
total 92
10608644 drwxr-xr-x   2 stephane stephane  4096 Aug 27 17:26 ./
10485761 drwxrwxr-x 443 stephane stephane 81920 Aug 27 17:26 ../
10504186 -rw-r--r--   1 stephane stephane     4 Aug 27 17:24 a
10539259 lrwxrwxrwx   2 stephane stephane     1 Aug 27 17:26 b -> a

2:
total 92
10608674 drwxr-xr-x   2 stephane stephane  4096 Aug 27 17:26 ./
10485761 drwxrwxr-x 443 stephane stephane 81920 Aug 27 17:26 ../
10539044 -rw-r--r--   1 stephane stephane     4 Aug 27 17:24 a
10539259 lrwxrwxrwx   2 stephane stephane     1 Aug 27 17:26 b -> a
$ cat 1/b
foo
$ cat 2/b
bar

Files have no names associated with them other than in the directories that link them. The space taken by their names is in the entries of those directories, and it's accounted for in the file size/disk usage of the directories.
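
To see where the names are accounted for, here is a minimal sketch (the directory and file names are made up, and it assumes a filesystem that allows at least 1001 links to one file). It gives one small file a thousand long names, then compares the file's size and the directory's size before and after. How much the directory grows, and in what steps, depends on the filesystem, so the script only prints those numbers.

```shell
#!/bin/sh
# Illustrative sketch: names live in the directory, not in the file.
# All names below are hypothetical.
dir=$(mktemp -d) && cd "$dir" || exit 1
mkdir d
echo data > d/file

dir_before=$(ls -ld d | awk '{print $5}')     # size of the directory itself
file_before=$(( $(wc -c < d/file) ))          # size of the file's data

# Add 1000 more names (hard links) for the very same file.
i=0
while [ "$i" -lt 1000 ]; do
    ln d/file "d/a-rather-long-name-for-the-very-same-file-$i"
    i=$((i + 1))
done

dir_after=$(ls -ld d | awk '{print $5}')
file_after=$(( $(wc -c < d/file) ))
links=$(ls -ld d/file | awk '{print $2}')     # link count stored in the inode

echo "file size: $file_before -> $file_after bytes, link count: $links"
echo "directory size: $dir_before -> $dir_after bytes (filesystem dependent)"

cd / && rm -rf "$dir"
```

The file's size stays the same however many names it gets, while the link count in the inode goes up; any extra space shows up in the directory.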

You'll notice that the system call to remove a file is unlink. That is, you don't remove files, you unlink them from the directories they're referenced in. Once a file is unlinked from the last directory that had an entry for it, the file is destroyed (as long as no process has it open; otherwise it lives on until the last such process closes it).
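
The unlink behaviour can be tried out with a short sketch like the following (file names are made up for the illustration). It removes one of two names and checks that the data is still there, then removes the last name while a file descriptor is still open on the file.

```shell
#!/bin/sh
# Illustrative sketch: rm only unlinks a name; the data goes away once the
# last link is gone and no process has the file open.
dir=$(mktemp -d) && cd "$dir" || exit 1

echo hello > a
ln a b                                       # second name for the same inode
links_before=$(ls -ld b | awk '{print $2}')

rm a                                         # unlink the name "a"
links_after=$(ls -ld b | awk '{print $2}')
via_b=$(cat b)                               # data still reachable through "b"

exec 3< b                                    # keep the file open on descriptor 3
rm b                                         # last name removed: no links left
via_fd=$(cat <&3)                            # the open descriptor still reads it
exec 3<&-                                    # after this close the file can be freed

echo "links: $links_before -> $links_after, via b: $via_b, via fd: $via_fd"
cd / && rmdir "$dir"
```

The link count drops from 2 to 1 after the first rm, and the content stays readable through descriptor 3 even after the second rm has removed the last name.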

Solution 2

The hard link is, essentially, the original file. So, the size you see reported is the size of the file being linked to. It is soft links that only take up the space of the path they store (kinda).

As far as the filesystem is concerned, the hard link and the original are the same thing: they point to the same inode, so the same size is reported.
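
Here is a rough sketch of how to check this yourself (all names are made up, and the symlink size shown assumes a filesystem that reports the length of the stored path, as most do). It also uses du, which the answers above don't mention: within a single run, du counts a file with several links only once, so the second name does not add to the total.

```shell
#!/bin/sh
# Illustrative sketch: same inode and size through both names, data stored once.
# All names below are hypothetical.
dir=$(mktemp -d) && cd "$dir" || exit 1
mkdir one two

# 1 MiB of random data under one name, then a second name and a symlink.
dd if=/dev/urandom of=one/orig bs=1024 count=1024 2>/dev/null
ln one/orig two/hard          # hard link: another name for the same inode
ln -s orig one/soft           # symlink: a separate inode holding the path "orig"
sync                          # flush so du sees the allocated blocks

inode_orig=$(ls -i one/orig | awk '{print $1}')
inode_hard=$(ls -i two/hard | awk '{print $1}')
size_orig=$(( $(wc -c < one/orig) ))
size_hard=$(( $(wc -c < two/hard) ))
size_soft=$(ls -l one/soft | awk '{print $5}')   # size of the symlink itself

# du on "two" alone includes the data; after "one" in the same run it does not,
# because the shared inode has already been counted.
two_alone=$(du -sk two | awk '{print $1}')
two_after_one=$(du -sk one two | awk 'NR == 2 {print $1}')

echo "inodes: $inode_orig $inode_hard, sizes: $size_orig $size_hard, symlink: $size_soft"
echo "du two alone: ${two_alone}K, du two after one: ${two_after_one}K"

cd / && rm -rf "$dir"
```

Both names report the same inode number and the same 1048576-byte size, while the symlink's own size is just the length of the path it stores.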

NDBoost
Author by

NDBoost

Updated on September 18, 2022

Comments

  • NDBoost
    NDBoost over 1 year

    I have all the prereq's installed.. I am on osx lion 10.7.2

    xcode:

    $ xcodebuild -version
    Xcode 4.2.1
    

    git:

    $ git --version
    git version 1.7.5.4
    

    when i run $ bash < <( curl -s https://rvm.beginrescueend.com/install/rvm )

    i get the following error:

    curl: (6) Could not resolve host: HD; nodename nor servname provided, or not known Could not download 'https://github.com/wayneeseguin/rvm/tarball/master'.

    any ideas why? If i run as sudo it goes through but then i get more errors... Need this to install as single user. path to my home dir is:

    '/volumes/Macintosh HD/users/mikedevita'
    
    • NDBoost
      NDBoost over 12 years
      i have a feeling that this is because my hdd name is Macintosh HD...
    • Paul Simpson
      Paul Simpson over 12 years
      I think you're right. If you check out the script, it's failing because curl is choking on your $rvm_path, which includes your home directory. It can't handle the space in the path to your home directory (and consequently in $rvm_path).
    • NDBoost
      NDBoost over 12 years
      yup, got rvm installed.. but the GUI jewelrybox is force closing every time it opens..
    • NDBoost
      NDBoost over 12 years
      i down graded to 1.1.2 and JewelryBox is loading now.
  • Ievgen Chuchukalo
    Ievgen Chuchukalo over 10 years
    But the hard link's name must take space, correct?
  • terdon
    terdon over 10 years
    See @stephan's answer below, he explains it better.
  • Ievgen Chuchukalo
    Ievgen Chuchukalo over 10 years
    Ahh... Now I see. So a file called "hi" and its exact copy called "ajhĝjdmjefsjmksgskgjkmŝŭna" take exactly the same amount of space, because their names don't count for that lstat system call that gets their size.
  • Stéphane Chazelas
    Stéphane Chazelas over 10 years
    @JMCF125, yes the size taken by their names is the entry in the corresponding directories, it's accounted in the file size of the directories.
  • Ievgen Chuchukalo
    Ievgen Chuchukalo over 10 years
    Thanks. Can you include that in your answer? Wait, I'll clarify my question first.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' over 10 years
    @JMCF125 Yes, but that space is inside the directory. If you create enough files, you'll notice that the directory size increases. The size of a file doesn't include its metadata such as its name.
  • Ievgen Chuchukalo
    Ievgen Chuchukalo over 10 years
    @Gilles, thanks, but @Stephane has already updated his answer with that information. Also, now that I think about it, the name of / must be stored in itself, since if you do cd .. in / you stay in /.