How is a directory a "special type of file"?


Solution 1

Many entities in *nix style (and other) operating systems are considered files, or have a defining file-like aspect, even though they are not necessarily a sequence of bytes stored in a filesystem. Exactly how directories are implemented depends on the kind of filesystem, but generally what they contain, considered as a list, is a sequence of stored bytes, so in that sense they are not that special.

One way of defining what a "file" is in a *nix context is that it is something which has a file descriptor associated with it. As per the Wikipedia article, a file descriptor:

is an abstract indicator used to access a file or other input/output resource, such as a pipe or network connection...

In other words, file descriptors refer to various kinds of resources from/to which a sequence of bytes may be read/written, although the source/destination of that sequence is unspecified. Put another way, the "where" of the resource could be anything. What defines it is that it is a conduit of information. This is part of why it is sometimes said that in Unix "everything is a file". You should not take that completely literally, but it is worth serious consideration. In the case of a directory, this information pertains to what is in the directory and, on a lower, implementation level, how to find it within the filesystem.
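To make the "conduit" idea concrete, here is a minimal Python sketch (Python's os module is a thin wrapper over the same system calls). The helper names read_all and demo are made up for illustration. The point is only that the same read loop works whether the descriptor refers to a file on disk or to an anonymous pipe.

```python
import os
import tempfile


def read_all(fd, chunk_size=4096):
    """Read from any file descriptor until end-of-file."""
    chunks = []
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:  # an empty result means EOF
            break
        chunks.append(chunk)
    return b"".join(chunks)


def demo():
    message = b"same bytes, different sources\n"

    # Source 1: a regular file stored in a filesystem.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(message)
        path = tmp.name
    file_fd = os.open(path, os.O_RDONLY)
    try:
        from_file = read_all(file_fd)
    finally:
        os.close(file_fd)
        os.unlink(path)

    # Source 2: an anonymous pipe, which has no name in any filesystem.
    read_end, write_end = os.pipe()
    os.write(write_end, message)
    os.close(write_end)  # closing the write end lets the reader see EOF
    try:
        from_pipe = read_all(read_end)
    finally:
        os.close(read_end)

    return from_file, from_pipe
```

read_all never asks what kind of resource is behind the descriptor; that is the sense in which both are "files".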

Directories are somewhat special in this sense because in native C code they are not ostensibly associated with a file descriptor; the POSIX API uses a special type of stream handle, DIR*. However, this type does have an underlying descriptor, which can be retrieved with dirfd(). Descriptors are managed by the kernel and accessing them always involves system calls; hence, another aspect of a descriptor is that it is a conduit controlled by the OS kernel. Descriptors have unique (per-process) numbers starting with 0, which is usually the descriptor for the standard input stream.
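As a rough illustration that a directory really can sit behind an ordinary descriptor, the sketch below opens one with os.open(), asks the kernel about it with fstat(), and lists it through that same descriptor. It assumes a Unix-like system where os.listdir() accepts a descriptor; inspect_directory is a hypothetical helper name.

```python
import os
import stat


def inspect_directory(path="."):
    # O_DIRECTORY (where the platform defines it) makes open() fail
    # unless the path names a directory.
    flags = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0)
    fd = os.open(path, flags)
    try:
        info = os.fstat(fd)
        return {
            "fd": fd,  # a small non-negative integer, like any descriptor
            "is_directory": stat.S_ISDIR(info.st_mode),
            "inode": info.st_ino,
            "names": sorted(os.listdir(fd)),  # list via the descriptor
        }
    finally:
        os.close(fd)
```

Calling inspect_directory(".") returns the descriptor number the kernel handed out, which is typically 3 or higher because 0, 1 and 2 are usually taken by the standard streams.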

Solution 2

In the Unix Way of Doing Things: everything is a file.

A directory is one of many types of special file. It doesn't contain data. Instead, it contains pointers to all of the files that are contained within the directory.

Other types of special files:

  • links
  • sockets
  • devices

But because they are considered "files", you can ls them and rename them and move them and, depending on the type of special file, send data to/from them.
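A small sketch of that last point, under the assumption of a Unix-like system: the same rename call is applied to a regular file, a directory, a symbolic link and a FIFO. The FIFO is my own stand-in for sockets and devices, which need more setup or privileges to create. The function names are hypothetical.

```python
import os
import stat
import tempfile


def kind(path):
    mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISREG(mode):
        return "regular"
    return "other"


def rename_each_kind():
    results = {}
    with tempfile.TemporaryDirectory() as root:
        regular = os.path.join(root, "plain")
        with open(regular, "w") as handle:
            handle.write("data")
        os.mkdir(os.path.join(root, "dir"))
        os.symlink(regular, os.path.join(root, "link"))
        os.mkfifo(os.path.join(root, "pipe"))

        # One call, os.rename, regardless of what kind of file it is.
        for name in ("plain", "dir", "link", "pipe"):
            old = os.path.join(root, name)
            new = os.path.join(root, name + ".renamed")
            os.rename(old, new)
            results[name] = kind(new)
    return results
```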

Solution 3

My answer is mere reminiscence, but in 199x vintage Unixes, of which there were many, directories were files, just marked "directory" somewhere in the on-disk inode.

You could open a directory with something like open(".", O_RDONLY) and get back a usable file descriptor. You could parse the contents if you scrounged through /usr/include and found the correct C struct definition. I know that I did this for SunOS 4.1.x systems, SGI's EFS filesystem, and whatever DEC's Mips-CPU workstations had for a filesystem, probably BSD4.2 FFS.

That was a bad experience. Standardizing on a virtual filesystem layer is a good thing for portability, even if directories are no longer strict files. VFS layers let us experiment with filesystems where directories aren't files, like ReiserFS, or NFS.
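For comparison, here is a hedged sketch of what the open(".", O_RDONLY) trick looks like today. The open still succeeds, but on current Linux kernels a raw read() on the descriptor fails with EISDIR, so the portable route is a directory-listing API such as scandir. Both function names below are hypothetical.

```python
import errno
import os


def try_raw_read(path="."):
    """Open a directory like a plain file and attempt a raw read()."""
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            data = os.read(fd, 512)
            return {"opened": True, "error": None, "bytes_read": len(data)}
        except OSError as exc:
            # On Linux this is expected to be EISDIR.
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            return {"opened": True, "error": name, "bytes_read": 0}
    finally:
        os.close(fd)


def portable_listing(path="."):
    """List a directory without knowing its on-disk format."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)
```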

Solution 4

A directory is special in that it has the 'd' in its mode, telling the filesystem that it should interpret its contents as a list of other files contained within the directory, rather than a regular file that is just a sequence of bytes to be read by the application. That is all.
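The 'd' is easy to see programmatically. This sketch uses Python's stat.filemode(), which renders a mode the way ls -l does; mode_string and demo are names made up for the example.

```python
import os
import stat
import tempfile


def mode_string(path):
    """Return the ls-style mode string, e.g. 'drwxr-xr-x' or '-rw-r--r--'."""
    return stat.filemode(os.lstat(path).st_mode)


def demo():
    with tempfile.TemporaryDirectory() as root:
        file_path = os.path.join(root, "regular.txt")
        with open(file_path, "w") as handle:
            handle.write("just bytes")
        # The first character encodes the file type stored in the mode.
        return mode_string(root)[0], mode_string(file_path)[0]
```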

Solution 5

Directories are files because Linux systems employ a universal I/O model. In this model, everything in the system is a file and can be accessed with the same system calls and various commands.

They are of a special type because their i-nodes are marked with the file type and they have a special structure: a table of filenames and links to other i-nodes. These filename-link pairs, also known as "hard links", in a directory's i-node enumerate the files "inside" the directory.

Directories are just for organising files. When a file is "moved" from one directory to another (on the same filesystem), the file itself does not relocate on the disk. An entry is simply removed from one directory's i-node and written into another directory's i-node.
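One way to check this yourself is to compare i-node numbers before and after a move. The following is only a sketch: it assumes both directories live on the same filesystem (they share one temporary root here), and move_and_link is a hypothetical name.

```python
import os
import tempfile


def move_and_link():
    with tempfile.TemporaryDirectory() as root:
        src_dir = os.path.join(root, "a")
        dst_dir = os.path.join(root, "b")
        os.mkdir(src_dir)
        os.mkdir(dst_dir)

        original = os.path.join(src_dir, "note.txt")
        with open(original, "w") as handle:
            handle.write("hello")
        inode_before = os.stat(original).st_ino

        # "Move" the file: only directory entries change.
        moved = os.path.join(dst_dir, "note.txt")
        os.rename(original, moved)
        inode_after = os.stat(moved).st_ino

        # Add a second name (hard link) for the same i-node.
        alias = os.path.join(src_dir, "alias.txt")
        os.link(moved, alias)
        link_info = os.stat(alias)

        return {
            "same_inode_after_move": inode_before == inode_after,
            "alias_shares_inode": link_info.st_ino == inode_after,
            "link_count": link_info.st_nlink,
            "old_name_gone": not os.path.exists(original),
        }
```

The i-node number survives the rename, and after os.link() the link count reported by stat goes up to 2 because two directory entries now name the same i-node.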


Author: jds

Updated on September 18, 2022

Comments

  • jds
    jds over 1 year

    I am reading this Unix tutorial and came across this quote...

    We should note here that a directory is merely a special type of file.

    ...but no explanation or details are provided. How is a directory really just a file?

  • gbarry
    gbarry about 9 years
    And this makes life much easier, because you don't have to do something differently just because it's a directory. This applies to writing programs as well as operations from the command line (or GUI).
  • zwol
    zwol about 9 years
    POSIX.1-2008 added a bunch of system calls (openat, fstatat, etc) which use file descriptors referring to directories.
  • zwol
    zwol about 9 years
    Things are not so simple with all filesystems -- for instance, in Apple's HFS+ there's just one big B+tree containing all pathnames, if I remember correctly -- but this observation is spot on for Unix filesystems up to and including BSD's ffs, which is probably what the authors of the cited tutorial were thinking of.
  • Kevin
    Kevin about 9 years
    Even more interestingly, you can fsync() a read-only (!) directory fd, and it has a well-defined effect (specifically, it syncs file creation/renaming/deletion in the given directory to disk, a theoretically necessary step in the "write to a temporary file and rename it over the original" idiom).
  • jamesqf
    jamesqf about 9 years
    A directory does contain data: the data that describes the files contained in the directory. It's perfectly possible to access a directory (though perhaps not with a standard open call) and read that data yourself, though (as Bruce Ediger notes in his answer) the data's not much use unless you know the format.
  • peterh
    peterh about 6 years
    No, only 1 inode can point to the same file, although the same inode can exist simultaneously in multiple directories (or under multiple names). An easy check: ls -l >test.txt;ln -vf test.txt test2.txt;ls -li test.txt test2.txt. You will see that hard links have the same inode number.
  • peterh
    peterh about 6 years
    @Gilles I think it would be very logical if a directory copied by dd would be essentially an equivalent of cp --link dir1/* dir2, although I am not sure about its usability.
  • alamin
    alamin about 6 years
    @peterh File descriptors are only unique to a process. can you explain?
  • peterh
    peterh about 6 years
    @Md.AlaminMahamud That is not true: if a process fork()s, its child process will have (except in some special circumstances, namely an O_CLOEXEC flag) exactly the same file descriptor entities as the original process had. Another example: Apache child processes are listen()ing on the same socket file descriptor. But this answer is not about file descriptors, which are a kernel-internal data structure and exist only in kernel memory. This (false) answer is about directory entries and inodes, which are on-disk entities (i.e. they are physical bytes on the hard drive).
  • peterh
    peterh about 6 years
    @Md.AlaminMahamud Well, now I am not so sure. For example, if a fork() happens and then the child process seek()s or close()s, it won't affect the file descriptor of the parent. So I now think that file descriptors are only partially process-private structures. But this question is not about them; it is about dirents/inodes, and I am chatting with you in the comments on an entirely false answer to this question.
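Following up on Kevin's comment about fsync() on a directory descriptor, here is a hedged sketch of the "write to a temporary file and rename it over the original" idiom. atomic_write is a made-up helper, not a standard function. Some filesystems may refuse to fsync a directory, so that step is allowed to fail quietly here; treat that as my assumption, not a rule.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace path with data; return True if the directory sync succeeded."""
    directory = os.path.dirname(os.path.abspath(path))

    # 1. Write the new contents to a temporary file in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the file's bytes to disk
        # 2. Rename over the original; only directory entries change.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

    # 3. Sync the directory itself so the rename is recorded on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
        return True
    except OSError:
        return False  # assumption: tolerate filesystems that reject this
    finally:
        os.close(dir_fd)
```

Note that the directory is opened read-only and still fsync()ed, which is exactly the point of the comment: the descriptor is a handle on the directory's entries, not on bytes you write yourself.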