Are files saved on disk sequentially?


Solution 1

Can a file be saved non-sequentially on disk? I mean, part of the file is located at physical address X and the other part at physical address Y, which isn't close to X + offset?

Yes; this is known as file fragmentation and is not uncommon, especially with larger files. Most file systems allocate space as it's needed, more or less sequentially, but they can't guess future behaviour — so if you write 200MiB to a file, then add a further 100MiB, there's a non-zero chance that both sets of data will be stored in different areas of the disk (basically, any other write needing more space on disk, occurring after the first write and before the second, could come in between the two). If a filesystem is close to full, the situation will usually be worse: there may not be a contiguous area of free space large enough to hold a new file, so it will have to be fragmented.

Can I somehow control the file sequentiality? I want to allocate a big file of 10 GB. I want it to be sequential on disk and not divided between different offsets.

You can tell the filesystem about your file's target size when it's created; this will help the filesystem store it optimally. Many modern filesystems use a technique known as delayed allocation, where the on-disk layout of a new file is calculated as late as possible, to maximise the information available when the calculation is performed. You can help this process by using the posix_fallocate(3) function to tell the filesystem how much disk space should be allocated in total. Modern filesystems will try to perform this allocation sequentially.
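As a minimal sketch of that preallocation, using Python's `os.posix_fallocate` (a thin wrapper over posix_fallocate(3)); the file name is made up, and 64 MiB stands in for the question's 10 GB so the example runs quickly:

```python
import os

# Illustrative values: a throwaway name, and 64 MiB instead of the
# question's 10 GB so the example runs quickly.
path = "bigfile.bin"
size = 64 * 1024 * 1024

fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
try:
    # Reserve the whole target size up front, before any data is written.
    # The filesystem can then try to pick one large contiguous extent --
    # but this is a request, not a guarantee.
    os.posix_fallocate(fd, 0, size)
    st = os.fstat(fd)
    print("logical size :", st.st_size)           # 67108864
    print("bytes on disk:", st.st_blocks * 512)   # at least 67108864
finally:
    os.close(fd)
    os.unlink(path)
```

After the real data is written, `filefrag -v` will show whether the allocation actually came out contiguous.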

Does it act differently between the different types?

Different filesystems behave differently, yes. Log-based filesystems such as NILFS2 don't allocate storage in the same way as extent-based filesystems such as Ext4, and that's just one example of variation.

Solution 2

The command filefrag will tell you how your file is physically stored on your device:

# filefrag -v /var/log/messages.1 
Filesystem type is: ef53
File size of /var/log/messages.1 is 41733 (11 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0  2130567               1 
   1       1 15907576  2130568      1 
   2       2 15910400 15907577      1 
   3       3 15902720 15910401      7 
   4      10  2838546 15902727      1 eof
/var/log/messages.1: 5 extents found

If you write your file in one pass, my guess is that your file won't be fragmented.

The man page of fallocate(1) is pretty clear:

fallocate is used to preallocate blocks to a file. For filesystems which support the fallocate system call, this is done quickly by allocating blocks and marking them as uninitialized, requiring no IO to the data blocks. This is much faster than creating a file by filling it with zeros.

As of the Linux Kernel v2.6.31, the fallocate system call is supported by the btrfs, ext4, ocfs2, and xfs filesystems.

Is it sequential? The system will first try to allocate the blocks sequentially. If it can't, it will not warn you.
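Since the system won't warn you, you can check for yourself. Below is a sketch that queries the extent count directly with the FIEMAP ioctl (the same interface filefrag uses); the struct layout and ioctl number are Linux-specific, and `extent_count` is a name invented for this example:

```python
import array
import fcntl
import os
import struct

FS_IOC_FIEMAP = 0xC020660B            # _IOWR('f', 11, struct fiemap)
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF

def extent_count(path):
    """Return the number of extents backing a file (Linux FIEMAP ioctl)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # struct fiemap header: u64 fm_start, u64 fm_length, u32 fm_flags,
        # u32 fm_mapped_extents, u32 fm_extent_count, u32 fm_reserved.
        # With fm_extent_count == 0 the kernel only fills in the count.
        buf = array.array("B", struct.pack("=QQIIII",
                                           0, FIEMAP_MAX_OFFSET, 0, 0, 0, 0))
        fcntl.ioctl(fd, FS_IOC_FIEMAP, buf, True)
        return struct.unpack("=QQIIII", buf.tobytes())[3]
    finally:
        os.close(fd)
```

A file reported with a single extent is fully contiguous; filesystems that don't support FIEMAP raise `OSError` instead.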

Solution 3

You mention sparse files, and none of the other answers have mentioned them.

Most files are not sparse. The most common way to create a file is to write it all in one go, from the start to the end. No holes there.

However, you are allowed to say "move to position 1,000,000,000,000 and write a byte there." This will create a file that looks like it is a terabyte big, but actually only uses (probably) 4 KiB on disk. This is a sparse file.

You can do this many times for the same file, leaving small amounts of data scattered across the vast emptiness.
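As a sketch of that seek-and-write trick (scaled down to a 1 GiB offset; the file name is made up):

```python
import os

path = "sparse.bin"
fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
try:
    # Seek far past end-of-file and write a single byte; the skipped
    # range becomes a hole that occupies no disk blocks.
    os.lseek(fd, 1 << 30, os.SEEK_SET)           # jump to the 1 GiB mark
    os.write(fd, b"x")
    st = os.fstat(fd)
    print("logical size :", st.st_size)          # 1073741825
    print("bytes on disk:", st.st_blocks * 512)  # typically one block, e.g. 4096
finally:
    os.close(fd)
    os.unlink(path)
```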

While this can be useful, there are two downsides.

The first is that the file will be fragmented, which is what you worried about.

The second is that not all programs handle these files well. For example, some backup software will try to back up the emptiness and thereby create a backup that is much larger than necessary, possibly too big for the backup medium.
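Hole-aware tools avoid that problem by walking only the data regions. A sketch using lseek(2)'s Linux-specific SEEK_DATA/SEEK_HOLE (the function name is invented for this example; filesystems without hole support simply report the whole file as one data region):

```python
import os

def data_ranges(path):
    """Yield (start, end) byte ranges that actually contain data,
    skipping over holes, via lseek(2)'s SEEK_DATA/SEEK_HOLE."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError:          # ENXIO: nothing but holes remain
                return
            end = os.lseek(fd, start, os.SEEK_HOLE)
            yield (start, end)
            pos = end
    finally:
        os.close(fd)
```

A backup tool iterating over these ranges copies only the real data and can recreate the holes on restore.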

Solution 4

Can I somehow control the file sequentiality? I want to allocate a file of 10GB. I want it to be sequential on disk and not divided between different offsets.

There are at least a couple of ways to achieve this.

  1. Use a filesystem with a lot of spare space and preallocate the space (e.g. use an application-specific end-of-data marker and append random data until the file size reaches 10 GB). This isn't guaranteed to result in unfragmented data.

  2. Use a raw (uncooked) filesystem instead of ext4 etc. DBMSs sometimes do this for performance reasons. The tradeoff is you have to do your own caching/journalling/recovery etc if needed.

Instances where you gain much from doing this are relatively rare; I would first look elsewhere to optimise performance.


See also

Is it true that database management systems typically bypass file systems?

Author: hudac

Updated on September 18, 2022

Comments

  • hudac
    hudac almost 2 years

    As I understood, "sparse file" means that the file may have 'gaps' so the actual used data may be smaller than the logical file size.

    How do Linux file systems save files on disk? I'm mainly interested in ext4. But:

    1. Can a file be saved non-sequentially on disk? By that, I mean that part of the file is located at physical address X and the next part at physical address Y, which isn't close to X + offset.
    2. Can I somehow control the file sequentiality?
      I want to allocate a file of 10GB. I want it to be sequential on disk and not divided between different offsets.
    3. Does it act differently between the different types?
    • Admin
      Admin over 7 years
      Perhaps, if I understand your intention correctly, you would be more interested in lower-level API, where you work with storage devices w/o having to go through the file-system layer. Your entry-point then could be the dmsetup program, an interface to device mapper. This may be a good choice if you are planning a database-like storage.
    • Admin
      Admin over 7 years
      ext4 extents can be contiguous. I would also expect anything written raw to be contiguous, allocating a 10G partition and just writing to that. Essentially you need a 10GB hole. This was a common issue on z/OS when hundreds of contiguous cylinders were required.
    • Admin
      Admin over 7 years
      What do you expect to happen if there isn't 10G of contiguous space? Also, have you considered making a separate partition for this? And what if the ext4 partition is part of a RAID stripe set?
    • Admin
      Admin over 7 years
      The main reason for my question is performance: I want the file to be contiguous so there won't be an impact, in contrast to a fragmented file, where the drive head would have to jump from one fragment to another
    • Admin
      Admin over 7 years
      @hudac one thing to keep in mind is that contiguous is not all that useful in practice. The easy one is flash where fragmentation is not a big deal, but on a spinning platter you still might not benefit from contiguous data. On a spinning platter you need to think about your access patterns and where the data is. If you need the sector that just passed under the head you have to wait for it to come fully around again. To get the best results you want to stagger the data so that it is "close" when it needs to be read. Increasing cache size is easier ;-)
    • Admin
      Admin over 7 years
      Note that on SSDs and flash disks fragmentation doesn't much matter as the seek time is zero. On physical spinning disks, seek time is measured in ms - maybe 10-20ms. At 10ms, a drive can do at most 100 seeks per second (avg) - so this will severely limit performance when files are badly fragmented as the drive will be spending all of its time seeking the head back and forth. (In addition to seek time, it has to wait for the block you want to read to pass under the read head - affected by the rate of spin - RPMs).
  • hudac
    hudac over 7 years
    Will using fallocate(3) ensure file sequentiality? Or will it just hint the filesystem? I can't fully understand it from the man pages.
  • Stephen Kitt
    Stephen Kitt over 7 years
    It can't ensure sequential allocation, it's just a hint. But you should definitely use it if you're writing 10GiB files!
  • zwol
    zwol over 7 years
    Essentially all file systems more sophisticated than FAT -- this goes all the way back to the original Berkeley UFS -- will intentionally break up large files and spread them over multiple "allocation groups"; this helps them minimize the overall fragmentation of the disk. There may be a way to adjust how this works, but there's good odds you have to recreate the filesystem from scratch in order to do it, and there probably isn't a way to turn it completely off.
  • Muzer
    Muzer over 7 years
    @hudac It's impossible to guarantee sequentiality in all cases (see the case with a drive that is close to being full), and to be honest with the rise of SSDs it matters less than it used to (for those who can afford them at least).
  • hudac
    hudac over 7 years
    What is type 'ef53'? I saw it also on my files. But my FS type is ext4.
  • Vouze
    Vouze over 7 years
    EF53 is the "SUPER_MAGIC" number of ext2, ext3 and ext4. Look in "include/uapi/linux/magic.h" in the kernel sources for all magic numbers of every file-system.
  • jamesqf
    jamesqf over 7 years
    Also note that there are situations, like RAID systems, where having contiguous files is less efficient, if it's even possible. I think that's really the purpose of a disk/storage subsystem controller: to offload all the work of storing files as optimally as can reasonably be expected.
  • Toby Speight
    Toby Speight over 7 years
    On Debian, filefrag is hidden in /usr/sbin. But it seems to work for ordinary users (on ext4, at least). It may be instructive to strace its operation to see how to measure fragmentation for yourself, if the lack of warning is a hindrance to you.
  • MSalters
    MSalters over 7 years
    @zwol: It generally doesn't require recreating the whole file system when you tweak file placement parameters such as the largest allowed extent. And no, intentionally fragmenting files doesn't help overall fragmentation at all. After you've written a 64 MB fragment, the best place for the next 64 MB fragment is directly behind the first fragment. The problem is the unintentional fragmentation when you write to two files; where should you place the second file? Either you fragment free space, or you end up fragmenting the two files, but something will fragment.
  • Toby Speight
    Toby Speight over 7 years
    "Run the defragmenter"? Is there such a program? The only thing found when I searched with aptitude search ~ddefrag were ddrescueview and the nids TCP segment reassembly library. Your answer isn't very helpful if you don't say what the program is called, or what arguments need to be passed.
  • jpaugh
    jpaugh over 7 years
    @MSalters Makes sense. IIRC from my CompSci days, "worst fit" (that is, taking the biggest chunk of free space available at the time) produces the least fragmentation, on average. "Best fit" (taking the smallest area that is large enough) of course performs much worse when you have files growing.
  • jpaugh
    jpaugh over 7 years
    @zwol I bet that is actually to increase performance when reading from different areas of the large files, rather than to mitigate fragmentation.
  • MSalters
    MSalters over 7 years
    @jpaugh: "Worst fit" and "best fit" are typical CompSci algorithms assuming you know how well the item fits. But an OS often can't predict how big a file will become, when a program starts writing. Still, in that case "worst fit" = "largest hole" has the best chance of fitting the entire file.
  • Stephen Kitt
    Stephen Kitt over 7 years
    @MSalters and that's the whole point of fallocate: to tell the OS how big the file is, before it allocates it. Of course there are many situations where even the writing program doesn't know how big a file is going to be (e.g. any document you update, log files...), but there are many cases where it does (VM images, downloaded files, chunked video...) and where reducing fragmentation is useful.
  • Barmar
    Barmar over 7 years
    But even a non-sparse file will often not be contiguous on disk.
  • ravery
    ravery about 6 years
    @TobySpeight - yes there is a defragmenter; e4defrag.