How expensive is File.exists in Java

Solution 1

How this operation performs the first time it is called depends entirely on the filesystem. The work is done by the OS, and Java doesn't play any part in it.

In terms of performance, a read from disk is required the first time. This typically takes 8-12 ms on a spinning hard disk. @Sven points out that some storage could be slower, but this is relatively rare in cases where performance is important. You may have an additional delay if this is a network file system (usually relatively small, but it depends on your network latency).

Everything else the OS and Java does is very short by comparison.

However, if you check whether the file exists repeatedly, a disk access may not be required because the information can be cached. In that case, the time and resources used by the OS and Java start to matter. One of the largest of these costs is the objects File.exists() creates (you wouldn't think it would): it encodes the file's name on every call, creating a lot of objects. If you put File.exists() in a tight loop it can create 400 MB of garbage per second. :(
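If the same path really has to be checked in a hot loop, one option is to remember the answer for a short time instead of asking the filesystem on every iteration. The following is only a minimal sketch of that idea; the class name CachedExistence, its time-to-live parameter and the realChecks counter are made up for illustration and are not part of the JDK. Note that a cached answer can be stale if the file is created or deleted in the meantime.

```java
import java.io.File;

/** Hypothetical helper: remembers the result of File.exists() for a short time. */
public class CachedExistence {
    private final File file;
    private final long ttlNanos;   // how long a cached answer is trusted
    private boolean hasValue;      // false until the first real check
    private boolean cached;        // last answer returned by File.exists()
    private long lastCheck;        // System.nanoTime() at the last real check
    private int realChecks;        // how many times the filesystem was really asked

    public CachedExistence(File file, long ttlMillis) {
        this.file = file;
        this.ttlNanos = ttlMillis * 1_000_000L;
    }

    /** Returns the cached answer and asks the filesystem again only after the TTL expires. */
    public synchronized boolean exists() {
        long now = System.nanoTime();
        if (!hasValue || now - lastCheck >= ttlNanos) {
            cached = file.exists();
            lastCheck = now;
            hasValue = true;
            realChecks++;
        }
        return cached;
    }

    public synchronized int realChecks() {
        return realChecks;
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("exists-demo", ".txt");
        tmp.deleteOnExit();

        // Trust each answer for one second (an arbitrary value for this demo).
        CachedExistence check = new CachedExistence(tmp, 1000);
        boolean all = true;
        for (int i = 0; i < 1_000_000; i++) {
            all &= check.exists();
        }
        System.out.println("all true: " + all + ", real checks: " + check.realChecks());
    }
}
```

Whether this is worth it depends on how stale an answer you can tolerate; for a one-off check, plain File.exists() is simpler.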

Journaling filesystems work differently in that they keep track of all the changes you make to the file system; however, they don't change how you read the filesystem.

Solution 2

Measure the necessary time and see for yourself. As you say, it is absolutely file system dependent.

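        // requires: import java.io.File;
        File file = new File("/path/to/file");  // placeholder path, substitute your own
        long t1 = System.nanoTime();            // nanoTime, because one call often takes well under 1 ms
        boolean exists = file.exists();         // ...your File.exists call
        long t2 = System.nanoTime();
        System.out.println("exists: " + exists + ", time: " + (t2 - t1) / 1_000 + " µs");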

You will see that it will always give you different results, since it depends also on the way your OS caches data, on its load etc.
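Because a single measurement is so noisy, it can help to time one call on its own and then average over many calls after a warm-up. Here is a rough sketch of that approach; the class name ExistsTiming and the iteration counts are arbitrary choices for illustration. Keep in mind that the temporary file is freshly created, so even the "first" call here is most likely served from the OS cache rather than from disk.

```java
import java.io.File;
import java.io.IOException;

public class ExistsTiming {
    // Written on every iteration so the result of each call is actually used.
    static volatile boolean sink;

    /** Average time of one File.exists() call, in nanoseconds, over the given number of calls. */
    static double averageNanos(File file, int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink = file.exists();
        }
        return (System.nanoTime() - start) / (double) calls;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("exists-timing", ".tmp");
        file.deleteOnExit();

        // Time a single call on its own.
        long t0 = System.nanoTime();
        boolean first = file.exists();
        long firstNanos = System.nanoTime() - t0;

        // Warm up the JIT and the caches, then measure the average.
        averageNanos(file, 10_000);
        double avg = averageNanos(file, 100_000);

        System.out.printf("first call: %d ns (exists=%b)%n", firstNanos, first);
        System.out.printf("average over 100,000 warm calls: %.0f ns%n", avg);
    }
}
```

The numbers will differ from run to run and from machine to machine, which is exactly the point made above.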

Solution 3

Most of the file-related operations are not performed in Java; native code exists to perform these activities. In reality, most of the work done depends on the nature of the FileSystem object (that is backing the File object) and the underlying implementation of the native IO operations in the OS.

I'll present the case of the implementation in OpenJDK 6, for clarity. The File.exists() implementation defers the actual checks to the FileSystem class:

public boolean exists() {
    // ... calls to SecurityManager have been omitted for brevity ...
    return ((fs.getBooleanAttributes(this) & FileSystem.BA_EXISTS) != 0);
}
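To make the bit test in exists() concrete: getBooleanAttributes returns a set of bit flags in one int, and each query masks out one flag. The stand-alone sketch below imitates that pattern. The constant values are how I recall them from the OpenJDK sources, so treat them as illustrative rather than authoritative; the class BooleanAttributesSketch and its helper methods are invented for this example.

```java
public class BooleanAttributesSketch {
    // Bit flags modelled on the constants in java.io.FileSystem (values assumed, not verified here).
    static final int BA_EXISTS    = 0x01;
    static final int BA_REGULAR   = 0x02;
    static final int BA_DIRECTORY = 0x04;
    static final int BA_HIDDEN    = 0x08;

    // Each query is a single mask-and-compare on the attribute bits.
    static boolean exists(int attrs)      { return (attrs & BA_EXISTS) != 0; }
    static boolean isFile(int attrs)      { return (attrs & BA_REGULAR) != 0; }
    static boolean isDirectory(int attrs) { return (attrs & BA_DIRECTORY) != 0; }

    public static void main(String[] args) {
        // Pretend results of a native attribute lookup.
        int regularFile = BA_EXISTS | BA_REGULAR;
        int directory   = BA_EXISTS | BA_DIRECTORY;
        int missing     = 0;

        System.out.println(exists(regularFile) + " " + isFile(regularFile));   // true true
        System.out.println(exists(directory) + " " + isDirectory(directory)); // true true
        System.out.println(exists(missing) + " " + isFile(missing));          // false false
    }
}
```

The Java side is therefore trivial; the cost sits in whatever the native lookup has to do to produce those bits.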

The FileSystem class is abstract, and an implementation exists for all supported filesystems:

package java.io;


/**
 * Package-private abstract class for the local filesystem abstraction.
 */

abstract class FileSystem

Notice the package-private nature. A Java Runtime Environment will provide concrete classes that extend the FileSystem class. In the OpenJDK implementation, there are:

  • java.io.WinNTFileSystem, for NTFS
  • java.io.Win32FileSystem, for FAT32
  • java.io.UnixFileSystem, for *nix filesystems (this is a class with a very broad responsibility).

All of the above classes delegate to native code for the getBooleanAttributes method. This implies that performance is not constrained by the managed (Java) code in this case; the implementation of the file system and the nature of the native calls being made have a greater bearing on performance.

Update #2

Based on the updated question -

I'm not talking about network and tape systems. Lets keep it to ntfs, extX, zfs, jfs

Well, that still doesn't matter. Different operating systems will implement support for different file systems in different ways. For example, NTFS support in Windows will be different from the one in *nix, because the operating system will also have to do its share of bookkeeping, in addition to communicating with devices via their drivers; not all the work is done in the device.

In Windows, you will almost always find the concept of file system filter drivers, which manage the task of communicating with other file system filter drivers or with the file system. This is necessary to support various operations; one example would be the use of filter drivers by anti-virus engines and other software (on-the-fly encryption and compression products) that intercept IO calls.

In *nix, you will have the stat() system call, which performs the necessary activity of reading the inode information for the file.
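From Java, the closest thing to a single stat()-style lookup is probably the NIO call Files.readAttributes, which returns several attributes at once. My assumption (not something I have verified for every platform) is that on Unix-like systems the default provider obtains them through a stat-family call. A small sketch, with StatSketch and statOrNull being names made up for this example:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class StatSketch {
    /** Returns the basic attributes of the path, or null if nothing exists at that path. */
    static BasicFileAttributes statOrNull(Path path) throws IOException {
        try {
            return Files.readAttributes(path, BasicFileAttributes.class);
        } catch (NoSuchFileException e) {
            return null; // a missing file is reported as an exception, not as a flag
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("stat-sketch", ".txt");

        // One lookup answers "does it exist?" plus type, size and timestamp.
        BasicFileAttributes attrs = statOrNull(tmp);
        System.out.println("exists: " + (attrs != null));
        if (attrs != null) {
            System.out.println("regular file: " + attrs.isRegularFile());
            System.out.println("size: " + attrs.size() + " bytes");
            System.out.println("last modified: " + attrs.lastModifiedTime());
        }

        Files.delete(tmp);
        System.out.println("exists after delete: " + (statOrNull(tmp) != null));
    }
}
```

This is mainly useful when you need more than existence, since it avoids separate calls for exists(), isFile(), length() and so on.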

Solution 4

It's super fast on any modern machine: my tests show 0.028 millis (28 microseconds) per call on my 2013 Mac w/SSD

1,000 files created in 307 millis, 0.307 millis per file

1,000 .exists() done in 28 millis, 0.028 millis per file

Here's a test in Groovy (Java)

def index() {
    File fileWrite
    new File("/tmp/fileSpeedTest").mkdirs()   // make sure the test directory exists

    long start = System.currentTimeMillis()

    // Create 1,000 small files
    (1..1000).each {
        fileWrite = new File("/tmp/fileSpeedTest/${it}.txt")
        fileWrite.write('Some nice text')
    }
    long diff = System.currentTimeMillis() - start
    println "1,000 files created in $diff millis, ${diff/1000.0} millis per file"

    // Check that each of the 1,000 files exists
    start = System.currentTimeMillis()
    (1..1000).each {
        fileWrite = new File("/tmp/fileSpeedTest/${it}.txt")
        if ( ! fileWrite.exists() )
            throw new Exception("where's the file")
    }
    diff = System.currentTimeMillis() - start
    println "1,000 .exists()   done in  $diff millis, ${diff/1000.0} millis per file"
}
Author by

Franz Kafka

Updated on June 06, 2022

Comments

  • Franz Kafka almost 2 years

    I am wondering how File.exists() works. I'm not very aware of how filesystems work, so I should maybe start reading there first.

    But for some quick preliminary information:

    Is a call to File.exists() a single action for the filesystem, if that path and filename are registered in some journal? Or does the OS get the content of the directory and then scan through it for matches?

    I presume this will be filesystem dependent, but maybe all filesystems use the quick approach?

    I'm not talking about network and tape systems. Let's keep it to ntfs, extX, zfs, jfs :-)