How to convert a Hadoop Path object into a Java File object

Solution 1

Not that I'm aware of.

To my understanding, a Path in Hadoop represents an identifier for a node in their distributed filesystem. This is a different abstraction from a java.io.File, which represents a node on the local filesystem. It's unlikely that a Path could even have a File representation that would behave equivalently, because the underlying models are fundamentally different.

Hence the lack of a translation. I presume by your assertion that File objects are "[more] useful", you want an object of this class in order to use existing library methods. For the reasons above, this isn't going to work well. If it's your own library, you could rewrite it to work cleanly with Hadoop Paths and then convert any Files into Path objects (that direction works, since Paths are a strict superset of Files). If it's a third-party library, you're out of luck: the authors of that method didn't take the effects of a distributed filesystem into account and wrote it to work only on plain old local files.
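The File-to-Path direction mentioned above can be sketched as follows (a minimal example assuming hadoop-common is on the classpath; the file path is hypothetical):

```java
import java.io.File;
import org.apache.hadoop.fs.Path;

public class FileToPathExample {
    public static Path toHadoopPath(File file) {
        // A local File always has a valid "file:" URI, so converting
        // a File into a Hadoop Path is lossless.
        return new Path(file.toURI());
    }

    public static void main(String[] args) {
        File f = new File("/tmp/example.txt"); // hypothetical local file
        Path p = toHadoopPath(f);
        System.out.println(p.toUri().getScheme()); // "file"
    }
}
```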

Solution 2

I recently had this same question, and there really is a way to get a file from a path, but it requires downloading the file temporarily. Obviously, this won't be suitable for many tasks, but if time and space aren't essential for you, and you just need something to work using files from Hadoop, do something like the following:

import java.io.File;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class PathToFileConverter {
    /**
     * Downloads the file at some_path into a local temporary file
     * and returns that temporary file.
     */
    public static File makeFileFromPath(Path some_path, Configuration conf) throws IOException {
        // Resolve whichever FileSystem (HDFS, local, ...) owns this path.
        FileSystem fs = FileSystem.get(some_path.toUri(), conf);
        // Create a local temp file; it is removed when the JVM exits.
        File temp_data_file = File.createTempFile(some_path.getName(), "");
        temp_data_file.deleteOnExit();
        // Copy the remote contents into the local temp file.
        fs.copyToLocalFile(some_path, new Path(temp_data_file.getAbsolutePath()));
        return temp_data_file;
    }
}
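A self-contained usage sketch of the same copy-to-local technique, run here against a local staged file rather than a real cluster (against HDFS the source would be something like a `hdfs://namenode:8020/...` path; the file names are hypothetical):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToLocalDemo {
    public static void main(String[] args) throws IOException {
        // Stage a 5-byte local source file standing in for a remote one.
        java.nio.file.Path src = Files.createTempFile("demo", ".txt");
        Files.write(src, "hello".getBytes());

        Configuration conf = new Configuration();
        Path hadoopPath = new Path(src.toUri());
        // FileSystem.get resolves the owning filesystem from the URI scheme.
        FileSystem fs = FileSystem.get(hadoopPath.toUri(), conf);

        File temp = File.createTempFile(hadoopPath.getName(), "");
        temp.deleteOnExit();
        // Copy the (possibly remote) contents down to the local temp file.
        fs.copyToLocalFile(hadoopPath, new Path(temp.getAbsolutePath()));

        System.out.println(temp.length() == 5);
    }
}
```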

Solution 3

If you get a LocalFileSystem

final LocalFileSystem localFileSystem = FileSystem.getLocal(configuration);

you can pass your Hadoop Path object to localFileSystem.pathToFile

final File localFile = localFileSystem.pathToFile(<your hadoop Path>);
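Put together, that looks like the following (a hedged sketch assuming hadoop-common on the classpath; the path is hypothetical, and pathToFile only makes sense for paths that actually live on the local filesystem):

```java
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

public class PathToFileDemo {
    public static void main(String[] args) throws IOException {
        Configuration configuration = new Configuration();
        LocalFileSystem localFileSystem = FileSystem.getLocal(configuration);

        // Hypothetical local path; no copy is made, the Path is simply
        // reinterpreted as a java.io.File on the local disk.
        Path hadoopPath = new Path("/tmp/example.txt");
        File localFile = localFileSystem.pathToFile(hadoopPath);

        System.out.println(localFile.getPath());
    }
}
```

Note that unlike Solution 2 this does no I/O at all; it is only a view of an already-local path.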
Author by akintayo
Updated on July 19, 2022

Comments

  • akintayo
    akintayo almost 2 years

Is there a way to change a valid and existing Hadoop Path object into a useful Java File object? Is there a nice way of doing this, or do I need to bludgeon the code into submission? The more obvious approaches don't work, and it seems like it would be a common bit of code.

    void func(Path p) {
      if (p.isAbsolute()) {
         File f = new File(p.toURI());
      }
    }
    

    This doesn't work because Path::toURI() returns the "hdfs" identifier and Java's File(URI uri) constructor only recognizes the "file" identifier.

    Is there a way to get Path and File to work together?

    **

    Ok, how about a specific limited example.

    Path[] paths = DistributedCache.getLocalCacheFiles(job);
    

    DistributedCache is supposed to provide a localized copy of a file, but it returns a Path. I assume that DistributedCache makes a local copy of the file, so that the Path and the file are on the same disk. Given this limited example, where hdfs is hopefully not in the equation, is there a way for me to reliably convert a Path into a File?

    **

  • mariop
    mariop about 9 years
    This answer is wrong: a Hadoop Path is not an identifier for a node in the Hadoop filesystem but a file or directory in any filesystem. Hadoop's FileSystem is generic, meaning that it can support different filesystems, not only HDFS. That's clearly written in the documentation. So the reason there is no conversion from a Hadoop Path to a Java File is not that they represent two different abstractions.
  • m-bhole
    m-bhole almost 6 years
    java.lang.IllegalArgumentException: Wrong FS: hdfs://ip-XXX-XX-X-XXX:8020/HDFS_PATH, expected: file:///