How to read a CSV file from HDFS?

Solution 1

The classes required for this are FileSystem, FSDataInputStream and Path. A client could look something like this:

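import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
    conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream inputStream = fs.open(new Path("/path/to/input/file"));
    // readChar() would decode two bytes as one UTF-16 char; wrap the stream in a reader for text.
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"))) {
        System.out.println(reader.readLine());   // first line of the file
    }
}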

FSDataInputStream has several read methods. Choose the one which suits your needs.
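Since FSDataInputStream extends java.io.DataInputStream, it can be handed to any code that accepts a plain InputStream. Here is a minimal sketch of reading CSV rows that way. The class name CsvStreamReader and the method readRows are made up for illustration, and the parsing assumes simple comma-separated values with no quoted fields. The main method uses an in-memory stream as a stand-in for fs.open(...) so it runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: reads CSV rows from any InputStream.
final class CsvStreamReader {

    // Splits each non-empty line on commas.
    // Assumption: no quoted fields and no commas inside values.
    static List<String[]> readRows(InputStream in) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // A limit of -1 keeps trailing empty fields, e.g. "a,b," gives 3 fields.
                rows.add(line.split(",", -1));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for fs.open(new Path("/path/to/input/file")).
        InputStream in = new ByteArrayInputStream(
                "id,name\n1,alice\n2,bob\n".getBytes(StandardCharsets.UTF_8));
        for (String[] row : readRows(in)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

With the client above, the call would be readRows(inputStream). If the file has a header row, it comes back as the first element of the list.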

If this is a MapReduce (MR) job, it is even easier:

public static class YourMapper extends
        Mapper<LongWritable, Text, Your_Wish, Your_Wish> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // The framework does the reading for you.
        String line = value.toString();   // line contains one line of your CSV file
        // do your processing here
        ....................
        ....................
        context.write(Your_Wish, Your_Wish);
    }
}
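The CSV-specific part of a mapper is splitting each line into fields. A plain split(",") breaks as soon as a value contains a comma inside quotes, so below is a rough sketch of a quote-aware splitter that could be called on value.toString() inside map(). CsvLineParser and parse are hypothetical names, not Hadoop classes, and the sketch assumes a record never spans more than one line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Hadoop class: splits one CSV line into fields.
final class CsvLineParser {

    // Handles double-quoted fields and treats "" inside quotes as an escaped quote.
    // Assumption: each record fits on a single line.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are literal inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field
        return fields;
    }

    public static void main(String[] args) {
        // In map(), the argument would be value.toString().
        for (String field : parse("1,\"Doe, Jane\",\"said \"\"hi\"\"\"")) {
            System.out.println(field);
        }
    }
}
```

If records can contain line breaks inside quoted values, line-by-line input is not enough and a dedicated CSV input format is the safer route.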

Solution 2

If you want to use MapReduce, you can use TextInputFormat to read the file line by line and parse each line in the mapper's map function.

Another option is to develop (or find an existing) CSV input format for reading data from the file.

There is an old tutorial at http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html, but the logic is the same in newer versions.

If you are using a single process to read the data, it is the same as reading a file from any other file system. There is a nice example at https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs

HTH

Author by

user2454360

Updated on June 13, 2022

Comments

  • user2454360
    user2454360 almost 2 years

    I have my data in a CSV file, and I want to read that file from HDFS.

    Can anyone help me with the code?

    I'm new to Hadoop. Thanks in advance.

  • user2454360
    user2454360 almost 11 years
    Can you share a code snippet that uses TextInputFormat? I'm not able to find the right code on Google :(
  • dino.keco
    dino.keco almost 11 years
  • Divyang Shah
    Divyang Shah almost 2 years
    This code reads a plain text file. What specifically is needed to read a CSV file?