How can Hadoop MapReduce read input from a CSV file?


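By default, Hadoop uses TextInputFormat, which feeds the mapper one line of the input file at a time. The key passed to the mapper is the byte offset of the line within the file (not a line count), and the value is the text of the line. Be careful with CSV files, though: a single quoted field can contain a line break, in which case a line-based reader splits one CSV record across several mapper calls. You might want to look for a CSV-aware input format like this one: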

https://github.com/mvallebr/CSVInputFormat/blob/master/src/main/java/org/apache/hadoop/mapreduce/lib/input/CSVNLineInputFormat.java
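To make the line-break problem concrete, here is a minimal sketch in plain Java that imitates what a line-oriented reader hands to a mapper: one (byte offset, line) pair per physical line. It does not use Hadoop at all; the class `LineRecordDemo` and the method `readLines` are hypothetical names made up for this illustration, and the sample data is invented.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: imitates line-by-line reading without any Hadoop dependency.
final class LineRecordDemo {

    /** One (key, value) pair: the byte offset where the line starts, and the line text. */
    static final class LineRecord {
        final long byteOffset;
        final String line;

        LineRecord(long byteOffset, String line) {
            this.byteOffset = byteOffset;
            this.line = line;
        }
    }

    /** Cuts the content at every newline, tracking the byte offset of each line start. */
    static List<LineRecord> readLines(String content) {
        List<LineRecord> records = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(new LineRecord(offset, line));
            // +1 accounts for the newline byte that terminated this line.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        // Three logical CSV rows; the second one has a line break inside a quoted field.
        String csv = "id,comment\n1,\"first line\nsecond line\"\n2,plain\n";
        for (LineRecord r : readLines(csv)) {
            System.out.println(r.byteOffset + "\t" + r.line);
        }
    }
}
```

The sample has three logical CSV rows but produces four records, because the quoted comment field is cut in two at its embedded line break. That is the situation a CSV-aware input format is meant to avoid.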

However, you still have to split each line into fields yourself in your mapper code.
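As a rough sketch of that splitting step, the helper below breaks one CSV line into fields while respecting double-quoted fields and doubled quotes (`""`). It is plain Java with no Hadoop dependency; `CsvLineSplitter` and `split` are hypothetical names for this example, and the quoting rules are an assumption about how your file is written. In a mapper you would typically call it on the line text, for example `value.toString()` when the value is a `Text`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a single CSV line into fields.
final class CsvLineSplitter {

    /** Splits on commas that are outside double quotes; "" inside quotes means a literal quote. */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a quoted field
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are ordinary characters here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);        // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("2,\"Smith, John\",7"));
    }
}
```

A plain `line.split(",")` is enough when no field can ever contain a comma or a quote, but note that Java's `String.split` drops trailing empty strings, so a line ending in empty columns yields fewer fields than expected unless you pass a negative limit such as `line.split(",", -1)`.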

Author: Kenny Bi

Updated on June 04, 2022

Comments

  • Kenny Bi (almost 2 years ago)

    I want to implement a Hadoop MapReduce job that takes a CSV file as its input. Does Hadoop provide any method for reading the values of a CSV file, or do we just use Java's String split function?

    Thanks, all.