Load data into HBase table using the HBase MapReduce API


Solution 1

With regard to your questions:

  • The Mapper receives splits of the input data and emits <key, value> pairs
  • The Reducer receives the Mapper's output, grouped as <key, set<values>>, and generates <key, value> pairs

Generally, it will be your Reducer task that writes the results (to the filesystem or to HBase), but the Mapper can do that too. There are MapReduce jobs which don't require a Reducer at all. As for reading from HBase, it's the Mapper class that carries the configuration saying which table to read from. But there's nothing that makes the Mapper only a reader and the Reducer only a writer. The article "HBase MapReduce Examples" provides good examples of how to read from and write into HBase using MapReduce.
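For instance, a map-only job that loads a CSV file from HDFS into an HBase table could look roughly like the sketch below. This is untested; the table name "mytable", column family "cf", and the two-column CSV layout are assumptions, not something from the question.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class CsvToHBase {

    // Mapper: parse one CSV line and emit a Put keyed by the first column.
    static class CsvMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");
            byte[] rowKey = Bytes.toBytes(fields[0]);
            Put put = new Put(rowKey);
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"),
                    Bytes.toBytes(fields[1]));
            context.write(new ImmutableBytesWritable(rowKey), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "csv-to-hbase");
        job.setJarByClass(CsvToHBase.class);
        job.setMapperClass(CsvMapper.class);
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Map-only job: TableOutputFormat turns each emitted Put into a write.
        TableMapReduceUtil.initTableReducerJob("mytable", null, job);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note there is no Reducer here at all; the Mapper both parses the input and produces the writes, which illustrates the point above.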

In any case, if what you need is to bulk import some .csv files into HBase, you don't really need a MapReduce job for it. You can do it directly with the HBase client API. In pseudocode:

table = hbase.createTable(tablename, fields); 
foreach (File file: dir) {
   content = readfile(file);    
   hbase.insert(table, content); 
}
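Fleshed out against the plain HBase 2.x client API, that pseudocode might look like the following. This is a sketch, assuming a reachable cluster; the table name "mytable", column family "cf", and the CSV layout are placeholders.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class CsvDirImporter {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName name = TableName.valueOf("mytable");
            // hbase.createTable(tablename, fields)
            if (!admin.tableExists(name)) {
                admin.createTable(TableDescriptorBuilder.newBuilder(name)
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
                        .build());
            }
            // foreach (File file : dir) { content = readfile(file); hbase.insert(...) }
            try (Table table = conn.getTable(name);
                 DirectoryStream<Path> dir =
                         Files.newDirectoryStream(Paths.get(args[0]), "*.csv")) {
                for (Path file : dir) {
                    for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                        String[] fields = line.split(",");
                        Put put = new Put(Bytes.toBytes(fields[0]));
                        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"),
                                Bytes.toBytes(fields[1]));
                        table.put(put);
                    }
                }
            }
        }
    }
}
```

For large files you would batch the Puts (`table.put(List<Put>)`) rather than writing row by row, but the structure is the same as the pseudocode.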

I wrote an importer of .mbox files into HBase. Take a look at the code, it may give you some ideas.

Once your data is imported into HBase, you can then write a MapReduce job to operate on that data.

Solution 2

Using HFileOutputFormat together with completebulkload is the best and fastest way to load data into HBase. You will find sample code here
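A rough sketch of that approach with the HBase 2.x API follows (untested; the table name, the HFile output directory, and the omitted mapper are placeholders). The job writes HFiles instead of live Puts, and the bulk-load tool then moves those files directly into the table's regions, which is why it is so much faster than row-by-row writes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.tool.BulkLoadHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName name = TableName.valueOf("mytable");
        Path hfileDir = new Path("/tmp/hfiles");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(name);
             RegionLocator locator = conn.getRegionLocator(name)) {
            Job job = Job.getInstance(conf, "bulkload");
            // ... set a mapper that emits (ImmutableBytesWritable, Put) here ...
            // Configures partitioning/sorting so output HFiles match the regions.
            HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
            FileOutputFormat.setOutputPath(job, hfileDir);
            if (job.waitForCompletion(true)) {
                // Hand the generated HFiles to the bulk-load tool (completebulkload).
                BulkLoadHFiles.create(conf).bulkLoad(name, hfileDir);
            }
        }
    }
}
```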

Author: Navyah

Updated on June 04, 2022

Comments

  • Navyah
    Navyah almost 2 years

    I am very new to HBase and the MapReduce API.
    I am confused by the MapReduce concepts. I need to load a text file into an HBase table using the MapReduce API. I googled some examples, but in them I only find a mapper() and no reducer method. I am confused about when to use a mapper and when to use a reducer.

    My current thinking is:

    1. To write data to HBase, we use a mapper.
    2. To read data from HBase, we use a mapper and a reducer. Can anyone clarify this with a detailed explanation?
    3. I am trying to load data from a text file into an HBase table. I googled and tried some code, but I don't know how to load the text file and read it in the HBase MapReduce API.

    I would really be thankful for some help.

  • Navyah
    Navyah over 11 years
    I need to read a raw text/CSV file from the system into a MapReduce job, read the data available in the file, and store the retrieved data in an HBase table. The above links are not for the right task.
  • QuinnG
    QuinnG over 11 years
    @user178900: Added a link that might address the additional need.
  • Navyah
    Navyah over 11 years
    I need to read a CSV/text file from the local system into a MapReduce job and store the data in an HTable. I can't find any methods to read a file from the local system. Can you please provide some samples? I am very new to Hadoop.
  • QuinnG
    QuinnG over 11 years
    @user178900 You can't read from the local file system. The data needs to be on HDFS. You either need to put it there first and use that as the input, or include it as a 'file' as my added link shows. Hadoop is a distributed system; it can't be sure that a file existing on machineA exists on machineB, so it doesn't read local files. (There's probably some hack to do it, but that's not the intent of Hadoop.) My suggestion is to put the file on HDFS first, then use the concepts in the first two links.
  • Navyah
    Navyah over 11 years
    My Hadoop is on a remote machine, and I am creating an application in Eclipse that reads data from my local system, like (d:workplace/input.csv). Can't I use the above link in MapReduce and load the data into an HTable?
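The step discussed in the comments above — copying the local file to HDFS before the job runs — can be sketched with the Hadoop FileSystem API. This is a sketch only: it assumes `fs.defaultFS` in the loaded configuration points at the remote cluster's NameNode, and the HDFS destination path is a made-up placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfs {
    public static void main(String[] args) throws Exception {
        // fs.defaultFS should point at the remote cluster's NameNode.
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            // Copy the local CSV onto HDFS so the MapReduce job can read it
            // as its input path. The destination path is a placeholder.
            fs.copyFromLocalFile(new Path("file:///d:/workplace/input.csv"),
                                 new Path("/user/navyah/input.csv"));
        }
    }
}
```

After this, the HDFS path (not the local `d:` path) is what you pass to `FileInputFormat.addInputPath` in the job.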