What is the byte-offset value in Hadoop or in Java?


Solution 1

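A byte offset is the number of bytes counted, starting from zero, from a reference point such as the beginning of a file or a line.

For example, in this line

what is byte offset?

the final character, the question mark, is at byte offset 19, because the line is 20 bytes long and counting starts at 0. In Hadoop, the byte offset at which each line starts within the file is used as the key.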

Solution 2

An offset is an integer that gives the distance of a position from a base address. A byte offset measures that distance in bytes.

Assume a text file with the following data:

Computer-science World
Quantum Computing

The offset of the first line is 0, so the input to the Hadoop job is <0, Computer-science World>. The first line is 22 bytes long plus a one-byte newline, so the second line starts at offset 23 and its input is <23, Quantum Computing>.

Whenever we pass a text file to a Hadoop job, it calculates the byte offsets internally.
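The offsets in this example can be reproduced in plain Java without Hadoop. The following is only a minimal sketch: the class name LineOffsets and the method lineStartOffsets are made up for illustration, and it assumes UTF-8 text with single-byte "\n" line endings. It is not Hadoop's actual implementation, just the same arithmetic.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineOffsets {
    // Hypothetical helper: returns the byte offset at which each line starts.
    // Assumes UTF-8 encoding and '\n' as the line terminator.
    public static List<Long> lineStartOffsets(String content) {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        List<Long> offsets = new ArrayList<>();
        long lineStart = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                offsets.add(lineStart);   // record where the finished line began
                lineStart = i + 1;        // the next line starts after the newline
            }
        }
        if (lineStart < bytes.length) {
            offsets.add(lineStart);       // last line without a trailing newline
        }
        return offsets;
    }

    public static void main(String[] args) {
        String content = "Computer-science World\nQuantum Computing\n";
        System.out.println(lineStartOffsets(content)); // prints [0, 23]
    }
}
```

In a real job the mapper receives these offsets as its input key (a LongWritable when the default text input format is used), so printing or emitting the key from the map method is one way to view them.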

Solution 3

The byte offset is a count of bytes starting at zero. In Hadoop, one character or space is usually one byte, which holds for plain ASCII text; other characters can take more than one byte. If you want to know more, check out this question: How many bits in a character?
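To see when a character is and is not one byte, the encoded length can be checked directly in Java. This is a small sketch assuming UTF-8, which is the encoding Hadoop's Text type uses; the class name CharVsByte and the method utf8Length are illustrative names only.

```java
import java.nio.charset.StandardCharsets;

public class CharVsByte {
    // Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII text: one byte per character.
        System.out.println(utf8Length("what is byte offset?")); // prints 20
        // U+00E9 (e with acute accent) needs two bytes in UTF-8.
        System.out.println(utf8Length("\u00e9"));               // prints 2
        // U+65E5 (a CJK character) needs three bytes in UTF-8.
        System.out.println(utf8Length("\u65e5"));               // prints 3
    }
}
```

For files containing such characters, byte offsets and character counts diverge, so the key of a line is not simply the number of characters before it.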

Author by

user3493414

Updated on June 04, 2022

Comments

  • user3493414 about 2 years

    I am a bit confused by the term "byte offset value", which is used as the map key in a Hadoop MapReduce program.

    First, what is the byte offset value?

    Second, how is it generated, and how does one view this byte-offset value?