HBase scan with TIMERANGE returns an old version

11,705

Solution 1

Dape,

When you set max versions to 1 and a cell has more than one entry, HBase hides the older entries: gets and scans cannot see them unless, of course, you specify a timestamp range that qualifies only one of them. The older entries are only physically removed when a major compaction runs on the table; after that they stop showing up.
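A toy model may make this concrete. The plain-Java class below is a simplification, not HBase code: it assumes that a read first drops cells outside the requested time range and only then applies the family's VERSIONS limit (which matches the TIMERANGE output shown in the question), and that surplus versions stay stored until a simulated major compaction trims them. `ToyColumn` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of one HBase column whose family has a VERSIONS limit.
// Hypothetical illustration only: this is not the HBase client or server API.
class ToyColumn {
    static final class Cell {
        final long ts;
        final String value;

        Cell(long ts, String value) {
            this.ts = ts;
            this.value = value;
        }
    }

    private final int maxVersions;                       // the family's VERSIONS setting
    private final List<Cell> stored = new ArrayList<>(); // everything still in the store files

    ToyColumn(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // A put only appends; surplus versions are not removed at write time.
    void put(long ts, String value) {
        stored.add(new Cell(ts, value));
    }

    // Simplified read path: drop cells outside [minTs, maxTs) first,
    // then return at most maxVersions of the remaining cells, newest first.
    List<String> read(long minTs, long maxTs) {
        List<Cell> inRange = new ArrayList<>();
        for (Cell c : stored) {
            if (c.ts >= minTs && c.ts < maxTs) {
                inRange.add(c);
            }
        }
        inRange.sort(Comparator.comparingLong((Cell c) -> c.ts).reversed());
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(maxVersions, inRange.size()); i++) {
            out.add(inRange.get(i).value);
        }
        return out;
    }

    // Simulated major compaction: physically keep only the newest maxVersions cells.
    void majorCompact() {
        stored.sort(Comparator.comparingLong((Cell c) -> c.ts).reversed());
        while (stored.size() > maxVersions) {
            stored.remove(stored.size() - 1);
        }
    }

    public static void main(String[] args) {
        ToyColumn col = new ToyColumn(1);
        col.put(100L, "value3");
        col.put(200L, "value4");
        System.out.println(col.read(0L, Long.MAX_VALUE)); // [value4]
        System.out.println(col.read(0L, 200L));           // [value3]
        col.majorCompact();
        System.out.println(col.read(0L, 200L));           // []
    }
}
```

The second read is the situation from the question: the range excludes the newest version, so the older, not-yet-compacted one fills its place.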

To always get the latest cell from a scan result, use this method:

    Result.getColumnLatest(family, qualifier)
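A rough plain-Java model of what that method does with one row's result: among the cells the Result already holds for `family:qualifier`, it returns the one with the newest timestamp. `LatestCellPicker` and `columnLatest` are hypothetical names, and the model assumes the cells were already fetched by the scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of picking the latest cell of one column from a row's result.
// Plain Java, not the HBase API.
class LatestCellPicker {
    static final class Cell {
        final String family;
        final String qualifier;
        final long ts;
        final String value;

        Cell(String family, String qualifier, long ts, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.ts = ts;
            this.value = value;
        }
    }

    // Returns the newest cell for family:qualifier among the given cells, or null if none.
    static Cell columnLatest(List<Cell> rowCells, String family, String qualifier) {
        Cell latest = null;
        for (Cell c : rowCells) {
            boolean sameColumn = c.family.equals(family) && c.qualifier.equals(qualifier);
            if (sameColumn && (latest == null || c.ts > latest.ts)) {
                latest = c;
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Cell> row3 = new ArrayList<>();
        row3.add(new Cell("cf", "u", 1340259701085L, "value3"));
        row3.add(new Cell("cf", "u", 1340259704569L, "value4"));
        System.out.println(columnLatest(row3, "cf", "u").value); // value4

        // If the scan's time range already excluded value4, only value3 is in the result,
        // so picking the "latest" cell on the client cannot bring value4 back.
        List<Cell> rangeLimited = new ArrayList<>();
        rangeLimited.add(new Cell("cf", "u", 1340259701085L, "value3"));
        System.out.println(columnLatest(rangeLimited, "cf", "u").value); // value3
    }
}
```

The second call shows the limit of this approach: it only chooses among cells the scan actually returned.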

Solution 2

java.io.IOException: IPC server unable to read call parameters: Error in readFields

You need to copy the jars to all region servers and edit HBASE_CLASSPATH in hbase-env.sh on each region server accordingly, then restart the region servers so the new classpath takes effect.

You can set a time range and max versions on the Scan to get the old versions within that time range:

    scan.setMaxVersions(Integer.MAX_VALUE);
    scan.setTimeRange(startVersion, endVersion);
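The effect of that combination can be modeled in a few lines of plain Java (not the HBase API; `versionsInRange` is a hypothetical name): every version still stored for a column whose timestamp falls in the half-open range `[startVersion, endVersion)` comes back, newest first. The example assumes a family that keeps at least two versions; as far as I know, the server caps the requested max versions at the family's VERSIONS setting, so with `VERSIONS => 1` you would still get at most one cell per column.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a scan with setMaxVersions(Integer.MAX_VALUE) and
// setTimeRange(startVersion, endVersion). Plain Java, not the HBase API.
class AllVersionsInRange {
    // Returns every stored timestamp in [start, end), newest first.
    static List<Long> versionsInRange(long[] storedTimestamps, long start, long end) {
        List<Long> out = new ArrayList<>();
        for (long ts : storedTimestamps) {
            if (ts >= start && ts < end) { // start inclusive, end exclusive
                out.add(ts);
            }
        }
        out.sort(Collections.reverseOrder()); // newer versions first
        return out;
    }

    public static void main(String[] args) {
        // Two versions of one column (timestamps borrowed from the question),
        // in a hypothetical family created with VERSIONS => 2 or more.
        long[] row3 = {1340259701085L, 1340259704569L};
        System.out.println(versionsInRange(row3, 1340259691771L, 1340259704570L)); // [1340259704569, 1340259701085]
        System.out.println(versionsInRange(row3, 1340259691771L, 1340259704569L)); // [1340259701085]
    }
}
```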

Solution 3

I think this is exactly the same problem I ran into here: HBase get returns old values even with max versions = 1

It turns out to be a bug in HBase. See: https://issues.apache.org/jira/browse/HBASE-10102

Author: dape

Updated on June 04, 2022

Comments

  • dape

    I have a question about HBase scans with a time range. I created a 'test' table with one family 'cf' that keeps one version. After putting four values into it (row3 twice), I scanned the table with a time range, and I got an old version of a row within that range.

    For example:

     create 'test',{NAME=>'cf',VERSIONS=>1}
     put 'test','row1','cf:u','value1' 
     put 'test','row2','cf:u','value2'
     put 'test','row3','cf:u','value3'
     put 'test','row3','cf:u','value4'
    

    Then I scan the table; this is the output:

     hbase(main):008:0> scan 'test'
     ROW                                      COLUMN+CELL                                                                                                          
     row1                                    column=cf:u, timestamp=1340259691771, value=value1                                                                   
     row2                                    column=cf:u, timestamp=1340259696975, value=value2                                                                   
     row3                                    column=cf:u, timestamp=1340259704569, value=value4   
    

    This is right: row3 has the newest version.

    However, if I scan it with a time range, I get this:

      hbase(main):010:0> scan 'test',{TIMERANGE=>[1340259691771,1340259704569]}
      ROW                                      COLUMN+CELL                                                                                                          
      row1                                    column=cf:u, timestamp=1340259691771, value=value1                                                                   
      row2                                    column=cf:u, timestamp=1340259696975, value=value2                                                                   
      row3                                    column=cf:u, timestamp=1340259701085, value=value3     
    

    It returns the old version of row3, even though I set VERSIONS to 1 for this table.

    If I increase the max timestamp, I get:

      hbase(main):011:0> scan 'test',{TIMERANGE=>[1340259691771,1340259704570]}
      ROW                                      COLUMN+CELL                                                                                                          
      row1                                    column=cf:u, timestamp=1340259691771, value=value1                                                                   
      row2                                    column=cf:u, timestamp=1340259696975, value=value2                                                                   
      row3                                    column=cf:u, timestamp=1340259704569, value=value4                                                                   
    

    3 row(s) in 0.0330 seconds

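    This is right, and I understand why: the upper bound of TIMERANGE is exclusive, so raising it to 1340259704570 includes value4's timestamp 1340259704569.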

    What I want is to scan a table within a time range and get back only the newest version. I know there is a TimestampsFilter, but that filter only supports specific timestamps, not a time range.

    Is there any way to scan a table within a time range and return only the newest version?
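One possible approach, offered as an untested assumption rather than a confirmed answer: scan with `Scan.setTimeRange(minStamp, Long.MAX_VALUE)` and `Scan.setMaxVersions(1)`, so that each column returns only its newest version (and only if that version is not older than `minStamp`), then drop on the client any cell whose timestamp is `>= maxStamp`. The plain-Java sketch below models only that selection rule; `NewestInRange` and `newestIfInRange` are hypothetical names, and the timestamps are the ones from the question.

```java
import java.util.OptionalLong;

// Hypothetical model of "return a column's newest version only if it lies in [minTs, maxTs)".
// Assumed real-scan equivalent: time range [minTs, Long.MAX_VALUE) with max versions 1,
// then a client-side check ts < maxTs. Plain Java, not the HBase API.
class NewestInRange {
    static OptionalLong newestIfInRange(long[] versionTimestamps, long minTs, long maxTs) {
        if (versionTimestamps.length == 0) {
            return OptionalLong.empty(); // no versions stored for this column
        }
        long newest = Long.MIN_VALUE;
        for (long ts : versionTimestamps) {
            newest = Math.max(newest, ts);
        }
        // Only the newest version is considered; an older version never fills in for it.
        return (newest >= minTs && newest < maxTs) ? OptionalLong.of(newest) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        long[] row3 = {1340259701085L, 1340259704569L}; // value3, value4
        long[] row1 = {1340259691771L};                 // value1
        System.out.println(newestIfInRange(row3, 1340259691771L, 1340259704569L).isPresent()); // false
        System.out.println(newestIfInRange(row3, 1340259691771L, 1340259704570L).isPresent()); // true
        System.out.println(newestIfInRange(row1, 1340259691771L, 1340259704569L).isPresent()); // true
    }
}
```

With this rule, row3 yields nothing for the range [1340259691771, 1340259704569), because its newest version (value4) lies outside it, instead of falling back to value3.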

    I tried to write my own TimeRangeFilter; here is my code:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.ArrayList;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.filter.Filter;
    import org.apache.hadoop.hbase.filter.FilterBase;
    import org.apache.hadoop.hbase.filter.ParseFilter;

    import com.google.common.base.Preconditions;

    public class TimeRangeFilter extends FilterBase {

        private long minTimeStamp = Long.MIN_VALUE;
        private long maxTimeStamp = Long.MAX_VALUE;

        // No-arg constructor needed for Writable deserialization on the region server:
        // the filter is instantiated first, then readFields() fills in the fields.
        public TimeRangeFilter() {
            super();
        }

        public TimeRangeFilter(long minTimeStamp, long maxTimeStamp) {
            Preconditions.checkArgument(maxTimeStamp >= minTimeStamp,
                    "max timestamp %s must be greater than or equal to min timestamp %s",
                    maxTimeStamp, minTimeStamp);
            this.minTimeStamp = minTimeStamp;
            this.maxTimeStamp = maxTimeStamp;
        }

        @Override
        public ReturnCode filterKeyValue(KeyValue v) {
            long ts = v.getTimestamp();
            if (ts >= minTimeStamp && ts <= maxTimeStamp) {
                return ReturnCode.INCLUDE;
            } else if (ts < minTimeStamp) {
                // Versions are sorted newest first, so the remaining versions
                // of this column are all older than minTimeStamp.
                return ReturnCode.NEXT_COL;
            }
            // Newer than maxTimeStamp: skip it and check the next (older) version.
            return ReturnCode.SKIP;
        }

        public static Filter createFilterFromArguments(ArrayList<byte[]> filterArguments) {
            if (filterArguments.size() < 2) {
                return null;
            }
            long minTime = ParseFilter.convertByteArrayToLong(filterArguments.get(0));
            long maxTime = ParseFilter.convertByteArrayToLong(filterArguments.get(1));
            return new TimeRangeFilter(minTime, maxTime);
        }

        @Override
        public void write(DataOutput out) throws IOException {
            // Field order must match readFields().
            out.writeLong(minTimeStamp);
            out.writeLong(maxTimeStamp);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            this.minTimeStamp = in.readLong();
            this.maxTimeStamp = in.readLong();
        }
    }

    I added this jar to HBASE_CLASSPATH in hbase-env.sh, but I get the following error:

    org.apache.hadoop.hbase.client.ScannerCallable@a9255c, java.io.IOException: IPC server unable to read call parameters: Error in readFields
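A likely contributor to this particular `readFields` error, besides the classpath issue from Solution 2, is an assumption on my part since the full stack trace is not shown: with the Writable-based RPC of HBase 0.94-era versions, the region server creates the filter reflectively through its no-argument constructor and then calls `readFields`, while the `TimeRangeFilter` as originally posted only had a two-argument constructor. The built-in filters of that era declare a public no-argument constructor for this reason. The plain-Java sketch below imitates that instantiate-then-`readFields` step with `java.io` streams and reflection; `SimpleWritable`, `RangeFilterModel`, and `NoDefaultCtorModel` are hypothetical stand-ins, not Hadoop or HBase types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Optional;

// Plain-Java imitation of Writable-style deserialization: create an empty instance
// reflectively, then let it read its own fields. Hypothetical stand-ins only.
class WritableDemo {
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Has a public no-arg constructor, so reflective creation works.
    static class RangeFilterModel implements SimpleWritable {
        long minTs = Long.MIN_VALUE;
        long maxTs = Long.MAX_VALUE;

        public RangeFilterModel() {
        }

        public RangeFilterModel(long minTs, long maxTs) {
            this.minTs = minTs;
            this.maxTs = maxTs;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(minTs);
            out.writeLong(maxTs);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            minTs = in.readLong();
            maxTs = in.readLong();
        }
    }

    // Inherits write/readFields but declares only a two-argument constructor,
    // like the filter as originally posted.
    static class NoDefaultCtorModel extends RangeFilterModel {
        NoDefaultCtorModel(long minTs, long maxTs) {
            super(minTs, maxTs);
        }
    }

    static byte[] serialize(SimpleWritable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Empty result when the class has no no-arg constructor (the step that fails).
    static <T extends SimpleWritable> Optional<T> deserialize(Class<T> cls, byte[] data) {
        T instance;
        try {
            instance = cls.getDeclaredConstructor().newInstance();
        } catch (NoSuchMethodException e) {
            return Optional.empty();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        try {
            instance.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Optional.of(instance);
    }

    public static void main(String[] args) {
        byte[] data = serialize(new RangeFilterModel(1340259691771L, 1340259704570L));
        System.out.println(deserialize(RangeFilterModel.class, data).isPresent());   // true
        System.out.println(deserialize(NoDefaultCtorModel.class, data).isPresent()); // false
    }
}
```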