HBase scan with a time range returns an old version
Solution 1
Dape,
When you set the max versions to 1 and have more than one entry for a cell, HBase does not remove the older cells right away. Gets and scans normally cannot see them, unless of course you specify a timestamp range that only an older cell falls into. The older cells are physically deleted only after a major compaction (major_compact) is run on the table, which is when they stop popping up.
To always get the latest cell from a scan, all you need to do is use the method below:
Result.getColumnLatest(family, qualifier)
Solution 2
java.io.IOException: IPC server unable to read call parameters: Error in readFields
You need to copy the jars to all region servers and edit HBASE_CLASSPATH in hbase-env.sh on each region server accordingly.
You can specify a time range and max versions on the Scan to get the old versions within that time range:
scan.setMaxVersions(Integer.MAX_VALUE);
scan.setTimeRange(startVersion, endVersion);
Solution 3
I think this is exactly the same problem I ran into here: HBase get returns old values even with max versions = 1
It turns out to be a bug in HBase. See: https://issues.apache.org/jira/browse/HBASE-10102
dape
Updated on June 04, 2022
Comments
dape almost 2 years
I have a question about HBase scans that use a time range. I created a table 'test' with one column family 'cf' that keeps one version. After running 4 puts against that table and scanning it with a time range, I get an old version of a row within the time range.
For example:
create 'test',{NAME=>'cf',VERSIONS=>1}
put 'test','row1','cf:u','value1'
put 'test','row2','cf:u','value2'
put 'test','row3','cf:u','value3'
put 'test','row3','cf:u','value4'
Then I scan this table. The following is the output:
hbase(main):008:0> scan 'test'
ROW     COLUMN+CELL
row1    column=cf:u, timestamp=1340259691771, value=value1
row2    column=cf:u, timestamp=1340259696975, value=value2
row3    column=cf:u, timestamp=1340259704569, value=value4
This is right: row3 has the newest version.
However, if I scan with a time range, I get this:
hbase(main):010:0> scan 'test',{TIMERANGE=>[1340259691771,1340259704569]}
ROW     COLUMN+CELL
row1    column=cf:u, timestamp=1340259691771, value=value1
row2    column=cf:u, timestamp=1340259696975, value=value2
row3    column=cf:u, timestamp=1340259701085, value=value3
It returns the old version of row3, even though I set VERSIONS to 1 for this table.
If I increase the max timestamp, I get:
hbase(main):011:0> scan 'test',{TIMERANGE=>[1340259691771,1340259704570]}
ROW     COLUMN+CELL
row1    column=cf:u, timestamp=1340259691771, value=value1
row2    column=cf:u, timestamp=1340259696975, value=value2
row3    column=cf:u, timestamp=1340259704569, value=value4
3 row(s) in 0.0330 seconds
This is right, and I can understand it.
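The three scans above are consistent with a time range that is half-open, that is, [min, max): the lower bound is included and the upper bound is excluded. The following is only a plain-Java model of that behavior, not HBase code. The names TimeRangeDemo, Cell and newestInRange are hypothetical, and the timestamps are the ones shown for row3 in the shell output.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the scans above; it does not call HBase.
class TimeRangeDemo {

    // Hypothetical stand-in for one stored version of a cell.
    static final class Cell {
        final long timestamp;
        final String value;

        Cell(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Returns the value of the newest cell with min <= timestamp < max,
    // or null when no version falls inside the range.
    static String newestInRange(List<Cell> versions, long min, long max) {
        Cell best = null;
        for (Cell c : versions) {
            boolean inRange = c.timestamp >= min && c.timestamp < max;
            if (inRange && (best == null || c.timestamp > best.timestamp)) {
                best = c;
            }
        }
        return best == null ? null : best.value;
    }

    // The two versions of row3, cf:u taken from the shell output.
    static List<Cell> row3() {
        List<Cell> versions = new ArrayList<>();
        versions.add(new Cell(1340259701085L, "value3"));
        versions.add(new Cell(1340259704569L, "value4"));
        return versions;
    }

    public static void main(String[] args) {
        long min = 1340259691771L;
        // Upper bound equal to value4's timestamp: value4 is excluded.
        System.out.println(newestInRange(row3(), min, 1340259704569L)); // value3
        // Upper bound one millisecond later: value4 is included.
        System.out.println(newestInRange(row3(), min, 1340259704570L)); // value4
    }
}
```

In this model the old version shows up only because the newer one sits exactly on the excluded upper bound, which matches the difference between the second and third scans.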
What I want is to scan a table within a time range and get back only the newest version. I know there is a TimestampsFilter, but that filter only supports specific timestamps, not a time range.
Is there any way to scan a table within a time range and return only the newest version?
I tried to write my own time range filter. The following is my code:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.filter.ParseFilter;

import com.google.common.base.Preconditions;

public class TimeRangeFilter extends FilterBase {
    private long minTimeStamp = Long.MIN_VALUE;
    private long maxTimeStamp = Long.MAX_VALUE;

    public TimeRangeFilter(long minTimeStamp, long maxTimeStamp) {
        Preconditions.checkArgument(maxTimeStamp >= minTimeStamp,
                "max timestamp %s must be big than min timestamp %s",
                maxTimeStamp, minTimeStamp);
        this.maxTimeStamp = maxTimeStamp;
        this.minTimeStamp = minTimeStamp;
    }

    @Override
    public ReturnCode filterKeyValue(KeyValue v) {
        if (v.getTimestamp() >= minTimeStamp && v.getTimestamp() <= maxTimeStamp) {
            return ReturnCode.INCLUDE;
        } else if (v.getTimestamp() < minTimeStamp) {
            // The remaining versions of this column are guaranteed
            // to be lesser than all of the other values.
            return ReturnCode.NEXT_COL;
        }
        return ReturnCode.SKIP;
    }

    public static Filter createFilterFromArguments(ArrayList<byte[]> filterArguments) {
        long minTime, maxTime;
        if (filterArguments.size() < 2)
            return null;
        minTime = ParseFilter.convertByteArrayToLong(filterArguments.get(0));
        maxTime = ParseFilter.convertByteArrayToLong(filterArguments.get(1));
        return new TimeRangeFilter(minTime, maxTime);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // TODO Auto-generated method stub
        out.writeLong(minTimeStamp);
        out.writeLong(maxTimeStamp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // TODO Auto-generated method stub
        this.minTimeStamp = in.readLong();
        this.maxTimeStamp = in.readLong();
    }
}
I added this jar to HBASE_CLASSPATH in hbase-env.sh. However, I get the following error:
org.apache.hadoop.hbase.client.ScannerCallable@a9255c, java.io.IOException: IPC server unable to read call parameters: Error in readFields
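For illustration, here is a plain-Java sketch of the write/readFields round trip that such a filter goes through when it is sent to a server. It assumes the receiving side rebuilds the object the way Hadoop Writables are usually rebuilt: create an empty instance reflectively through a no-argument constructor, then call readFields on it. That is an assumption about the mechanism, not something this thread confirms, and RangeState is a hypothetical stand-in for the filter's two fields. The TimeRangeFilter above declares only a two-argument constructor, so under this assumption it might be worth checking that as well as the region server classpath mentioned in Solution 2.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Plain-Java sketch; RangeState is a hypothetical stand-in for the
// two fields of the filter and does not depend on HBase.
class RangeState {
    long minTimeStamp = Long.MIN_VALUE;
    long maxTimeStamp = Long.MAX_VALUE;

    // No-argument constructor, needed for reflective creation below.
    RangeState() {
    }

    RangeState(long minTimeStamp, long maxTimeStamp) {
        this.minTimeStamp = minTimeStamp;
        this.maxTimeStamp = maxTimeStamp;
    }

    // Sender side: serialize both bounds in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeLong(minTimeStamp);
        out.writeLong(maxTimeStamp);
    }

    // Receiver side: read the bounds back in the same order.
    void readFields(DataInput in) throws IOException {
        this.minTimeStamp = in.readLong();
        this.maxTimeStamp = in.readLong();
    }

    // Simulates the transfer: write to bytes, create an empty instance
    // reflectively (assumed mechanism), then fill it with readFields.
    static RangeState roundTrip(RangeState original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            RangeState copy = RangeState.class.getDeclaredConstructor().newInstance();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        RangeState copy = roundTrip(new RangeState(1340259691771L, 1340259704570L));
        System.out.println(copy.minTimeStamp + " " + copy.maxTimeStamp);
    }
}
```

If the no-argument constructor is removed from RangeState, getDeclaredConstructor() throws NoSuchMethodException and the round trip fails before readFields is ever called.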