Hibernate - distinct results with pagination
Solution 1
Consider using DistinctRootEntity result transformer like this
session.createCriteria(A.class)
.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY).list();
UPDATE
The samples of queries for one-to-many associations.
public Collection<Long> getIDsOfAs(int pageNumber, int pageSize) {
Session session = getCurrentSession();
Criteria criteria = session.createCriteria(A.class)
.setProjection(Projections.id())
.addOrder(Order.asc("id"));
if(pageNumber >= 0 && pageSize > 0) {
criteria.setMaxResults(pageSize);
criteria.setFirstResult(pageNumber * pageSize);
}
@SuppressWarnings("unchecked")
Collection<Long> ids = criteria.list();
return ids;
}
public Collection<A> getAs(int pageNumber, int pageSize) {
Collection<A> as = Collections.emptyList();
Collection<Long> ids = getIDsOfAs(pageNumber, pageSize);
if(!ids.isEmpty()) {
Session session = getCurrentSession();
Criteria criteria = session.createCriteria(A.class)
.add(Restrictions.in("id", ids))
.addOrder(Order.asc("id"))
.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);
@SuppressWarnings("unchecked")
as = criteria.list();
}
return as;
}
Solution 2
You mention the reason you're seeing this problem is because Set<B>
is fetched eagerly. If you're paginating, chances are you don't need the B
's for each A
, so it might be better to fetch them lazily.
However, this same problem occurs when you join the B
's into the query to make a selection.
In some cases, you will not only want to paginate, but also sort on other fields than the ID. I think the way to do this is to formulate the query like this:
Criteria filter = session.createCriteria(A.class) .add(... any criteria you want to filter on, including aliases etc ...); filter.setProjection(Projections.id()); Criteria paginate = session.createCriteria(A.class) .add(Subqueries.in("id", filter)) .addOrder(Order.desc("foo")) .setMaxResults(max) .setFirstResult(first); return paginate.list();
(pseudocode, didn't check if the syntax is exactly right but you get the idea)
Ashish
Updated on June 20, 2022Comments
-
Ashish almost 2 years
This seems to be a well known problem for years as can be read here: http://blog.xebia.com/2008/12/11/sorting-and-pagination-with-hibernate-criteria-how-it-can-go-wrong-with-joins/
And even finds reference in hibernate faqs:
This has also been discussed previously on SO
How to get distinct results in hibernate with joins and row-based limiting (paging)?
The problem is that even after going through all these resources, I have not been able to resolve my issue, which seems to be a little different from this standard problem, although I am not sure.
The standard solution proposed here involves creating two queries, first one for getting distinct IDs and then using those in a higher level query to get the desired pagination. The hibernate classes in my case are something like
A - aId - Set<B> B - bId
It appears to me that the subquery seems to be working fine for me and is being able to get the distinct aIds but the outer query which is supposed to do the pagination is again fetching the duplicates and thus the distinct in subquery is having no effect.
Assuming I have one A object which has a set of four B objects, My analysis is that because of introduction of set, while fetching data for
session.createCriteria(A.class).list();
hibernate is populating four references in the list pointing to just one object. Because of this the standard solution is failing for me.
Could someone please help in coming up with a solution for this case?
Edit: I have decided to go for doing pagination by ourselves from the distinct resultset. The other equally bad way could have been to lazy load the B objects but that would have required separate queries for all the A objects to fetch corresponding B objects
-
Ashish about 12 yearsI have used that as well, in the subquery. My subquery is working fine and is returning unique IDs. The problem is that the outer query is fetching duplicates. For example if subquery returns A.ids = 1,2,3, then the main query (which is used for pagination), fetches duplicate A.ids like 1,1,1,2,2,3,3,3 and therefore the complete query of the form Select A.* from A where A.id in (subquery) limit 10; fails to get me unique A records
-
szhem about 12 yearsYou final query should look like this: select a.* from a where a.id in (:ids), and query to get identifiers of A should look like this: select a.id from A limit 10. So apply Limit only to identifiers not to the final query.
-
Ashish about 12 yearsmy final query is of the form select a.* from a left outer join b on a.id=b.aid and this is required to have the complete A object.
-
Ashish about 12 yearsgiven a structure of the form given above in terms of A and B, could you please give the code which would do both distinct and pagination, doing only one of them is working fine but not both of them.
-
szhem about 12 yearstake a look at update. note that you have to paginate the IDs, not the resulting entity.
-
Ashish about 12 yearsIn ur code, when u do Criteria criteria = session.createCriteria(A.class) and then criteria.list(), won't that return duplicate results because of eager fetching? That's the problem I am facing.
-
Ashish about 12 yearsThanks Arnout. But y do u think this won't have duplicates. My problem is that even for a small query like session.createCriteria(A.class).list(), it is returning me duplicates because of eager fetching. How does your solution prevent eager fetching?
-
szhem about 12 yearsApply then result transformer as I mentioned earlier.
-
Ashish about 12 yearsyes distinct works fine, but the problem is that distinct and pagination does not work together, So if you create just one query and apply distinct and pagination simultaneously, it will apply pagination first and then distinct (not what anyone would want). And when I make a subquery to get distinct Ids and then apply pagination in the outer query, still I get duplicates in outer query (bcz of eager fetching), which ruins the distinct Ids I got from subquery
-
Ashish about 12 yearsIn your code, I think you are applying pagination in the subquery, if I do that then I won't get distinct A.ids
-
szhem about 12 years@Ashish, could you show your code and provide the information what exactly does not work? The strategy to paginate the root entities with one-to-many relationship is the following: 1. select only IDs of root entities, apply filtering, ordering, pagination, etc. to them; 2. select root entities using Restrictions.in("id", ids) and applying the same ordering to them. That's all. If you still have problems, then show your code for more details.
-
Ashish about 12 yearsin step 1, what if you need to add filtering on entity B as well, then you can't select only A records in subquery, u need to have a join!
-
Ashish about 12 yearsin your code, the method "getIDsOfAs" will give duplicate ids if you do eager fetching, just try it out.
-
szhem about 12 yearsIf you need to omit duplicate IDs use Projections.distinct(Projections.id()). Just read the documentation.
-
Ashish about 12 yearsIf you would do that, then it will apply pagination and then will do distinct, which will not be what anyone would want.
-
Ashish about 12 yearsI guess you are not getting the problem, just populate some objects the way I have, enable eager fetching and then try to apply pagination and distinct, you won't be able to.
-
szhem about 12 years@Ashish if you are interested in resolving this issue, please provide the test case or something else. you even haven't shown any mappings and queries. My solution works pretty well for the 1:M relations with eager fetching.
-
Arnout Engelen about 12 yearsAh, yes: first off really evaluate whether you really need eager fetching. In my experience it is hardly ever a good idea to always fetch certain collections eagerly - it can be useful to fetch them eagerly in certain specific queries, of course, and the Criteria API allows you to specify that. If you're sure you want to always fetch this collection eagerly, use the
Criteria.DISTINCT_ROOT_ENTITY
ResultTransformer
on the outer Criteria (paginate
). -
che javara about 7 yearsThis is flawed, you don't have pagination on your second query you will do a full table query without stopping after finding X results (Which is the whole point of pagination). Basically you are trading pagination for a query with IDs restriction. You will still pull many more duplicate rows that are removed in memory, Will only work for small tables where you don't actually need pagination for performance.
-
szhem about 7 years@chejavara, do you understand the difference between paginating jdbc resultset, and hibernate entities? so, what about one-to-many associations? are you familiar with dialects and criteria api? ... and please read the original question and try to understand the problem and then provide your much better solution for the question asked 5 years ago.