Hibernate - distinct results with pagination

10,323

Solution 1

Consider using DistinctRootEntity result transformer like this

session.createCriteria(A.class)
    .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY).list();

UPDATE

The samples of queries for one-to-many associations.

public Collection<Long> getIDsOfAs(int pageNumber, int pageSize) {
    Session session = getCurrentSession();

    Criteria criteria = session.createCriteria(A.class)
        .setProjection(Projections.id())
        .addOrder(Order.asc("id"));

    if(pageNumber >= 0 && pageSize > 0) {
        criteria.setMaxResults(pageSize);
        criteria.setFirstResult(pageNumber * pageSize);
    }

    @SuppressWarnings("unchecked")
    Collection<Long> ids = criteria.list();
    return ids;
}

public Collection<A> getAs(int pageNumber, int pageSize) {
    Collection<A> as = Collections.emptyList();

    Collection<Long> ids = getIDsOfAs(pageNumber, pageSize);
    if(!ids.isEmpty()) {
        Session session = getCurrentSession();

        Criteria criteria = session.createCriteria(A.class)
            .add(Restrictions.in("id", ids))
            .addOrder(Order.asc("id"))
            .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);
        @SuppressWarnings("unchecked")
        as = criteria.list(); 
    }    

    return as;
}

Solution 2

You mention the reason you're seeing this problem is because Set<B> is fetched eagerly. If you're paginating, chances are you don't need the B's for each A, so it might be better to fetch them lazily.

However, this same problem occurs when you join the B's into the query to make a selection.

In some cases, you will not only want to paginate, but also sort on other fields than the ID. I think the way to do this is to formulate the query like this:

  Criteria filter = session.createCriteria(A.class)
    .add(... any criteria you want to filter on, including aliases etc ...);
  filter.setProjection(Projections.id());

  Criteria paginate = session.createCriteria(A.class)
    .add(Subqueries.in("id", filter))
    .addOrder(Order.desc("foo"))
    .setMaxResults(max)
    .setFirstResult(first);

  return paginate.list();

(pseudocode, didn't check if the syntax is exactly right but you get the idea)

Share:
10,323
Ashish
Author by

Ashish

Updated on June 20, 2022

Comments

  • Ashish
    Ashish almost 2 years

    This seems to be a well known problem for years as can be read here: http://blog.xebia.com/2008/12/11/sorting-and-pagination-with-hibernate-criteria-how-it-can-go-wrong-with-joins/

    And even finds reference in hibernate faqs:

    https://community.jboss.org/wiki/HibernateFAQ-AdvancedProblems#Hibernate_does_not_return_distinct_results_for_a_query_with_outer_join_fetching_enabled_for_a_collection_even_if_I_use_the_distinct_keyword

    This has also been discussed previously on SO

    How to get distinct results in hibernate with joins and row-based limiting (paging)?

    The problem is that even after going through all these resources, I have not been able to resolve my issue, which seems to be a little different from this standard problem, although I am not sure.

    The standard solution proposed here involves creating two queries, first one for getting distinct IDs and then using those in a higher level query to get the desired pagination. The hibernate classes in my case are something like

    A
     - aId
     - Set<B>
    
    B
     - bId 
    

    It appears to me that the subquery seems to be working fine for me and is being able to get the distinct aIds but the outer query which is supposed to do the pagination is again fetching the duplicates and thus the distinct in subquery is having no effect.

    Assuming I have one A object which has a set of four B objects, My analysis is that because of introduction of set, while fetching data for

    session.createCriteria(A.class).list();
    

    hibernate is populating four references in the list pointing to just one object. Because of this the standard solution is failing for me.

    Could someone please help in coming up with a solution for this case?

    Edit: I have decided to go for doing pagination by ourselves from the distinct resultset. The other equally bad way could have been to lazy load the B objects but that would have required separate queries for all the A objects to fetch corresponding B objects

  • Ashish
    Ashish about 12 years
    I have used that as well, in the subquery. My subquery is working fine and is returning unique IDs. The problem is that the outer query is fetching duplicates. For example if subquery returns A.ids = 1,2,3, then the main query (which is used for pagination), fetches duplicate A.ids like 1,1,1,2,2,3,3,3 and therefore the complete query of the form Select A.* from A where A.id in (subquery) limit 10; fails to get me unique A records
  • szhem
    szhem about 12 years
    You final query should look like this: select a.* from a where a.id in (:ids), and query to get identifiers of A should look like this: select a.id from A limit 10. So apply Limit only to identifiers not to the final query.
  • Ashish
    Ashish about 12 years
    my final query is of the form select a.* from a left outer join b on a.id=b.aid and this is required to have the complete A object.
  • Ashish
    Ashish about 12 years
    given a structure of the form given above in terms of A and B, could you please give the code which would do both distinct and pagination, doing only one of them is working fine but not both of them.
  • szhem
    szhem about 12 years
    take a look at update. note that you have to paginate the IDs, not the resulting entity.
  • Ashish
    Ashish about 12 years
    In ur code, when u do Criteria criteria = session.createCriteria(A.class) and then criteria.list(), won't that return duplicate results because of eager fetching? That's the problem I am facing.
  • Ashish
    Ashish about 12 years
    Thanks Arnout. But y do u think this won't have duplicates. My problem is that even for a small query like session.createCriteria(A.class).list(), it is returning me duplicates because of eager fetching. How does your solution prevent eager fetching?
  • szhem
    szhem about 12 years
    Apply then result transformer as I mentioned earlier.
  • Ashish
    Ashish about 12 years
    yes distinct works fine, but the problem is that distinct and pagination does not work together, So if you create just one query and apply distinct and pagination simultaneously, it will apply pagination first and then distinct (not what anyone would want). And when I make a subquery to get distinct Ids and then apply pagination in the outer query, still I get duplicates in outer query (bcz of eager fetching), which ruins the distinct Ids I got from subquery
  • Ashish
    Ashish about 12 years
    In your code, I think you are applying pagination in the subquery, if I do that then I won't get distinct A.ids
  • szhem
    szhem about 12 years
    @Ashish, could you show your code and provide the information what exactly does not work? The strategy to paginate the root entities with one-to-many relationship is the following: 1. select only IDs of root entities, apply filtering, ordering, pagination, etc. to them; 2. select root entities using Restrictions.in("id", ids) and applying the same ordering to them. That's all. If you still have problems, then show your code for more details.
  • Ashish
    Ashish about 12 years
    in step 1, what if you need to add filtering on entity B as well, then you can't select only A records in subquery, u need to have a join!
  • Ashish
    Ashish about 12 years
    in your code, the method "getIDsOfAs" will give duplicate ids if you do eager fetching, just try it out.
  • szhem
    szhem about 12 years
    If you need to omit duplicate IDs use Projections.distinct(Projections.id()). Just read the documentation.
  • Ashish
    Ashish about 12 years
    If you would do that, then it will apply pagination and then will do distinct, which will not be what anyone would want.
  • Ashish
    Ashish about 12 years
    I guess you are not getting the problem, just populate some objects the way I have, enable eager fetching and then try to apply pagination and distinct, you won't be able to.
  • szhem
    szhem about 12 years
    @Ashish if you are interested in resolving this issue, please provide the test case or something else. you even haven't shown any mappings and queries. My solution works pretty well for the 1:M relations with eager fetching.
  • Arnout Engelen
    Arnout Engelen about 12 years
    Ah, yes: first off really evaluate whether you really need eager fetching. In my experience it is hardly ever a good idea to always fetch certain collections eagerly - it can be useful to fetch them eagerly in certain specific queries, of course, and the Criteria API allows you to specify that. If you're sure you want to always fetch this collection eagerly, use the Criteria.DISTINCT_ROOT_ENTITY ResultTransformer on the outer Criteria (paginate).
  • che javara
    che javara about 7 years
    This is flawed, you don't have pagination on your second query you will do a full table query without stopping after finding X results (Which is the whole point of pagination). Basically you are trading pagination for a query with IDs restriction. You will still pull many more duplicate rows that are removed in memory, Will only work for small tables where you don't actually need pagination for performance.
  • szhem
    szhem about 7 years
    @chejavara, do you understand the difference between paginating jdbc resultset, and hibernate entities? so, what about one-to-many associations? are you familiar with dialects and criteria api? ... and please read the original question and try to understand the problem and then provide your much better solution for the question asked 5 years ago.