Find all records which have a count of an association greater than zero

76,457

Solution 1

joins uses an inner join by default so using Project.joins(:vacancies) will in effect only return projects that have an associated vacancy.

UPDATE:

As pointed out by @mackskatz in the comment, without a group clause, the code above will return duplicate projects for projects with more than one vacancies. To remove the duplicates, use

Project.joins(:vacancies).group('projects.id')

UPDATE:

As pointed out by @Tolsee, you can also use distinct.

Project.joins(:vacancies).distinct

As an example

[10] pry(main)> Comment.distinct.pluck :article_id
=> [43, 34, 45, 55, 17, 19, 1, 3, 4, 18, 44, 5, 13, 22, 16, 6, 53]
[11] pry(main)> _.size
=> 17
[12] pry(main)> Article.joins(:comments).size
=> 45
[13] pry(main)> Article.joins(:comments).distinct.size
=> 17
[14] pry(main)> Article.joins(:comments).distinct.to_sql
=> "SELECT DISTINCT \"articles\".* FROM \"articles\" INNER JOIN \"comments\" ON \"comments\".\"article_id\" = \"articles\".\"id\""

Solution 2

1) To get Projects with at least 1 vacancy:

Project.joins(:vacancies).group('projects.id')

2) To get Projects with more than 1 vacancy:

Project.joins(:vacancies).group('projects.id').having('count(project_id) > 1')

3) Or, if Vacancy model sets counter cache:

belongs_to :project, counter_cache: true

then this will work, too:

Project.where('vacancies_count > ?', 1)

Inflection rule for vacancy may need to be specified manually?

Solution 3

Yeah, vacancies is not a field in the join. I believe you want:

Project.joins(:vacancies).group("projects.id").having("count(vacancies.id)>0")

Solution 4

# None
Project.joins(:vacancies).group('projects.id').having('count(vacancies) = 0')
# Any
Project.joins(:vacancies).group('projects.id').having('count(vacancies) > 0')
# One
Project.joins(:vacancies).group('projects.id').having('count(vacancies) = 1')
# More than 1
Project.joins(:vacancies).group('projects.id').having('count(vacancies) > 1')

Solution 5

Performing an inner join to the has_many table combined with a group or uniq is potentially very inefficient, and in SQL this would be better implemented as a semi-join that uses EXISTS with a correlated subquery.

This allows the query optimiser to probe the vacancies table to check for the existence of a row with the correct project_id. It doesn't matter whether there is one row or a million that have that project_id.

That's not as straightforward in Rails, but can be achieved with:

Project.where(Vacancies.where("vacancies.project_id = projects.id").exists)

Similarly, find all projects that have no vacancies:

Project.where.not(Vacancies.where("vacancies.project_id = projects.id").exists)

Edit: in recent Rails versions you get a deprecation warning telling you to not to rely on exists being delegated to arel. Fix this with:

Project.where.not(Vacancies.where("vacancies.project_id = projects.id").arel.exists)

Edit: if you're uncomfortable with raw SQL, try:

Project.where.not(Vacancies.where(Vacancy.arel_table[:project_id].eq(Project.arel_table[:id])).arel.exists)

You can make this less messy by adding class methods to hide the use of arel_table, for example:

class Project
  def self.id_column
    arel_table[:id]
  end
end

... so ...

Project.where.not(
  Vacancies.where(
    Vacancy.project_id_column.eq(Project.id_column)
  ).arel.exists
)
Share:
76,457

Related videos on Youtube

jphorta
Author by

jphorta

Updated on August 06, 2020

Comments

  • jphorta
    jphorta over 3 years

    I'm trying to do something that I thought it would be simple but it seems not to be.

    I have a project model that has many vacancies.

    class Project < ActiveRecord::Base
    
      has_many :vacancies, :dependent => :destroy
    
    end
    

    I want to get all the projects that have at least 1 vacancy. I tried something like this:

    Project.joins(:vacancies).where('count(vacancies) > 0')
    

    but it says

    SQLite3::SQLException: no such column: vacancies: SELECT "projects".* FROM "projects" INNER JOIN "vacancies" ON "vacancies"."project_id" = "projects"."id" WHERE ("projects"."deleted_at" IS NULL) AND (count(vacancies) > 0).

  • Kamil Lelonek
    Kamil Lelonek over 9 years
    aggregate functions are not allowed in WHERE
  • mackshkatz
    mackshkatz over 7 years
    However, without applying a group by clause this would return multiple Project objects for Projects that have more than one Vacancy.
  • YasirAzgar
    YasirAzgar about 6 years
    This is much slower than join and group method, as the 'select count(*)..' subquery will execute for each projects.
  • Keith Mattix
    Keith Mattix almost 6 years
    Should this not be Project.joins(:vacancies).group('projects.id').having('count‌​(vacancies.id) > 1')? Querying the number of vacancies instead of the project ids
  • David Aldridge
    David Aldridge almost 5 years
    Doesn't generate an efficient SQL statement, though.
  • jvnill
    jvnill almost 5 years
    Well that is Rails for you. If you can provide an sql answer (and explain why this isn't efficient), that may be a lot more helpful.
  • Metaphysiker
    Metaphysiker over 4 years
    It's also possible to use "distinct", e.g. Project.joins(:vacancies).distinct
  • Yuri Karpovich
    Yuri Karpovich over 4 years
    You are right! It's better to use #distinct instead of #uniq. #uniq will load all objects into memory, but #distinct will do calculations on a database side.
  • Les Nightingill
    Les Nightingill over 4 years
    these two suggestions don't seem to work... the subquery Vacancy.where("vacancies.project_id = projects.id").exists? yields either true or false. Project.where(true) is an ArgumentError.
  • David Aldridge
    David Aldridge over 4 years
    Vacancy.where("vacancies.project_id = projects.id").exists? is not going to execute – it will raise an error because the projects relation won't exist in the query (and there's no question mark in the sample code above either). So decomposing this into two expressions isn't valid and doesn't work. In recent Rails Project.where(Vacancies.where("vacancies.project_id = projects.id").exists) raises a deprecation warning ... I'll update the question.
  • Tolsee
    Tolsee over 4 years
    What do you think about Project.joins(:vacancies).distinct?
  • jvnill
    jvnill over 4 years
    @Tolsee yes that would work too. will update the answer
  • Tolsee
    Tolsee over 4 years
    It's @Tolsee btw :D
  • Arta
    Arta about 4 years
    No, @KeithMattix, it should not be. It can be, however, if it reads better to you; it's a matter of preference. The count can be done with any field in the join table that is guaranteed to have a value in every row. Most meaningful candidates are projects.id, project_id, and vacancies.id. I chose to count project_id because it is the field on which the join is made; the spine of the join if you will. It also reminds me that this is a join table.
  • David Aldridge
    David Aldridge over 3 years
    @YasirAzgar The join and group method is slower than the "exists" method because it will still access all the child rows, even if there are a million of them.