Find all records which have a count of an association greater than zero

sql ruby-on-rails ruby-on-rails-3 activerecord

76,457

Solution 1

joins uses an inner join by default so using Project.joins(:vacancies) will in effect only return projects that have an associated vacancy.

UPDATE:

As pointed out by @mackskatz in the comment, without a group clause, the code above will return duplicate projects for projects with more than one vacancies. To remove the duplicates, use

Project.joins(:vacancies).group('projects.id')

UPDATE:

As pointed out by @Tolsee, you can also use distinct.

Project.joins(:vacancies).distinct

As an example

[10] pry(main)> Comment.distinct.pluck :article_id
=> [43, 34, 45, 55, 17, 19, 1, 3, 4, 18, 44, 5, 13, 22, 16, 6, 53]
[11] pry(main)> _.size
=> 17
[12] pry(main)> Article.joins(:comments).size
=> 45
[13] pry(main)> Article.joins(:comments).distinct.size
=> 17
[14] pry(main)> Article.joins(:comments).distinct.to_sql
=> "SELECT DISTINCT \"articles\".* FROM \"articles\" INNER JOIN \"comments\" ON \"comments\".\"article_id\" = \"articles\".\"id\""

Solution 2

1) To get Projects with at least 1 vacancy:

Project.joins(:vacancies).group('projects.id')

2) To get Projects with more than 1 vacancy:

Project.joins(:vacancies).group('projects.id').having('count(project_id) > 1')

3) Or, if Vacancy model sets counter cache:

belongs_to :project, counter_cache: true

then this will work, too:

Project.where('vacancies_count > ?', 1)

Inflection rule for vacancy may need to be specified manually?

Solution 3

Yeah, vacancies is not a field in the join. I believe you want:

Project.joins(:vacancies).group("projects.id").having("count(vacancies.id)>0")

Solution 4

# None
Project.joins(:vacancies).group('projects.id').having('count(vacancies) = 0')
# Any
Project.joins(:vacancies).group('projects.id').having('count(vacancies) > 0')
# One
Project.joins(:vacancies).group('projects.id').having('count(vacancies) = 1')
# More than 1
Project.joins(:vacancies).group('projects.id').having('count(vacancies) > 1')

Solution 5

Performing an inner join to the has_many table combined with a group or uniq is potentially very inefficient, and in SQL this would be better implemented as a semi-join that uses EXISTS with a correlated subquery.

This allows the query optimiser to probe the vacancies table to check for the existence of a row with the correct project_id. It doesn't matter whether there is one row or a million that have that project_id.

That's not as straightforward in Rails, but can be achieved with:

Project.where(Vacancies.where("vacancies.project_id = projects.id").exists)

Similarly, find all projects that have no vacancies:

Project.where.not(Vacancies.where("vacancies.project_id = projects.id").exists)

Edit: in recent Rails versions you get a deprecation warning telling you to not to rely on exists being delegated to arel. Fix this with:

Project.where.not(Vacancies.where("vacancies.project_id = projects.id").arel.exists)

Edit: if you're uncomfortable with raw SQL, try:

Project.where.not(Vacancies.where(Vacancy.arel_table[:project_id].eq(Project.arel_table[:id])).arel.exists)

You can make this less messy by adding class methods to hide the use of arel_table, for example:

class Project
  def self.id_column
    arel_table[:id]
  end
end

... so ...

Project.where.not(
  Vacancies.where(
    Vacancy.project_id_column.eq(Project.id_column)
  ).arel.exists
)

View more solutions

76,457

jphorta

Updated on August 06, 2020

Comments

jphorta over 3 years
I'm trying to do something that I thought it would be simple but it seems not to be.

I have a project model that has many vacancies.
```
class Project < ActiveRecord::Base

  has_many :vacancies, :dependent => :destroy

end
```
I want to get all the projects that have at least 1 vacancy. I tried something like this:
```
Project.joins(:vacancies).where('count(vacancies) > 0')
```
but it says

SQLite3::SQLException: no such column: vacancies: SELECT "projects".* FROM "projects" INNER JOIN "vacancies" ON "vacancies"."project_id" = "projects"."id" WHERE ("projects"."deleted_at" IS NULL) AND (count(vacancies) > 0).
Kamil Lelonek over 9 years

aggregate functions are not allowed in WHERE
mackshkatz over 7 years

However, without applying a group by clause this would return multiple Project objects for Projects that have more than one Vacancy.
YasirAzgar about 6 years

This is much slower than join and group method, as the 'select count(*)..' subquery will execute for each projects.
Keith Mattix almost 6 years

Should this not be Project.joins(:vacancies).group('projects.id').having('count‌(vacancies.id) > 1')? Querying the number of vacancies instead of the project ids
David Aldridge almost 5 years

Doesn't generate an efficient SQL statement, though.
jvnill almost 5 years

Well that is Rails for you. If you can provide an sql answer (and explain why this isn't efficient), that may be a lot more helpful.
Metaphysiker over 4 years

It's also possible to use "distinct", e.g. Project.joins(:vacancies).distinct
Yuri Karpovich over 4 years

You are right! It's better to use #distinct instead of #uniq. #uniq will load all objects into memory, but #distinct will do calculations on a database side.
Les Nightingill over 4 years

these two suggestions don't seem to work... the subquery Vacancy.where("vacancies.project_id = projects.id").exists? yields either true or false. Project.where(true) is an ArgumentError.
David Aldridge over 4 years

Vacancy.where("vacancies.project_id = projects.id").exists? is not going to execute – it will raise an error because the projects relation won't exist in the query (and there's no question mark in the sample code above either). So decomposing this into two expressions isn't valid and doesn't work. In recent Rails Project.where(Vacancies.where("vacancies.project_id = projects.id").exists) raises a deprecation warning ... I'll update the question.
Tolsee over 4 years

What do you think about Project.joins(:vacancies).distinct?
jvnill over 4 years

@Tolsee yes that would work too. will update the answer
Tolsee over 4 years

It's @Tolsee btw :D
Arta about 4 years

No, @KeithMattix, it should not be. It can be, however, if it reads better to you; it's a matter of preference. The count can be done with any field in the join table that is guaranteed to have a value in every row. Most meaningful candidates are projects.id, project_id, and vacancies.id. I chose to count project_id because it is the field on which the join is made; the spine of the join if you will. It also reminds me that this is a join table.
David Aldridge over 3 years

@YasirAzgar The join and group method is slower than the "exists" method because it will still access all the child rows, even if there are a million of them.