Find all records which have a count of an association greater than zero
Solution 1
joins
uses an inner join by default so using Project.joins(:vacancies)
will in effect only return projects that have an associated vacancy.
UPDATE:
As pointed out by @mackskatz in the comment, without a group
clause, the code above will return duplicate projects for projects with more than one vacancies. To remove the duplicates, use
Project.joins(:vacancies).group('projects.id')
UPDATE:
As pointed out by @Tolsee, you can also use distinct
.
Project.joins(:vacancies).distinct
As an example
[10] pry(main)> Comment.distinct.pluck :article_id
=> [43, 34, 45, 55, 17, 19, 1, 3, 4, 18, 44, 5, 13, 22, 16, 6, 53]
[11] pry(main)> _.size
=> 17
[12] pry(main)> Article.joins(:comments).size
=> 45
[13] pry(main)> Article.joins(:comments).distinct.size
=> 17
[14] pry(main)> Article.joins(:comments).distinct.to_sql
=> "SELECT DISTINCT \"articles\".* FROM \"articles\" INNER JOIN \"comments\" ON \"comments\".\"article_id\" = \"articles\".\"id\""
Solution 2
1) To get Projects with at least 1 vacancy:
Project.joins(:vacancies).group('projects.id')
2) To get Projects with more than 1 vacancy:
Project.joins(:vacancies).group('projects.id').having('count(project_id) > 1')
3) Or, if Vacancy
model sets counter cache:
belongs_to :project, counter_cache: true
then this will work, too:
Project.where('vacancies_count > ?', 1)
Inflection rule for vacancy
may need to be specified manually?
Solution 3
Yeah, vacancies
is not a field in the join. I believe you want:
Project.joins(:vacancies).group("projects.id").having("count(vacancies.id)>0")
Solution 4
# None
Project.joins(:vacancies).group('projects.id').having('count(vacancies) = 0')
# Any
Project.joins(:vacancies).group('projects.id').having('count(vacancies) > 0')
# One
Project.joins(:vacancies).group('projects.id').having('count(vacancies) = 1')
# More than 1
Project.joins(:vacancies).group('projects.id').having('count(vacancies) > 1')
Solution 5
Performing an inner join to the has_many table combined with a group
or uniq
is potentially very inefficient, and in SQL this would be better implemented as a semi-join that uses EXISTS
with a correlated subquery.
This allows the query optimiser to probe the vacancies table to check for the existence of a row with the correct project_id. It doesn't matter whether there is one row or a million that have that project_id.
That's not as straightforward in Rails, but can be achieved with:
Project.where(Vacancies.where("vacancies.project_id = projects.id").exists)
Similarly, find all projects that have no vacancies:
Project.where.not(Vacancies.where("vacancies.project_id = projects.id").exists)
Edit: in recent Rails versions you get a deprecation warning telling you to not to rely on exists
being delegated to arel. Fix this with:
Project.where.not(Vacancies.where("vacancies.project_id = projects.id").arel.exists)
Edit: if you're uncomfortable with raw SQL, try:
Project.where.not(Vacancies.where(Vacancy.arel_table[:project_id].eq(Project.arel_table[:id])).arel.exists)
You can make this less messy by adding class methods to hide the use of arel_table
, for example:
class Project
def self.id_column
arel_table[:id]
end
end
... so ...
Project.where.not(
Vacancies.where(
Vacancy.project_id_column.eq(Project.id_column)
).arel.exists
)
Related videos on Youtube
jphorta
Updated on August 06, 2020Comments
-
jphorta over 3 years
I'm trying to do something that I thought it would be simple but it seems not to be.
I have a project model that has many vacancies.
class Project < ActiveRecord::Base has_many :vacancies, :dependent => :destroy end
I want to get all the projects that have at least 1 vacancy. I tried something like this:
Project.joins(:vacancies).where('count(vacancies) > 0')
but it says
SQLite3::SQLException: no such column: vacancies: SELECT "projects".* FROM "projects" INNER JOIN "vacancies" ON "vacancies"."project_id" = "projects"."id" WHERE ("projects"."deleted_at" IS NULL) AND (count(vacancies) > 0)
. -
Kamil Lelonek over 9 years
aggregate functions are not allowed in WHERE
-
mackshkatz over 7 yearsHowever, without applying a group by clause this would return multiple Project objects for Projects that have more than one Vacancy.
-
YasirAzgar about 6 yearsThis is much slower than join and group method, as the 'select count(*)..' subquery will execute for each projects.
-
Keith Mattix almost 6 yearsShould this not be
Project.joins(:vacancies).group('projects.id').having('count(vacancies.id) > 1')
? Querying the number of vacancies instead of the project ids -
David Aldridge almost 5 yearsDoesn't generate an efficient SQL statement, though.
-
jvnill almost 5 yearsWell that is Rails for you. If you can provide an sql answer (and explain why this isn't efficient), that may be a lot more helpful.
-
Metaphysiker over 4 yearsIt's also possible to use "distinct", e.g. Project.joins(:vacancies).distinct
-
Yuri Karpovich over 4 yearsYou are right! It's better to use #distinct instead of #uniq. #uniq will load all objects into memory, but #distinct will do calculations on a database side.
-
Les Nightingill over 4 yearsthese two suggestions don't seem to work... the subquery
Vacancy.where("vacancies.project_id = projects.id").exists?
yields eithertrue
orfalse
.Project.where(true)
is anArgumentError
. -
David Aldridge over 4 years
Vacancy.where("vacancies.project_id = projects.id").exists?
is not going to execute – it will raise an error because theprojects
relation won't exist in the query (and there's no question mark in the sample code above either). So decomposing this into two expressions isn't valid and doesn't work. In recent RailsProject.where(Vacancies.where("vacancies.project_id = projects.id").exists)
raises a deprecation warning ... I'll update the question. -
Tolsee over 4 yearsWhat do you think about
Project.joins(:vacancies).distinct
? -
jvnill over 4 years@Tolsee yes that would work too. will update the answer
-
Tolsee over 4 yearsIt's @Tolsee btw :D
-
Arta about 4 yearsNo, @KeithMattix, it should not be. It can be, however, if it reads better to you; it's a matter of preference. The count can be done with any field in the join table that is guaranteed to have a value in every row. Most meaningful candidates are
projects.id
,project_id
, andvacancies.id
. I chose to countproject_id
because it is the field on which the join is made; the spine of the join if you will. It also reminds me that this is a join table. -
David Aldridge over 3 years@YasirAzgar The join and group method is slower than the "exists" method because it will still access all the child rows, even if there are a million of them.