pipe commands inside find -exec?
Solution 1
If you must do it from within find, you need to call a shell:
find ./ -type f -name "*.txt" -exec sh -c 'grep -EiH something "$1" | grep -E somethingelse | grep -E other' sh {} \;
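To see how the filename reaches the inline script, the sketch below runs the same kind of pipeline against a scratch directory. The directory, file names, and file contents are made up for illustration. The word after the quoted script (sh here) fills $0, and the path substituted for {} becomes $1.

```shell
#!/bin/sh
# Scratch data; the names and contents are hypothetical.
demo_dir=$(mktemp -d)
printf 'something somethingelse other\nsomething only\n' > "$demo_dir/a.txt"
printf 'unrelated line\n' > "$demo_dir/b.txt"

# sh -c SCRIPT NAME ARG: NAME ("sh") becomes $0, the path from {} becomes $1.
result=$(find "$demo_dir" -type f -name "*.txt" -exec sh -c '
    grep -EiH something "$1" | grep -E somethingelse | grep -E other
' sh {} \;)

printf '%s\n' "$result"
rm -rf "$demo_dir"
```

Only the line of a.txt that survives all three filters is printed, prefixed with its path because of -H on the first grep.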
Other alternatives include using xargs instead:
find ./ -type f -name "*.txt" |
xargs -I{} grep -EiH something {} |
grep -Ei somethingelse |
grep -Ei other
Or, much safer for arbitrary filenames (assuming your find supports -print0):
find ./ -type f -name "*.txt" -print0 |
xargs -0 grep -EiH something |
grep -Ei somethingelse |
grep -Ei other
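The difference between the newline-delimited and NUL-delimited forms shows up as soon as a name contains whitespace. A minimal sketch, assuming a scratch directory and an invented file name:

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
printf 'something\n' > "$demo_dir/with space.txt"

# Default xargs splits its input on blanks and newlines, so the single path
# becomes two bogus arguments that grep cannot open (errors silenced here).
unsafe=$(find "$demo_dir" -type f -name "*.txt" |
    xargs grep -EiH something 2>/dev/null) || true

# NUL-delimited: the path arrives at grep as one argument.
safe=$(find "$demo_dir" -type f -name "*.txt" -print0 |
    xargs -0 grep -EiH something)

printf 'unsafe: [%s]\nsafe:   [%s]\n' "$unsafe" "$safe"
rm -rf "$demo_dir"
```

The first variable stays empty because grep never sees the real path; the second holds the matching line.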
Or, you could just use a shell loop instead (this needs a shell whose read supports -d, such as bash):
find ./ -type f -name "*.txt" -print0 |
while IFS= read -r -d '' file; do
grep -Ei something "$file" |
grep -Ei somethingelse |
grep -Ei other
done
Solution 2
Edit: This answer is not preferred, but is left here for comparison and illustration of potentially dangerous pitfalls in bash scripting.
You can put bash (or another shell) as your -exec command:
find -type -f -name "*.txt" -exec bash -c 'egrep -iH something "{}" | egrep somethingelse | egrep other' \;
One of the downsides of doing it this way is that it creates more potential for nested quoting issues as your commands get more complex. If you want to avoid that, you can break it out into a for-loop:
for i in $(find -type -f -name "*.txt"); do
if egrep -iH something "$i" | egrep somethingelse | egrep other; then
echo "Found something: $i"
fi
done
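The pitfall mentioned in the edit note can be reproduced harmlessly. In this sketch (the scratch directory and file name are invented for the demo), the unquoted $(find ...) is split on whitespace, so one file turns into two loop iterations, while passing the name through -exec keeps it intact.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
: > "$demo_dir/two words.txt"      # exactly one file, with a space in its name

# Word splitting breaks the single path into ".../two" and "words.txt".
count_split=0
for i in $(find "$demo_dir" -type f -name "*.txt"); do
    count_split=$((count_split + 1))
done

# One invocation per file, with the name passed as a single argument ($1).
count_safe=$(find "$demo_dir" -type f -name "*.txt" \
    -exec sh -c 'printf "%s\n" "$1"' sh {} \; | wc -l | tr -d ' ')

printf 'for-loop iterations: %s, files via -exec: %s\n' "$count_split" "$count_safe"
rm -rf "$demo_dir"
```

Quoting "$i" inside the loop body does not help, because the damage is done when the command substitution is split, before the loop body runs.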
monkey intern
Updated on September 18, 2022
Comments
-
monkey intern over 1 year
I have a ternary relationship, called ternary, like this:
id_Offer - id_Profile - id_Skill
1 - 1 - 1
1 - 2 - 1
[and so on; there would be more rows for each id_Offer from Offer, but I want to limit the example]
The table Profile looks something like this (profile_interest is a table that establishes the relationship between profile and interest, that's all):
id_Profile - profile_name
1 - profile-1
2 - profile-2
3 - profile-3
So when I make the following query, the more OR clauses I add, the worse the query performs, starting at ~0.1-0.2 seconds, which is what I get for any other query I make, and going up to 1.5 seconds.
SELECT DISTINCT ternary_table.id_profile,
       COUNT(distinct profile_interest.id_interest) as matching
FROM ternary_table
INNER JOIN profile ON ternary_table.id_profile=profile.id_profile
INNER JOIN profile_interest ON profile.id_profile=profile_interest.id_profile
WHERE profile_interest.id_interest= '1'
   OR profile_interest.id_interest = '2'
   OR profile_interest.id_interest = '3'
   OR profile_interest.id_interest = '14'
   OR profile_interest.id_interest = '15'
   OR profile_interest.id_interest = '16'
GROUP BY(ternary_table.id_profile)
ORDER BY matching DESC;
I have tried making the field profile_interest.id_interest an indexed column with:
CREATE INDEX filter_interest ON profile_interest(id_interest );
With no improvement whatsoever. The database is smaller than a gigabyte; it is a very small database with ~15 tables, so I would like to know if there is any way to shorten the query lag.
Edit: To add more information, the reason I am worried about this is because the only purpose of this data is to connect to an API so any delay in the SQL will delay every call to this data.
Edit1: Added EXPLAIN output and removed first distinct since it's unnecessary
+----+-------------+---------------------+------------+--------+-------------------------------------+-------------+---------+---------------------------------+------+----------+-----------------------------------------------------------+
| id | select_type | table               | partitions | type   | possible_keys                       | key         | key_len | ref                             | rows | filtered | Extra                                                     |
+----+-------------+---------------------+------------+--------+-------------------------------------+-------------+---------+---------------------------------+------+----------+-----------------------------------------------------------+
| 1  | SIMPLE      | profile_interest    | NULL       | range  | PRIMARY,id_interest,filter_interest | id_interest | 202     | NULL                            | 40   | 100.00   | Using where; Using index; Using temporary; Using filesort |
| 1  | SIMPLE      | perfil              | NULL       | eq_ref | PRIMARY                             | PRIMARY     | 202     | BBDD.profile_interest.id_perfil | 1    | 100.00   | Using index                                               |
| 1  | SIMPLE      | oferta_skill_perfil | NULL       | ref    | PRIMARY,id_skill,id_perfil          | id_perfil   | 202     | BBDD.profile_interest.id_perfil | 4609 | 100.00   | Using index                                               |
+----+-------------+---------------------+------------+--------+-------------------------------------+-------------+---------+---------------------------------+------+----------+-----------------------------------------------------------+
Edit 2: Added table creation per request
SET FOREIGN_KEY_CHECKS=1;

CREATE TABLE profile (
    id_profile VARCHAR(200) NOT NULL,
    name_profile VARCHAR(200),
    type_profile VARCHAR(200),
    PRIMARY KEY (id_profile)
);

CREATE TABLE ternary (
    id_oferta VARCHAR(200) NOT NULL,
    id_skill VARCHAR(200) NOT NULL,
    id_profile VARCHAR(200) NOT NULL,
    ranking_skill DOUBLE NOT NULL,
    PRIMARY KEY (id_oferta, id_skill, id_profile),
    FOREIGN KEY (id_oferta) REFERENCES oferta(id_oferta),
    FOREIGN KEY (id_skill) REFERENCES skill(id_skill),
    FOREIGN KEY (id_profile) REFERENCES profile(id_profile)
);

CREATE TABLE interest (
    id_interest VARCHAR(200) NOT NULL,
    name_interes VARCHAR(200),
    PRIMARY KEY (id_interest)
);

CREATE TABLE profile_interest (
    id_profile VARCHAR(200) NOT NULL,
    id_interest VARCHAR(200) NOT NULL,
    PRIMARY KEY (id_profile, id_interest),
    FOREIGN KEY (id_profile) REFERENCES profile(id_profile),
    FOREIGN KEY (id_interest) REFERENCES interes(id_interest)
);
-
Error_2646 almost 6 years
When you look at the explain plan, does the index actually get used? Also, just a syntax thing: use IN ('1','2', ...,'16') rather than sequential ORs.
-
Strawberry almost 6 years
The first DISTINCT is redundant.
-
dnoeth almost 6 years
Your ORed conditions can be simplified to profile_interest.id_interest in ('1' , '2', ...) (this should not change performance). The first distinct can be removed because the group by is already returning unique rows. The count(distinct) can probably be replaced by a simple count.
-
monkey intern almost 6 years
@Error_2646 I have tried looking at the EXPLAIN output, since it is often recommended, but I am not going to lie: I cannot make much sense of it. I will edit the main post with its output in case someone else can see more than I do.
-
Denis Jr almost 6 years
Avoid using "or" in the "where" clause.
-
monkey intern almost 6 years
Any tips as to why, @DenisJr? I am quite new to this, just learned about how to use the IN syntax.
-
dnoeth almost 6 years
Why do you have Primary Keys with exactly the same data type as the other columns? VARCHAR(200) for storing numeric data is bad and the joins probably need more resources.
-
Rick James almost 6 years
@user20929302 - You are unlikely to be any faster with a manually generated temp table than by letting MySQL do its own thing.
-
Rick James almost 6 years
@monkeyintern - No fair! The indexes for profile_interest are different in the EXPLAIN than in the CREATE TABLE!
-
user20929302 almost 6 years
Maybe try inserting everything into a temp table, then query: select * from my_temp_table where conditions
-
monkey intern almost 6 years
Just added EXPLAIN output. I will look into editing the Create Tables so as to not leak anything relevant.
-
monkey intern almost 6 years
I still need the INNER JOIN because of what I am required to return (which are columns of two different tables in this case), and I didn't know about the IN syntax, so that's something I've already learned. Thanks for the answer! I do not know why the IDs work both with ' and without the quotes, but yeah, they are IDs; it is weird to me that both work.
-
monkey intern almost 6 years
Just added the Create tables. And just now realized there is a command that you meant to use THE command. I hope it's enough since it's pretty much the same info :/
-
Strawberry almost 6 years
@monkeyintern It's not the same, because you appear to add indexes after creating the tables. We need to see the tables as they appeared when the query was executed.
-
monkey intern almost 6 years
Thanks for the help!! That is very interesting, since it is returning every result with a 0 (the original query did not do that, I don't even know how to do that!), but the really important thing is that if I add a GROUP BY it takes 0.00 seconds!! How would I go about excluding results where matching equals 0? Basically I do not understand, on a theoretical level, why they were originally omitted and now they show up.
-
dnoeth almost 6 years
When you compare a string to a numeric value, the string is automatically (in some DBMSes) cast to a numeric value. You can use the result of this Select as a Derived Table to join to additional tables.
-
Gordon Linoff almost 6 years
@monkeyintern . . . If it is returning 0, then check in the in list and the correlation condition.
-
dnoeth almost 6 years
@monkeyintern: A Correlated Scalar Subquery always returns a row (even if no rows with those interest_ids exist) and in your case zero due to the COUNT.
-
monkey intern almost 6 years
I am learning more SQL today than ever before. @dnoeth Since that correlated scalar subquery is done for a column returned, and not in a WHERE clause for example, how would I only get results above 0? What does the syntax look like for this CSS?
-
dnoeth almost 6 years
@monkeyintern: Well, you could use the CSS in the WHERE-clause, but to get the value in the Select list and filter for >0 you need to nest the Select in a Derived Table: select * from (select tt.id_profile,... from ternary_table tt) as dt where matching > 0
-
Rick James almost 6 years
@DenisJr - This type of OR is equivalent to IN, so it is not as bad as other types of OR.
-
Rick James almost 6 years
@monkeyintern - If Gordon's query does not give the right results, it should not be the approved answer. You can move the checkmark to another Answer. (Unless he fixes it.)
-
monkey intern almost 6 years
It gives such great performance, and with only a few modifications and @dnoeth's added help I got exactly what I wanted. I thought about editing the answer to add the full query once completed, but I was unsure whether that was the best way to proceed.
-
Gordon Linoff almost 6 years
@monkeyintern . . . The best way to proceed is to edit the answer so it is correct -- for a minor edit. If the question merely inspired an approach, you can also answer the question yourself. If the issue is just filtering, then the modification is pretty trivial.
-
Foreign about 5 years
The first one is exactly what I was looking for. Extremely simple and small enough to type depending on my needs. Thanks.
-
terdon about 5 years
That for loop is a very bad idea. Also known as bash pitfall #1.
-
Kusalananda about 5 years
... and xargs could also be used as xargs -I {} sh -c '...' sh {}, if one wanted to (it makes it possible to run parallel jobs with -P if one wanted to).
-
Kamil Maciorowski about 5 years
This "{}" in your first command may even lead to code injection. Imagine you got files from me and there's a file literally named " & rm -rf ~ & : ".txt. Luckily for you -type -f is invalid, it just saved your home directory. Fix the typo and try again. :) terdon did it right: find … -exec sh -c '… "$1" …' foo {} \;.
-
trobinson about 5 years
Thanks for the information! Yeah, the -type -f is a typo I make constantly when using find, and I didn't notice it in my answer. Whoops. terdon's answer is better, but I'll leave this for comparative purposes.
-
Cbhihe about 5 years
@terdon: tx for referencing the mywiki.wooledge.org page. It's nice to have a bunch of GPs neatly summarized in one place.
-
Cbhihe about 5 years
To improve site readability and ease of use, e.g. for new users, do consider at least adding a note at the top of your post (by editing it) to signal that your for ... done loop-based answer is not preferred and is left for reason "XYZ". If not, unwitting visitors might credit that particular post with upvotes (as seen already). I've seen many SE visitors not read the fine print (i.e. the comments following an answer). They just stop at the number of upvotes. Upvotes are often seen as a seal of approval by some, who tend to accept whatever has apparently been positively sanctioned.
-
Boson Bear about 2 years
I find the first one working for me but I'm confused why there's an extra sh at the very end, as in ... sh {} \;. Would you mind clarifying? @terdon
-
terdon about 2 years
@BosonBear because that will become $0 in the sh script, you can use any arbitrary string. See unix.stackexchange.com/a/389706/22222
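
The injection risk raised in the comments can be reproduced without endangering anything. This is only a sketch: the scratch directory, the file name, and the INJECTED marker file are invented for the demo, and the payload merely creates an empty file. Whether find substitutes {} inside a longer argument depends on the implementation (GNU find does), so the second result may differ on other systems.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
# A file whose *name* contains a command substitution (harmless: it would
# only create an empty marker file named INJECTED in the current directory).
: > "$demo_dir/\$(touch INJECTED).txt"

# Safe: the name arrives as positional parameter $1 and is never parsed as code.
( cd "$demo_dir" &&
  find . -type f -name "*.txt" -exec sh -c 'grep -c something "$1"' sh {} \; ) >/dev/null
if [ -e "$demo_dir/INJECTED" ]; then safe_injected=yes; else safe_injected=no; fi

# Unsafe: the name is pasted into the script text before the shell parses it.
( cd "$demo_dir" &&
  find . -type f -name "*.txt" -exec sh -c 'grep -c something "{}"' \; ) >/dev/null 2>&1
if [ -e "$demo_dir/INJECTED" ]; then unsafe_injected=yes; else unsafe_injected=no; fi

printf 'marker created by "$1" form: %s\n' "$safe_injected"
printf 'marker created by "{}" form: %s\n' "$unsafe_injected"
rm -rf "$demo_dir"
```

The same reasoning applies to the xargs -I {} sh -c '...' sh {} form mentioned above: the name should reach the inline script as an argument, not as part of its text.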