pipe commands inside find -exec?

shell find pipe

622

Solution 1

If you must do it from within find, you need to call a shell:

find ./ -type f -name "*.txt" -exec sh -c 'grep -EiH something "$1" | grep -E somethingelse | grep -E other' sh {} \;
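As a small illustration of what the trailing `sh {}` does: with `sh -c 'script' name args...`, the word right after the script becomes `$0` and the remaining words become `$1`, `$2`, and so on. The sketch below runs in a scratch directory with made-up file names and contents. It also swaps `\;` for `+` and loops over `"$@"`, so one shell handles many files; that batching is a variation on the command above, not part of it.

```shell
# Scratch directory with two hypothetical sample files (names and contents are made up).
demo=$(mktemp -d)
printf 'something somethingelse other\n' > "$demo/a file.txt"
printf 'something only\n' > "$demo/b.txt"

# The "sh" after the script fills $0; the file names found by find become "$@".
# grep exits non-zero when nothing matches, so "|| true" keeps the inner shell's status clean.
result=$(find "$demo" -type f -name '*.txt' -exec sh -c '
    for f in "$@"; do
        grep -EiH something "$f" | grep -E somethingelse | grep -E other || true
    done' sh {} +)

printf '%s\n' "$result"
rm -rf "$demo"
```

Only the file whose line passes all three greps should be printed, and the space in its name is preserved because the name is never re-parsed by a shell.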

Other alternatives include using xargs instead:

find ./ -type f -name "*.txt" | 
    xargs -I{} grep -EiH something {} | 
        grep -Ei somethingelse | 
            grep -Ei other

Or, much safer for arbitrary filenames (assuming your find supports -print0):

find ./ -type f -name "*.txt" -print0 | 
    xargs -0 grep -EiH something | 
        grep -Ei somethingelse | 
            grep -Ei other
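If the whole pipeline should run once per batch of files rather than only the first grep, xargs can start a shell as well. This is a sketch under the assumption that both find and xargs support NUL-separated names (`-print0` and `-0`); the scratch files are made up for the demonstration.

```shell
# Scratch directory with hypothetical sample files (names and contents are made up).
demo=$(mktemp -d)
printf 'something somethingelse other\n' > "$demo/a file.txt"
printf 'something only\n' > "$demo/b.txt"

# xargs -0 appends the NUL-separated names after "sh", so they arrive in the script as "$@".
result=$(find "$demo" -type f -name '*.txt' -print0 |
    xargs -0 sh -c 'grep -EiH something "$@" | grep -Ei somethingelse | grep -Ei other' sh)

printf '%s\n' "$result"
rm -rf "$demo"
```

As with find's `-exec sh -c`, the extra `sh` is only a placeholder for `$0`; any string would do.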

Or, you could just use a shell loop instead:

find ./ -type f -name "*.txt" -print0 | 
    while IFS= read -r -d '' file; do 
        grep -Ei something "$file" | 
            grep -Ei somethingelse | 
                grep -Ei other
    done

Solution 2

Edit: This answer is not preferred, but is left here for comparison and illustration of potentially dangerous pitfalls in bash scripting.

You can put bash (or another shell) as your -exec command:

find -type -f -name "*.txt" -exec bash -c 'egrep -iH something "{}" | egrep somethingelse | egrep other' \;

One of the downsides of doing it this way is that it creates more potential for nested quoting issues as your commands get more complex. If you want to avoid that, you can break it out into a for-loop:

for i in $(find -type -f -name "*.txt"); do
  if egrep -iH something "$i" | egrep somethingelse | egrep other; then 
    echo "Found something: $i"
  fi
done
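To make the danger of that loop concrete, here is a minimal sketch that assumes a temporary directory whose path contains no whitespace; the file name is made up. It counts how many "names" the unquoted `$(find ...)` produces for a single file and compares that with letting find pass the name to a shell.

```shell
# Scratch directory holding one hypothetical file whose name contains a space.
demo=$(mktemp -d)
printf 'something\n' > "$demo/two words.txt"

# Unquoted $(find ...) is split on whitespace, so one file turns into several broken words.
split_count=0
for i in $(find "$demo" -type f -name '*.txt'); do
    split_count=$((split_count + 1))
done

# Letting find hand the name to a shell keeps it intact; "$#" is the number of names received.
safe_count=$(find "$demo" -type f -name '*.txt' -exec sh -c 'echo "$#"' sh {} +)

echo "for loop saw $split_count words; find -exec saw $safe_count file(s)"
rm -rf "$demo"
```

Under the stated assumption the loop iterates twice for the one file, and neither word is a path that exists, which is why grep in the original loop would fail or, worse, act on the wrong file.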


monkey intern

Updated on September 18, 2022

Comments

monkey intern over 1 year

I have a ternary relationship, called ternary like this:

id_Offer    -   id_Profile  -   id_Skill
1           -   1           -   1
1           -   2           -   1

[and so on; there would be more rows for each id_Offer from Offer, but I want to keep the example short]

The table Profile looks something like this (profile_interest is a table that establishes the relationship between profile and interest, that's all):

id_Profile -   profile_name
1          -   profile-1
2          -   profile-2
3          -   profile-3

So when I run the following query, the more OR clauses I add, the worse it performs: it starts at ~0.1-0.2 seconds, which is what I get for any other query I make, and goes up to 1.5 seconds.

SELECT DISTINCT ternary_table.id_profile, COUNT(distinct profile_interest.id_interest) as matching 
FROM ternary_table INNER JOIN profile ON ternary_table.id_profile=profile.id_profile 
INNER JOIN profile_interest ON profile.id_profile=profile_interest.id_profile 
WHERE profile_interest.id_interest= '1' 
 OR profile_interest.id_interest = '2' 
 OR profile_interest.id_interest = '3'
 OR profile_interest.id_interest = '14'
 OR profile_interest.id_interest = '15'
 OR profile_interest.id_interest = '16'
GROUP BY(ternary_table.id_profile) 
ORDER BY matching DESC;

I have tried making the field profile_interest.id_interest an indexed column with:

CREATE INDEX filter_interest ON profile_interest(id_interest );

This made no difference whatsoever. The database is smaller than a gigabyte and has only ~15 tables, so I would like to know if there is any way to shorten the query lag.

Edit: To add more information, the reason I am worried about this is because the only purpose of this data is to connect to an API so any delay in the SQL will delay every call to this data.

Edit1: Added EXPLAIN output and removed first distinct since it's unnecessary

+----+-------------+---------------------+------------+--------+-------------------------------------+-------------+---------+---------------------------------+------+----------+-----------------------------------------------------------+
| id | select_type | table               | partitions | type   | possible_keys                       | key         | key_len | ref                             | rows | filtered | Extra                                                     |
+----+-------------+---------------------+------------+--------+-------------------------------------+-------------+---------+---------------------------------+------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | profile_interest    | NULL       | range  | PRIMARY,id_interest,filter_interest | id_interest | 202     | NULL                            |   40 |   100.00 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | perfil              | NULL       | eq_ref | PRIMARY                             | PRIMARY     | 202     | BBDD.profile_interest.id_perfil |    1 |   100.00 | Using index                                               |
|  1 | SIMPLE      | oferta_skill_perfil | NULL       | ref    | PRIMARY,id_skill,id_perfil          | id_perfil   | 202     | BBDD.profile_interest.id_perfil | 4609 |   100.00 | Using index                                               |
+----+-------------+---------------------+------------+--------+-------------------------------------+-------------+---------+---------------------------------+------+----------+-----------------------------------------------------------+

Edit 2: Added table creation per request

SET FOREIGN_KEY_CHECKS=1;

CREATE TABLE profile (
    id_profile VARCHAR(200) NOT NULL,
    name_profile VARCHAR(200),
    type_profile VARCHAR(200),
    PRIMARY KEY (id_profile)
);


CREATE TABLE ternary (
    id_oferta VARCHAR(200) NOT NULL,
    id_skill VARCHAR(200) NOT NULL,
    id_profile VARCHAR(200) NOT NULL,
    ranking_skill DOUBLE NOT NULL,
    PRIMARY KEY (id_oferta, id_skill, id_profile),
    FOREIGN KEY (id_oferta) REFERENCES oferta(id_oferta),
    FOREIGN KEY (id_skill) REFERENCES skill(id_skill),
    FOREIGN KEY (id_profile) REFERENCES profile(id_profile)
);

 CREATE TABLE interest (
    id_interest VARCHAR(200) NOT NULL,
    name_interes VARCHAR(200),
    PRIMARY KEY (id_interest)
 );


CREATE TABLE profile_interest (
    id_profile VARCHAR(200) NOT NULL,
    id_interest VARCHAR(200) NOT NULL, 
    PRIMARY KEY (id_profile, id_interest),
    FOREIGN KEY (id_profile) REFERENCES profile(id_profile),
    FOREIGN KEY (id_interest) REFERENCES interest(id_interest)
);

Error_2646 almost 6 years

When you look at the explain plan, does the index actually get used? Also, just a syntax thing - use IN ('1','2', ...,'16') rather than sequential ORs.
Strawberry almost 6 years

The first DISTINCT is redundant.
dnoeth almost 6 years

Your ORed conditions can be simplified to profile_interest.id_interest in ('1' , '2', ...) (this should not change performance). The first distinct can be removed because the group by is already returning unique rows. The count(distinct) can probably be replaced by a simple count.
monkey intern almost 6 years

@Error_2646 I have tried looking at the EXPLAIN output since it is often recommended, but I am not going to lie, I cannot make much sense of it. I will edit the main post with its output in case someone else can see more than I do.
Denis Jr almost 6 years

avoid using "or" in the "where" clause
monkey intern almost 6 years

Any tips as to why @DenisJr ? I am quite new to this, just learned about how to use the IN syntax
dnoeth almost 6 years

Why do you have Primary Keys with exactly the same data type as the other columns? VARCHAR(200) for storing numeric data is bad and the joins probably need more resources.
Rick James almost 6 years

@user20929302 - You are unlikely to be any faster with a manually generated temp table, than to let MySQL do its own thing.
Rick James almost 6 years

@monkeyintern - No fair! The indexes for profile_interest are different in the EXPLAIN than in the CREATE TABLE !

user20929302 almost 6 years

maybe try inserting everything into a temp table, then query select * from my_temp_table where conditions
monkey intern almost 6 years

Just added EXPLAIN output, I will look into editing the Create Tables so as to not leak anything relevant
monkey intern almost 6 years

I still need the INNER JOIN because of what I am required to return (which are columns of two different tables in this case), and I didn't know about the IN syntax, so that's something I've already learned. Thanks for the answer! I do not know why the IDs work both with and without the quotes, but yeah, they are IDs; it's weird to me that both work.
monkey intern almost 6 years

Just added the Create tables. And just now realized there is a command that you meant to use THE command. I hope it's enough since it's pretty much the same info :/
Strawberry almost 6 years

@monkeyintern It's not the same, because you appear to add indexes after creating the tables. We need to see the tables as they appeared when the query was executed.
monkey intern almost 6 years

Thanks for the help!! That is very interesting since it is returning every result with a 0 (the original query did not do that, I don't even know how to do that!) but the really important thing is that if I add a GROUP BY it takes 0.00 seconds!! How would I go about excluding results where matching equals 0? Basically I do not understand, on a theoretical level, why they were originally omitted and now they show up.
dnoeth almost 6 years

When you compare a string to a numeric value, the string is automatically (in some DBMSes) cast to a numeric value. You can use the result of this Select as a Derived Table to join to additional tables.
Gordon Linoff almost 6 years

@monkeyintern . . . If it is returning 0, then check in the in list and the correlation condition.
dnoeth almost 6 years

@monkeyintern: A Correlated Scalar Subquery always returns a row (even if no rows with those interest_ids exist) and in your case zero due to the COUNT.
monkey intern almost 6 years

I am learning more SQL today than ever before. @dnoeth Since that correlated scalar subquery is done for a column returned, and not in a WHERE clause for example, how would I only get results above 0? What does the syntax look like for this CSS?
dnoeth almost 6 years

@monkeyintern: Well, you could use the CSS in the WHERE-clause, but to get value in the Select list and filter for >0 you need to nest the Select in a Derived Table: select * from (select tt.id_profile,... from ternary_table tt) as dt where matching > 0
Rick James almost 6 years

@DenisJr - This type of OR is equivalent to IN, so it is not as bad as other types of OR.
Rick James almost 6 years

@monkeyintern - If Gordon's query does not give the right results, it should not be the approved answer. You can move the checkmark to another Answer. (Unless he fixes it.)
monkey intern almost 6 years

It gives such a great performance, and with only a few modifications and @dnoeth added help I got exactly what I wanted. I thought about editing the answer to add the full query once completed but I was unsure about if it was the best way to proceed
Gordon Linoff almost 6 years

@monkeyintern . . . The best way to proceed is to edit the answer so it is correct -- for a minor edit. If the question merely inspired an approach, you can also answer the question yourself. If the issue is just filtering, then the modification is pretty trivial.
Foreign about 5 years

The first one is exactly what I was looking for. Extremely simple and small enough to type depending on my needs. Thanks.
terdon about 5 years

That for loop is a very bad idea. Also known as bash pitfall #1.
Kusalananda about 5 years

... and xargs could also be used as xargs -I {} sh -c '...' sh {}, if one wanted to (it makes it possible to run parallel jobs with -P if one wanted to).
Kamil Maciorowski about 5 years

This "{}" in your first command may even lead to code injection. Imagine you got files from me and there's a file literally named " & rm -rf ~ & : ".txt. Luckily for you -type -f is invalid, it just saved your home directory. Fix the typo and try again. :) terdon did it right: find … -exec sh -c '… "$1" …' foo {} \;.
trobinson about 5 years

Thanks for the information! Yeah, the -type -f is a typo I make constantly when using find, and I didn't notice it in my answer. Whoops. terdon's answer is better, but I'll leave this for comparative purposes.
Cbhihe about 5 years

@terdon: tx for referencing the mywiki.wooledge.org page. It's nice to have a bunch of GPs neatly summarized in one place.
Cbhihe about 5 years

To improve site readability & ease of use, e.g. by new users, do consider at least adding a note at the top of yr post (by editing it) to signal the fact that yr for ... done loop-based answer is not preferred and left for reason "XYZ".... If not unwitting visitors might credit that particular post with upvotes (as seen already). I've seen many SE visitors not read the fine prints (i.e. the comments following an answer). They just stop at the nbr of upvotes. Upvotes are often seen as a seal of approval by some, who tend to accept whatever has apparently been positively sanctioned.
Boson Bear about 2 years

I find the first one working for me but I'm confused why there's an extra sh in the very end as in ... sh {} \;. Would you mind clarifying? @terdon
terdon about 2 years

@BosonBear because that will become $0 in the sh script, you can use any arbitrary string. See unix.stackexchange.com/a/389706/22222