How to pass files found by find as arguments?

Solution 1

Use arrays.

If you don't need to handle the possibility of newlines in your filenames, then you could get away with

mapfile -t ABC_FILES < <(find -L some/dir -name \*.abc | sort)
mapfile -t XYZ_FILES < <(find -L other/dir -name \*.xyz | sort)

then

./program --abc-files "${ABC_FILES[@]}" --xyz-files "${XYZ_FILES[@]}"

If you do need to handle newlines within filenames, and have bash >= 4.4, you can use find's -print0 together with mapfile's -d '' to NUL-delimit the names during array construction:

mapfile -td '' ABC_FILES < <(find -L some/dir -name \*.abc -print0 | sort -z)

(and similarly for the XYZ_FILES). If you don't have the newer bash, then you could use a null-terminated read loop to append filenames to the arrays e.g.

ABC_FILES=()
while IFS= read -rd '' f; do ABC_FILES+=( "$f" ); done < <(find -L some/dir -name \*.abc -print0 | sort -z)
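
As a follow-up to the comments below: if you also want to drop the --abc-files flag entirely when no .abc files were found, one way is to wrap each flag together with its file list in a second array that stays empty when there are no matches. A minimal sketch, assuming the ABC_FILES/XYZ_FILES arrays were built as above (ABC_ARGS and XYZ_ARGS are just illustrative names):

ABC_ARGS=(); (( ${#ABC_FILES[@]} )) && ABC_ARGS=( --abc-files "${ABC_FILES[@]}" )
XYZ_ARGS=(); (( ${#XYZ_FILES[@]} )) && XYZ_ARGS=( --xyz-files "${XYZ_FILES[@]}" )
./program "${ABC_ARGS[@]}" "${XYZ_ARGS[@]}"

A quoted "${ABC_ARGS[@]}" expands to nothing at all when the array is empty, so ./program never sees a bare --abc-files.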

Solution 2

You can use IFS=newline (assuming no filenames contain newline) but you must set it in the outer shell BEFORE the substitution:

$ ls -1
a file with spaces
able
alpha
baker
boo hoo hoo
bravo
$ # note semicolon here; it's not enough to be in the environment passed
$ # to printf, it must be in the environment OF THE SHELL WHILE PARSING
$ IFS=$'\n'; printf '%s\n' --afiles $(find . -name 'a*') --bfiles $(find . -name 'b*')
--afiles
./able
./a file with spaces
./alpha
--bfiles
./bravo
./boo hoo hoo
./baker

With zsh (but not bash) you can use the null byte $'\0' as a separator too. Even in bash you can handle newlines in filenames if there is some sufficiently strange character that never appears in them, e.g.

 IFS=$'\1'; ... $(find ... -print0 | tr '\0' '\1') ...
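
Spelled out as a complete call, a sketch only (assuming GNU find/sort/tr and that the byte \1 never occurs in your filenames; set -f disables globbing, which this split+glob operator also needs, as noted in the comments):

set -f; IFS=$'\1'
./program --abc-files $(find -L some/dir -name '*.abc' -print0 | sort -z | tr '\0' '\1') \
          --xyz-files $(find -L other/dir -name '*.xyz' -print0 | sort -z | tr '\0' '\1')
set +f; unset IFS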

However, this approach does not handle the additional request you made in the comments on @steeldriver's answer: omitting --afiles when the find for a* turns up nothing.

Solution 3

I'm not sure I understand why you gave up on xargs.

So doing an xargs from one search still leaves me with how to deal with the other one…

The string --xyz-files is just one of many arguments, and there's no reason to treat it specially before your program interprets it. You can pass it through xargs along with both sets of find results, inserted between them:

{ find -L some/dir -name \*.abc -print0 | sort -z; echo -ne "--xyz-files\0"; find -L other/dir -name \*.xyz -print0 | sort -z; } | xargs -0 ./program --abc-files
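
To double-check what ./program would actually receive, you can temporarily swap in a stub that prints its argument vector one per line. A sketch only; the bash -c stub merely stands in for ./program, and printf '%s\0' replaces echo -ne:

{ find -L some/dir -name \*.abc -print0 | sort -z
  printf '%s\0' --xyz-files
  find -L other/dir -name \*.xyz -print0 | sort -z
} | xargs -0 bash -c 'printf "[%s]\n" "$@"' stub --abc-files

One caveat: xargs by default packs as many arguments as fit under its size limit (see -s) and will run the command more than once if they do not all fit, repeating the leading --abc-files each time and splitting the file lists arbitrarily, so this pipeline assumes the combined list stays under that limit.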

Comments

  • Adam Badura
    Adam Badura almost 2 years

    First, to cut off trivial but inapplicable answers: I can use neither the find+xargs trick nor its variants (like find with -exec), because I need to use a few such expressions per call. I will get back to this at the end.


    Now for a better example let's consider:

    $ find -L some/dir -name \*.abc | sort
    some/dir/1.abc
    some/dir/2.abc
    some/dir/a space.abc
    

    How do I pass those as arguments to program?

    Just doing it doesn't do the trick

    $ ./program $(find -L some/dir -name \*.abc | sort)
    

    fails, since program gets the following arguments:

    [0]: ./program
    [1]: some/dir/1.abc
    [2]: some/dir/2.abc
    [3]: some/dir/a
    [4]: space.abc
    

    As can be seen, the path with a space was split, and program considers it to be two different arguments.

    Quote until it works

    It seems novice users such as myself, when faced with such problems, tend to randomly add quotes until it finally works - only here it doesn't seem to help…

    "$(…)"

    $ ./program "$(find -L some/dir -name \*.abc | sort)"
    [0]: ./program
    [1]: some/dir/1.abc
    some/dir/2.abc
    some/dir/a space.abc
    

    Because the quotes prevent word-splitting, all the files are passed as a single argument.

    Quoting individual paths

    A promising approach:

    $ ./program $(find -L some/dir -name \*.abc -printf '"%p"\n' | sort)
    [1]: "some/dir/1.abc"
    [2]: "some/dir/2.abc"
    [3]: "some/dir/a
    [4]: space.abc"
    

    The quotes are there, sure. But they are no longer interpreted; they are just part of the strings. So not only did they fail to prevent word splitting, they also ended up inside the arguments!

    Change IFS

    Then I tried playing around with IFS. I would prefer find with -print0 and sort with -z anyway - so that they themselves have no issues with "weird" paths. So why not force word splitting on the null character and have it all?

    $ ./program $(IFS=$'\0' find -L some/dir -name \*.abc -print0 | sort -z)
    [0]: ./program
    [1]: some/dir/1.abcsome/dir/2.abcsome/dir/a
    [2]: space.abc
    

    So it still splits on space and does not split on the null.

    I tried to place the IFS assignment both in $(…) (as shown above) and before ./program. I also tried other syntaxes like \0, \x0 and \x00, quoted with both ' and ", with and without the $. None of those seemed to make any difference…


    And here I'm out of ideas. I tried a few more things, but they all ran into the same problems as listed above.

    What else could I do? Is it doable at all?

    Sure, I could make the program accept the patterns and do the searches itself. But that is a lot of duplicated work, and it ties the program to one specific pattern syntax. (What about providing files via a grep, for example?)

    Also I could make the program accept a file with a list of paths. Then I could easily dump the find output to some temp file and provide only the path to that file. This could be supported alongside direct paths, so that if the user has just a simple path it can be provided without an intermediate file. But this doesn't seem nice - one needs to create extra files and take care of them, not to mention the extra implementation required. (On the plus side, however, it could be a rescue for cases in which the number of file arguments starts to cause issues with command-line length…)


    Finally, let me remind you again that find+xargs (and similar) tricks will not work in my case. For simplicity of description I'm showing only one argument, but my real case looks more like this:

    $ ABC_FILES=$(find -L some/dir -name \*.abc | sort)
    $ XYZ_FILES=$(find -L other/dir -name \*.xyz | sort)
    $ ./program --abc-files $ABC_FILES --xyz-files $XYZ_FILES
    

    So doing an xargs from one search still leaves me with how to deal with the other one…

  • Adam Badura
    Adam Badura over 7 years
    Excellent! I was thinking about arrays, but somehow I didn't find anything about mapfile (or its synonym readarray). It does work!
  • Adam Badura
    Adam Badura over 7 years
    Yet you could improve it a bit. The Bash < 4.4 version (which I happen to have...) with a while loop doesn't clear the array, which means that if no files are found the array stays undefined, while if it is already defined the new files are appended (instead of replacing the old ones). It seems that adding declare -a ABC_FILES='()'; before the while does the trick. (While just adding ABC_FILES='()'; does not.)
  • Adam Badura
    Adam Badura over 7 years
    Also, what does the < < mean here? Is it the same as <<? I don't think so, as changing it to << yields a syntax error ("unexpected token `('"). So what is it and how does it work?
  • Adam Badura
    Adam Badura over 7 years
    Another improvement (for my particular usage) is to construct yet another array. So we have those ABC_FILES. That is fine. But it is useful to also make ABC_ARGS, which is an empty array if ABC_FILES is empty, or else it is the array ('--abc-files' "${ABC_FILES[@]}"). This way later on I can use it like this: ./program "${ABC_ARGS[@]}" "${XYZ_ARGS[@]}" and be sure that it will work correctly regardless of which (if any) of the groups is empty. Or to state it differently: this way --abc-files (and --xyz-files) will be provided only if it is followed by some actual path.
  • dave_thompson_085
    dave_thompson_085 over 7 years
    @AdamBadura: while read ... done < <(find blah) is normal shell redirection < from a special file created by PROCESS SUBSTITUTION. This differs from piping find blah | while read ... done, because the pipeline runs the while loop in a subshell, so the var(s) set in it aren't retained for subsequent commands. (A short demonstration of the difference follows after these comments.)
  • Adam Badura
    Adam Badura over 7 years
    There is another interesting edge case. With the approach shown above, am I able to check whether the <(find ...) expression failed or not? In some scenarios it will be my own program, and if it exits with a non-zero status I would rather interrupt the whole procedure with an error message. Yet it seems to me that with this approach I will just end up with an empty array (as my program doesn't generate any output in such cases).
  • Adam Badura
    Adam Badura over 7 years
    So as I understand in Bash there is no way to force IFS to split on null?
  • dave_thompson_085
    dave_thompson_085 over 7 years
    @AdamBadura: I'm pretty sure not; bash doesn't allow null byte in any variable, including IFS. Note the read -d '' used in steeldriver's methods is an empty string not one containing a null byte. (And a command option isn't a var as such anyway.)
  • Stéphane Chazelas
    Stéphane Chazelas over 7 years
    You must also disable globbing (set -o noglob) before using that split+glob operator (except in zsh).
  • done
    done over 7 years
    @AdamBadura Yes, in bash a null is exactly the same as $'\0' and also as ''.
  • Adam Badura
    Adam Badura over 7 years
    You are right! This works as well! However, notice that you missed -print0 in the second find. Also, if going this way, I would emit the --abc-files with an echo as well - just for consistency.
  • Adam Badura
    Adam Badura over 7 years
    This approach seems simpler and more of a one-liner than the array approach. However, it would require some extra logic to cover the case that if there are no .abc files then there should also be no --abc-files (same with .xyz). The array-based solution by steeldriver also requires extra logic for that, but there it is trivial, while here it might be not so trivial, destroying the main advantage of this solution - its simplicity.
  • Adam Badura
    Adam Badura over 7 years
    Also, I'm not really sure, but I assume that xargs will never try to split the arguments and run several commands instead of one, unless it is explicitly instructed to do so with the -L, --max-lines (-l), --max-args (-n) or --max-chars (-s) arguments. Am I right? Or are there some defaults? My program would not handle such a split correctly, and I would rather the call fail outright...
  • Kamil Maciorowski
    Kamil Maciorowski over 7 years
    @AdamBadura Missing -print0 -- fixed, thanks. I don't know all the answers, but I agree my solution makes it hard to include extra logic. I would probably go with arrays myself, now that I know that approach. My answer was not really for you; you had already accepted the other answer and I assumed your problem was solved. I just wanted to point out that you can pass arguments from multiple sources through xargs, which wasn't obvious at first glance. You may treat it as a proof of concept. Now we all know a few different approaches and can consciously choose what fits in each particular case.
  • Adam Badura
    Adam Badura over 7 years
    Yes, I already implemented the array-based solution and it works like a charm. I'm especially proud of how cleanly it deals with the optionality (if no files then no --abc-files). But you are right - it is good to know your alternatives! Especially since I mistakenly thought this was not possible.
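
Illustrating dave_thompson_085's point above, a minimal sketch (names and paths are placeholders): a variable set inside a piped while loop lives in a subshell and is lost, while the < <(…) process-substitution form keeps the loop in the current shell.

COUNT=0
find . -name '*.abc' -print0 | while IFS= read -rd '' f; do COUNT=$((COUNT+1)); done
echo "$COUNT"    # still 0 - the loop ran in a subshell

COUNT=0
while IFS= read -rd '' f; do COUNT=$((COUNT+1)); done < <(find . -name '*.abc' -print0)
echo "$COUNT"    # number of files found - the loop ran in the current shell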