Glob with Numerical Order

Solution 1

Depending on your environment you can use ls -v with GNU coreutils, e.g.:

gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
   -sOutputFile=out.pdf $(ls -v)

Or if you are on recent versions of FreeBSD or OpenBSD:

gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
   -sOutputFile=out.pdf $(ls | sort -V)
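To see what the version sort does without involving gs, here is a minimal sketch. It assumes a sort that supports -V (a non-POSIX extension); the sample names are made up for illustration and no files need to exist.

```shell
# Feed a handful of sample names to sort -V and join the result
# back into one space-separated line.
sorted=$(printf '%s\n' c10.pdf c2.pdf c0.pdf c20.pdf c9.pdf c1.pdf \
    | sort -V | paste -sd' ' -)
printf '%s\n' "$sorted"
```

Note that this line-based form shares the limitation of $(ls -v): names containing newlines or spaces would not survive the word splitting.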

Solution 2

Once more, zsh's glob qualifiers come to the rescue.

echo *.pdf(n)

Solution 3

If all the files in question have the same prefix (i.e., the text before the number; c in this case), you can use

gs  …args…  c?.pdf c??.pdf

c?.pdf expands to c0.pdf c1.pdf … c9.pdf, and c??.pdf expands to c10.pdf c11.pdf … c20.pdf (and up to c99.pdf, as applicable).  While each command-line word containing pathname expansion character(s) is expanded to a list of filenames sorted (collated) in accordance with the LC_COLLATE variable, the lists resulting from the expansion of adjacent wildcards (globs) are not merged; they are simply concatenated.  (I seem to recall that the shell man page once stated this explicitly, but I can’t find it now.)
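The concatenation behaviour can be checked in a scratch directory. The following is only a sketch; the temporary directory and the particular file numbers are assumptions chosen for the demonstration.

```shell
# Create a few empty sample files in a throwaway directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
for i in 0 1 2 9 10 11 20; do : > "c$i.pdf"; done

# Each glob is expanded and sorted on its own; the two lists are
# then simply placed one after the other.
set -- c?.pdf c??.pdf
result="$*"
printf '%s\n' "$result"

# Clean up.
cd - >/dev/null && rm -rf "$tmpdir"
```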

Of course if the files can go up to c999.pdf, you should use c?.pdf c??.pdf c???.pdf.  Admittedly, this can get tedious if you have a lot of digits.  You can abbreviate it a little; for example, for (up to) five digits, you can use c?{,?{,?{,?{,?}}}}.pdf.  If your list of filenames is sparse (e.g., there’s a c0.pdf and a c12345.pdf, but not necessarily every number in between), you should probably set the nullglob option.  Otherwise, if (for example) you have no files with two-digit numbers, you would get a literal c??.pdf argument passed to your program.
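As a rough illustration of the sparse case, the sketch below assumes bash (for shopt, brace expansion, and arrays) and a made-up set of files with no two-digit numbers. With nullglob set, the unmatched c??.pdf pattern disappears instead of being passed through literally.

```shell
# Sample files: one-digit and three-digit numbers only.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
: > c0.pdf; : > c5.pdf; : > c123.pdf

# c?{,?{,?}}.pdf brace-expands to: c?.pdf c??.pdf c???.pdf
shopt -s nullglob
files=( c?{,?{,?}}.pdf )
shopt -u nullglob

result="${files[*]}"
printf '%s\n' "$result"

# Clean up.
cd - >/dev/null && rm -rf "$tmpdir"
```

Collecting the names in an array first also makes it easy to check that the list is not empty before handing "${files[@]}" to the real command.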

If you have multiple prefixes (e.g., a<number>.pdf, b<number>.pdf, and c<number>.pdf, with numbers of one or two digits), you can use the obvious, brute force approach:

a?.pdf a??.pdf b?.pdf b??.pdf c?.pdf c??.pdf

or collapse it to {a,b,c}?{,?}.pdf.

Solution 4

If there are no gaps, the following may help. It is only a sketch to convey the idea, and is not robust against edge cases:

FILES="c0.pdf"
for i in $(seq 1 20); do FILES="${FILES} c${i}.pdf"; done
gs [...args...] $FILES

If there may be gaps, a [ -f "c${i}.pdf" ] check could be added.
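One way to combine the existence check with a loop is sketched below. It assumes bash and uses an array rather than a space-separated string, so that each name stays a single word; the sample files and the upper bound of 20 are assumptions for the demonstration.

```shell
# Sample files with gaps in the numbering.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
for i in 0 1 2 5 10; do : > "c$i.pdf"; done

# Walk the numbers in order and keep only the files that exist.
files=()
for ((i = 0; i <= 20; i++)); do
    if [ -f "c$i.pdf" ]; then
        files+=( "c$i.pdf" )
    fi
done

result="${files[*]}"
printf '%s\n' "${files[@]}"

# Clean up.
cd - >/dev/null && rm -rf "$tmpdir"
```

The array would then be passed to the real command as "${files[@]}".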

Edit: see also this answer, according to which you could (using Bash) use

gs [..args..] c{0..20}.pdf

Solution 5

Just quoting and fixing Thor's answer... NEVER parse ls!

You can use sort -V (a non-POSIX extension to sort):

printf '%s\0' ./* | sort -zV \
    | xargs -0 gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH \
        -sDEVICE=pdfwrite -sOutputFile=out.pdf

(For some commands, and gs is apparently one of them, you need ./* instead of *. If one doesn't work, try the other.)
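The pipeline can be tried without gs by substituting a harmless command. In this sketch, printf stands in for gs, and the sample files (including one with a space in its name) are assumptions; it requires a sort with -z and -V and an xargs with -0, none of which are specified by POSIX.

```shell
# Sample files, one of them with a space in its name.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
: > c1.pdf; : > c10.pdf; : > c2.pdf; : > "c3 draft.pdf"

# NUL-delimited names survive spaces and newlines; printf here
# only shows which arguments the real command would receive.
result=$(printf '%s\0' ./* | sort -zV | xargs -0 printf '[%s]')
printf '%s\n' "$result"

# Clean up.
cd - >/dev/null && rm -rf "$tmpdir"
```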

Author by

moooeeeep

Updated on September 18, 2022

Comments

  • moooeeeep
    moooeeeep almost 2 years

    I have this list of pdf files in a directory:

    c0.pdf   c12.pdf  c15.pdf  c18.pdf  c20.pdf  c4.pdf  c7.pdf
    c10.pdf  c13.pdf  c16.pdf  c19.pdf  c2.pdf   c5.pdf  c8.pdf
    c11.pdf  c14.pdf  c17.pdf  c1.pdf   c3.pdf   c6.pdf  c9.pdf
    

    I want to concatenate these using ghostscript in numerical order (similar to this):

    gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf *.pdf
    

    But the shell expansion order does not reproduce the natural order of the numbers but the alphabetical order:

    $ for f in *.pdf; do echo $f; done
    c0.pdf
    c10.pdf
    c11.pdf
    c12.pdf
    c13.pdf
    c14.pdf
    c15.pdf
    c16.pdf
    c17.pdf
    c18.pdf
    c19.pdf
    c1.pdf
    c20.pdf
    c2.pdf
    c3.pdf
    c4.pdf
    c5.pdf
    c6.pdf
    c7.pdf
    c8.pdf
    c9.pdf
    

    How can I achieve the desired order in the expansion (if possible without manually adding 0-padding to the numbers in the file names)?

    I've found suggestions to use ls | sort -V, but I couldn't get it to work for my specific use case.

  • Sundeep
    Sundeep over 7 years
    ls -v does a natural sort of (version) numbers within text, so that can be used as well...
  • Thor
    Thor over 7 years
    @Sundeep: Indeed, but this seems to be a GNU coreutils only solution.
  • Sundeep
    Sundeep over 7 years
    yeah, seems like GNU specific - pubs.opengroup.org/onlinepubs/9699919799
  • Thor
    Thor over 7 years
    @Sundeep: The -V feature of sort is not specified by POSIX either. However, it seems to have spread farther, for example both FreeBSD and OpenBSD sort support it.
  • G-Man Says 'Reinstate Monica'
    G-Man Says 'Reinstate Monica' over 7 years
    It is generally a good idea to quote your shell variable references (e.g., "$FILES" and "$i") unless you have a good reason not to, and you’re sure you know what you’re doing.  (By contrast, while braces can be important, they’re not as important as quotes,  so, for example, "c$i.pdf" is good enough.)  A command like gs  [ …args… ]  $FILES, where $FILES contains a space-separated list of files, may seem like a good reason to use $FILES without quoting it (because "$FILES" won’t work in that context).  … (Cont’d)
  • G-Man Says 'Reinstate Monica'
    G-Man Says 'Reinstate Monica' over 7 years
    (Cont’d) …  But see Security implications of forgetting to quote a variable in bash/POSIX shells, in particular, my answer to it, for notes on how to handle multi-word variables as arrays in bash (e.g., FILES=("c0.pdf") and FILES+=("c$i.pdf")); also this answer, which uses the technique I suggest.
  • vipin
    vipin almost 7 years
    NEVER parse ls! Use stat -c "%n" * instead.
  • Stéphane Chazelas
    Stéphane Chazelas almost 7 years
    The don't parse ls output is because ls displays the file names newline-separated while newline is as valid as any in a file name, but here you're doing the same thing with stat but adding several other issues (like problems with filenames starting with -, problem if there are too many files, stat being a non-portable command). And because you used the split+glob operator without adjusting IFS or disabling globs, you'll still have issues with filenames with space or tab or wildcard characters.
  • Stéphane Chazelas
    Stéphane Chazelas almost 7 years
    To use GNU sort -V reliably, you'd need ${(z)"$(printf '%s\0' * | sort -zV)"} in zsh (though zsh has (n) for numerical sort already) or readarray -td '' files < <(printf '%s\0' * | sort -zV) in bash4.4+.
  • vipin
    vipin almost 7 years
    @StéphaneChazelas thanks, and you are right that newline can be a concern, but that isn't the only reason not to parse ls. And yeah I was lazy and didn't add -- either. But I should have used printf...I'll change that.
  • Stéphane Chazelas
    Stéphane Chazelas almost 7 years
    for ls alone (that is without -l), what are those other concerns? Note that -- wouldn't help for a file called -.
  • vipin
    vipin almost 7 years
    @StéphaneChazelas there are other differences between versions... like some print "total 0" on there, and the newest ls versions even stick quotes around things where you don't want them... touch \"test\"; ls -1 for example shows '"test"' on my ls. It's simply not meant to be parsed... it's a user interface, not a scripting command.
  • Stéphane Chazelas
    Stéphane Chazelas almost 7 years
    the total x is only for ls -l/n.... The quoting is only for output to a terminal (not a pipe like here). For a POSIX compliant ls, the only problem would be the newlines. But -v is not a POSIX option anyway. Now, I've just realised that busybox ls now also supports ls -v and busybox ls is one of those implementations that are not POSIX compliant as it does some mangling even when stdout is not a terminal.
  • Stéphane Chazelas
    Stéphane Chazelas almost 7 years
    * -> ./* to avoid problems with some file names with gs.
  • Jeff Schaller
    Jeff Schaller almost 7 years
    Note also the OP's comment "I've found suggestions to use ls | sort -V, but I couldn't get it to work for my specific use case."
  • Thor
    Thor almost 7 years
    @Peter: In general I agree, but there are exceptions
  • vipin
    vipin almost 7 years
    and also I would change my comment above since stat with %n is not really best either due to whitespace being allowed in filenames...use printf '%s\0', and things like xargs -0 or while read... I wrote an answer that has that.
  • Kyle
    Kyle almost 5 years
    This is the best answer because it's beyond any claims of sketchy use of ls, stat, or anything else; and also works in bash as requested.