Using bash variable substitution instead of cut/awk

6,437

Solution 1

You can remove the shortest leading substring that matches */:

    tmp="${filename#*/}"

and then remove the longest trailing substring that matches /*:

    echo "${tmp%%/*}"
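Put together, the two expansions can be wrapped in a small helper. This is only a sketch: the function name `first_dir` is hypothetical, and it assumes the path contains at least two `/`-separated components, as in the question's `./foo/bar/baz.xml`.

```shell
# first_dir PATH (hypothetical helper): print the second '/'-separated
# component of PATH using only parameter expansion, so no process is forked.
first_dir() {
    tmp=${1#*/}                  # drop the shortest prefix matching */  ("./")
    printf '%s\n' "${tmp%%/*}"   # drop the longest suffix matching /*
}

filename=./foo/bar/baz.xml
first_dir "$filename"            # prints: foo
```

If the path contains no `/` at all, neither pattern matches and the string is printed unchanged, so callers may want to check for that case.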

Solution 2

    $ echo $f
    a/b/c

    $ (IFS='/';set $f; echo $1)
    a

    $ (IFS='/';set $f; echo $2)
    b

    $ (IFS='/';set $f; echo $3)
    c

This also seems to work when the delimiter is a wildcard character, whether it is double- or single-quoted:

    $ f="a?b?c"
    $ (IFS="?"; set $f; echo $1)
    a
    $ f='a*b*c'
    $ echo $f
    a*b*c
    $ (IFS="*"; set $f; echo $1)
    a

Yes, if you change IFS in the current shell rather than in a subshell as above, you'll have to reset it to its default afterwards:

    unset IFS
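The same positional-parameter idea can be sketched as a function whose body is a subshell, so nothing needs to be reset afterwards. The name `field` and its argument order are hypothetical, and `set -f` is added here (it is not part of the answer above) to keep wildcard characters in the string from being expanded against the current directory.

```shell
# field STRING DELIM N (hypothetical helper)
# Prints the Nth DELIM-separated field of STRING.
# The body is a subshell "( ... )", so the IFS and noglob changes
# never reach the calling shell.
field() (
    str=$1
    n=$3
    IFS=$2               # split on the delimiter only
    set -f               # disable pathname expansion so * ? [ stay literal
    set -- $str          # unquoted on purpose: word splitting does the work
    shift "$((n - 1))"   # move the wanted field into $1
    printf '%s\n' "$1"
)

field ./foo/bar/baz.xml / 2    # prints: foo
field 'a/*/c' / 2              # prints: *
```

Using `shift` rather than `$N` directly means fields past the ninth need no special `${10}` form. Each call still forks a subshell, so this is not as cheap as plain parameter expansion.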

Solution 3

Feed the list to awk to speed it up:

    awk -F '/' '{print $2}' < <(find /usr)
    awk -F '/' '{print $2}' < inputfile

Demonstration:

    time awk -F '/' '{print $2; SUM++} END {print "number of directories found: " SUM}' < <(find /usr -type d)
    usr
    usr
    .
    .
    number of directories found: 16748

    real    0m8.910s
    user    0m0.050s
    sys     0m0.050s
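To use the results back in the shell, the output of the single awk process can be read line by line. A minimal sketch, assuming a made-up list of paths and a hypothetical wrapper named `second_component`; it also assumes no filename contains a newline.

```shell
# second_component (hypothetical wrapper): read paths on stdin and print
# the second '/'-separated field of each, using one awk process in total.
second_component() {
    awk -F '/' '{ print $2 }'
}

# Sample input; in practice the list could come from find or a file.
printf '%s\n' ./foo/bar/baz.xml ./qux/a.xml ./foo/other.xml |
    second_component |
    while IFS= read -r dir; do
        printf 'directory: %s\n' "$dir"
    done
```

Because the `while` loop is part of a pipeline, bash runs it in a subshell by default, so variables assigned inside the loop are not visible after it ends.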

bonh
Author by

bonh

that's just how i feel

Updated on September 18, 2022

Comments

  • bonh
    bonh over 1 year

    Can I use bash variable substitution to extract a piece of a variable based on a delimiter? I'm trying to get the immediate directory name of a filename (in this case, foo).

    $ filename=./foo/bar/baz.xml
    

    I know I could do something like

    echo $filename | cut -d '/' -f 2
    

    or

    echo $filename | awk -F '/' '{print $2}'
    

    but it's getting slow to fork awk/cut for multiple filenames.

    I did a little profiling of the various solutions, using my real files:

    echo | cut:

    real    2m56.805s
    user    0m37.009s
    sys     1m26.067s
    

    echo | awk:

    real    2m56.282s
    user    0m38.157s
    sys     1m31.016s
    

    @steeldriver's variable substitution/shell parameter expansion:

    real    0m0.660s
    user    0m0.421s
    sys     0m0.235s
    

    @jai_s's IFS-wrangling:

    real    1m26.243s
    user    0m13.751s
    sys     0m28.969s
    

    Both suggestions were a huge improvement over my existing ideas, but the variable substitution is fastest because it doesn't require forking any new processes.

    • 123
      123 over 8 years
      Send all the filenames to one invocation of awk and it will be significantly faster than any solution in pure bash
    • 123
      123 over 8 years
      Could you not use an array, process them all at once, and then put the results in a new array?
  • bonh
    bonh over 8 years
    Ooh, I like that.
  • James Sneeringer
    James Sneeringer over 8 years
    This is usually my preferred method as well, but bear in mind that Bash only supports $1 through $9 using this syntax. For 10th and later arguments, the ${10} form must be used.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' over 8 years
    Doesn't work when $f contains wildcards. And you need to restore IFS afterwards (or do this in a command substitution, to get the value of a field, and that strips off trailing newlines).
  • bonh
    bonh over 8 years
    The example works in isolation (inside Git bash on Windows), but when I pipe from the find command I get this error: echo: write error: Bad address.
  • bonh
    bonh over 8 years
    Okay, looks like I have to unset IFS every time.
  • bonh
    bonh over 6 years
    In this case I was looking for the immediate directory, not the full directory path.
  • Sandburg
    Sandburg about 5 years
    unset IFS vs. SAVIFS=$IFS: I prefer the second. Otherwise, might it unset IFS for the calling context?
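The two ways of keeping IFS changes contained that come up in these comments can be sketched side by side. The variable names `saved_ifs` and `ifs_was_set` are hypothetical, and the `set -f` / `set +f` pair assumes globbing was enabled to begin with.

```shell
f=a/b/c

# Option 1: change IFS inside a command substitution (a subshell).
# The caller's IFS is never touched, so there is nothing to restore.
first=$(IFS=/; set -f; set -- $f; printf '%s' "$1")

# Option 2: change IFS in the current shell, then put it back.
# Remember whether IFS was set at all, because an unset IFS and an
# empty IFS behave differently.
if [ "${IFS+set}" = set ]; then
    saved_ifs=$IFS
    ifs_was_set=yes
else
    ifs_was_set=no
fi

IFS=/
set -f              # keep wildcard characters in $f literal
set -- $f           # positional parameters are now: a b c
set +f              # assumes globbing was enabled beforehand
second=$2

if [ "$ifs_was_set" = yes ]; then
    IFS=$saved_ifs
else
    unset IFS
fi

printf '%s %s\n' "$first" "$second"    # prints: a b
```

Plain `unset IFS` gives default splitting behaviour but discards any custom value the caller had set, which is the case the save-and-restore form is meant to cover.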