Best way to choose a random file from a directory in a shell script


Solution 1

files=(/my/dir/*)
printf "%s\n" "${files[RANDOM % ${#files[@]}]}"

And don't parse ls. Read http://mywiki.wooledge.org/ParsingLs

Edit: Good luck finding a non-bash solution that's reliable. Most will break for certain types of filenames, such as names containing spaces, newlines, or leading dashes (it's pretty much impossible in pure sh). To do it right without bash, you'd need to fully migrate to awk/perl/python/..., without piping that output back into the shell for further processing.
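For completeness, here's the array pick wrapped into a self-contained, runnable sketch (the scratch directory and file names are invented for the demo):

```shell
#!/usr/bin/env bash
# Scratch directory with sample files (names are illustrative only).
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b with spaces.txt" "$dir/c.txt"

# Glob into an array; RANDOM modulo the element count gives a valid index.
files=("$dir"/*)
file="${files[RANDOM % ${#files[@]}]}"
printf '%s\n' "$file"
```

Quoting the expansion keeps names with spaces intact; pass "$file" directly to your command rather than re-parsing the printed output.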

Solution 2

Is "shuf" not portable?

shuf -n1 -e /path/to/files/*

or find if files are deeper than one directory:

find /path/to/files/ -type f | shuf -n1

It's part of GNU coreutils, but you need version 6.4 or newer to get it, so older RHEL/CentOS releases don't include it.
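If filenames may contain newlines, GNU shuf's --zero-terminated mode keeps them intact through the pipe. A sketch, assuming GNU find and a coreutils recent enough to have shuf -z; the sample directory is invented:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
touch "$dir/plain" "$dir/with space"

# NUL-terminate the records so no filename byte can split an entry.
file=$(find "$dir" -type f -print0 | shuf -z -n1 | tr -d '\0')
printf '%s\n' "$file"
```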

Solution 3

function randomFile {
  tmpFile=$(mktemp)

  # Write the file list to the temp file (the original assigned the
  # redirected output to a variable, which left that variable empty).
  find . -type f > "$tmpFile"
  total=$(wc -l < "$tmpFile")
  randomNumber=$((RANDOM % total))

  i=0
  while IFS= read -r line; do
    if [ "$i" -eq "$randomNumber" ]; then
      # Do stuff with file
      amarok "$line"
      break
    fi
    i=$((i + 1))
  done < "$tmpFile"
  rm "$tmpFile"
}
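The read loop can also be replaced by letting sed print just the randomly chosen line. A sketch of that variant (still bash, since it relies on RANDOM; the sample directory and the final printf stand in for whatever command you run on the file):

```shell
#!/usr/bin/env bash
randomFile() {
  tmpFile=$(mktemp)
  find . -type f > "$tmpFile"
  total=$(wc -l < "$tmpFile")
  # sed addresses are 1-based, so shift the 0-based random index up by one.
  sed -n "$((RANDOM % total + 1))p" "$tmpFile"
  rm -f "$tmpFile"
}

dir=$(mktemp -d)
touch "$dir/x" "$dir/y" "$dir/z"
cd "$dir" || exit 1
picked=$(randomFile)
printf '%s\n' "$picked"
```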

Solution 4

Something like:

path=(some/directory/*)
x=$((RANDOM % ${#path[@]}))
echo "The randomly-selected file is ${path[$x]}"

$RANDOM in bash is a special variable that returns a random number; taking it modulo the number of elements in the array yields a valid index, which is then used to reference the array.

Solution 5

Here's a shell snippet that relies only on POSIX features and copes with arbitrary file names (but omits dot files from the selection). The random selection uses awk, because that's all you get in POSIX. It's a very poor random number generator, since awk's RNG is seeded with the current time in seconds (so it's easily predictable, and returns the same choice if you call it multiple times per second).

set -- *
n=$(echo $# | awk '{srand(); print int(rand()*$0) + 1}')
eval "file=\$$n"
echo "Processing $file"

If you don't want to ignore dot files, the file name generation code (set -- *) needs to be replaced by something more complicated.

set -- *; [ -e "$1" ] || shift
set .[!.]* "$@"; [ -e "$1" ] || shift
set ..?* "$@"; [ -e "$1" ] || shift
if [ $# -eq 0 ]; then echo 1>&2 "empty directory"; exit 1; fi

If you have OpenSSL available, you can use it to generate random bytes. If you don't but your system has /dev/urandom, replace the call to openssl by dd if=/dev/urandom bs=3 count=1 2>/dev/null. Here's a snippet that sets n to a random value between 1 and $#, taking care not to introduce a bias. This snippet assumes that $# is at most 2^23-1.

while
  n=$(($(openssl rand 3 | od -An -t u4) + 1))
  [ $n -gt $((16777216 / $# * $#)) ]
do :; done
n=$((n % $# + 1))
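Putting the pieces together into one runnable script (a sketch; it uses /dev/urandom via dd, as suggested above, in place of openssl, and creates a sample directory so it's self-contained):

```shell
#!/bin/sh
# Sample directory with known files, so set -- * is never empty here.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"
cd "$dir" || exit 1

set -- *

# Draw 3 random bytes (0..16777215); reject the values at the top of the
# range that would bias the modulo step.
while
  n=$(( $(dd if=/dev/urandom bs=3 count=1 2>/dev/null | od -An -t u4) + 1 ))
  [ "$n" -gt $((16777216 / $# * $#)) ]
do :; done
n=$((n % $# + 1))

eval "file=\$$n"
printf 'Processing %s\n' "$file"
```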
Author: JasonSmith, Apache CouchDB committer and Node.js developer.

Updated on July 09, 2022

Comments

  • JasonSmith
    JasonSmith almost 2 years

    What is the best way to choose a random file from a directory in a shell script?

    Here is my solution in Bash but I would be very interested for a more portable (non-GNU) version for use on Unix proper.

    dir='some/directory'
    file=`/bin/ls -1 "$dir" | sort --random-sort | head -1`
    path=`readlink --canonicalize "$dir/$file"` # Converts to full path
    echo "The randomly-selected file is: $path"
    

    Anybody have any other ideas?

    Edit: lhunath makes a good point about parsing ls. I guess it comes down to whether you want to be portable or not. If you have the GNU findutils and coreutils then you can do:

    find "$dir" -maxdepth 1 -mindepth 1 -type f -print0 \
      | sort --zero-terminated --random-sort \
      | sed 's/\d000.*//'
    

    Whew, that was fun! Also it matches my question better since I said "random file". Honestly though, these days it's hard to imagine a Unix system deployed out there having GNU installed but not Perl 5.

  • JasonSmith
    JasonSmith about 15 years
    That's a good point. Is ls -1 standard on Unix, or is that just GNU? Anyway yes the biggest problem is getting a random number. I would argue that Perl is pretty universal since it's been shipping standard since IIRC Solaris 2.6 and HP-UX 11i
  • Chas. Owens
    Chas. Owens about 15 years
    -1 as an argument to ls is standard in SUS2 (opengroup.org/onlinepubs/007908799/xcu/ls.html). I don't know when it was added, but I believe it was available back in the POSIX days as well.
  • ashawley
    ashawley about 15 years
    RANDOM and arrays are Bash features, and the OP is "interested [in] a more portable (non-GNU) version for use on Unix proper".
  • ashawley
    ashawley about 15 years
    Poster wants a solution with no Bash-isms.
  • JasonSmith
    JasonSmith about 15 years
    Thanks @lhunath, The point about ls is well-taken. I updated the question.
  • JasonSmith
    JasonSmith about 15 years
    That's a great idea. You have to scan the directory twice and there is a race condition if the number of files changes in between scans, but in practice that's probably not a big deal.
  • JasonSmith
    JasonSmith about 15 years
    @MGoDave don't feel too bad. I am always interested in a good Bash solution and a good GNU-free solution, for different situations and as a mental exercise.
  • JasonSmith
    JasonSmith about 15 years
    @Chas thanks for the link. Still, Aaron has a point that filenames with newlines could cause problems. So that could be relevant depending on whether and how you let "civilians" create files directly on the filesystem.
  • ashawley
    ashawley about 15 years
    Yeah, I'm convinced that traditional Bourne shell programming is fundamentally flawed for many situations regardless of one's best efforts. Enter Bash and GNU coreutils to save the day.
  • sente
    sente over 13 years
    your example doesn't actually work, printf "%s\n" "${files[RANDOM % ${#files}]}" should be printf "%s\n" "${files[RANDOM % ${#files[@]}]}" -- ${#files} represents the length (strlen) of the first value in the files array. ${#files[@]} represents the number of elements in the files array, which is what we want.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' almost 13 years
    Awk does get you a random number, and it's the only way offered by POSIX, but it's a very bad RNG (predictable, and the output only changes once per second). Also, don't parse the output of ls.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' almost 13 years
    It's not much more difficult to handle arbitrary file names in portable sh than in bash. The only thing in bash that makes it easier is arrays, and that's only useful when you need to manipulate multiple lists of file names at the same time.
  • BBog
    BBog over 11 years
    It would be nice to post a little explanation along with the code.
  • harperville
    harperville almost 11 years
    And #file is what exactly?
  • Allison
    Allison about 10 years
    Really useful for people who need something to just work. Doesn't matter who, doesn't matter how hacky it is.
  • Matt Fletcher
    Matt Fletcher over 9 years
    You can use gshuf (brew install gshuf) on Mac. Works with Mavericks for sure but not tested on any other versions!
  • Frizlab
    Frizlab over 9 years
    shuf is now in the coreutils formula, and is prefixed with g (type gshuf after installing the coreutils formula)
  • JW.
    JW. over 9 years
    brew install gshuf didn't work for me, but brew install coreutils did.
  • Emile Mercier
    Emile Mercier over 8 years
    The first set of curly braces { line[NR]=$0 } , creates an associative array arbitrarily named 'line' that stores each line of output from ls, indexed by NR, which is a special awk variable that indicates the number of the record. After all lines of output have been stored in the array, awk moves on to the END section. NR at this point equals the total number of lines of output from ls. So, we pick a random number from NR and retrieve the line at that index. To better answer OP's question ls could be replaced by 'find . -maxdepth 1 -type f'
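The awk pattern that comment walks through is not shown on this page; a plausible reconstruction (with find in place of ls, as the comment itself suggests, and a throwaway sample directory) would be:

```shell
#!/bin/sh
dir=$(mktemp -d)
touch "$dir/one" "$dir/two"
cd "$dir" || exit 1

# Store each line in line[NR]; in END, NR is the total line count,
# so int(rand()*NR)+1 is a random 1-based index into the array.
file=$(find . -maxdepth 1 -type f \
  | awk 'BEGIN { srand() } { line[NR] = $0 } END { print line[int(rand()*NR)+1] }')
printf '%s\n' "$file"
```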
  • Peter Cordes
    Peter Cordes over 7 years
    Note that printf isn't part of the solution, unless you want the filename on stdout instead of as an arg to an arbitrary command.
  • Rapti
    Rapti over 6 years
    If I understood your link correctly, parsing ls is only a problem if there are files containing newlines in them. In a lot of cases people can know for certain that the files in the directory they are parsing won't contain any of such files. Parsing ls is especially easy if you want to do stuff right on the command line instead of writing a script.
  • lhunath
    lhunath over 6 years
    @Rapti no, parsing ls is always harder than not parsing ls. $(ls) is much harder than * and it causes bugs. There is no excuse for parsing ls. ls is a tool for humans, not for code. Every time you parse ls you're adding a potential for bugs at the same time as making life harder on yourself. All you need to do is learn what globs are, and you will never need ls again.
  • hoijui
    hoijui over 5 years
    @harperville ${#file[@]} is the number of elements in the bash array file; plain ${#file} is just the length of its first element.
  • Chance Shaffer
    Chance Shaffer almost 3 years
    In an environment restricted to just Unix Shell this worked perfectly. Thank you!
  • Emobe
    Emobe over 2 years
    Single quotes aren't needed around the dir. Apart from that this works a treat!