Best way to choose a random file from a directory in a shell script
Solution 1
files=(/my/dir/*)
printf "%s\n" "${files[RANDOM % ${#files[@]}]}"
And don't parse ls. Read http://mywiki.wooledge.org/ParsingLs
Edit: Good luck finding a non-bash solution that's reliable. Most will break for certain types of filenames, such as filenames with spaces or newlines or dashes (it's pretty much impossible in pure sh). To do it right without bash, you'd need to fully migrate to awk/perl/python/... without piping that output for further processing or such.
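The array approach above can be wrapped so that an empty directory is handled explicitly. This is only a sketch: pick_random_file is a hypothetical helper name, and the result is returned in REPLY (rather than printed) so that it can be passed straight to another command. Note that bash's RANDOM only ranges over 0..32767, so directories with more entries than that are not fully covered.

```shell
#!/usr/bin/env bash
# Sketch: pick one random entry from a directory using a bash array.
# pick_random_file is a hypothetical helper name, not part of the answer above.
pick_random_file() {
    local dir=$1 files
    shopt -s nullglob            # an empty directory yields an empty array
    files=("$dir"/*)
    shopt -u nullglob            # note: this also clears a nullglob set by the caller
    (( ${#files[@]} > 0 )) || return 1
    # RANDOM is 0..32767, so only that many entries can ever be selected.
    REPLY=${files[RANDOM % ${#files[@]}]}
}
```

Usage would look like: pick_random_file /my/dir && some_command "$REPLY"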
Solution 2
Is "shuf" not portable?
shuf -n1 -e /path/to/files/*
Or use find if the files are deeper than one directory:
find /path/to/files/ -type f | shuf -n1
shuf is part of coreutils, but you'll need 6.4 or newer to get it, so RH/CentOS does not include it.
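The find pipeline splits on newlines, so a file name containing a newline would be cut in two. A NUL-delimited variant avoids that. This is a sketch assuming GNU find and a GNU shuf recent enough to support -z; random_file_deep is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: NUL-delimited pipeline so newlines in names cannot split a record.
# random_file_deep is a hypothetical helper; requires GNU find and GNU shuf (-z).
random_file_deep() {
    local dir=$1
    REPLY=
    # read -d '' reads up to the first NUL byte; IFS= and -r keep the name intact.
    IFS= read -r -d '' REPLY < <(find "$dir" -type f -print0 | shuf -z -n1)
    [ -n "$REPLY" ]              # fail if no file was found
}
```

The chosen path is left in REPLY, for example: random_file_deep /path/to/files && echo "$REPLY"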
Solution 3
# ******************************************************************
# ******************************************************************
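function randomFile {
    local tmpFile total randomNumber i line
    tmpFile=$(mktemp)
    # List every regular file below the current directory, one per line
    find . -type f > "$tmpFile"
    total=$(wc -l < "$tmpFile")
    if [ "$total" -eq 0 ]; then rm -f "$tmpFile"; return 1; fi
    randomNumber=$((RANDOM % total))
    i=0
    while IFS= read -r line; do
        if [ "$i" -eq "$randomNumber" ]; then
            # Do stuff with file
            amarok "$line"
            break
        fi
        i=$((i + 1))
    done < "$tmpFile"
    rm -f "$tmpFile"
}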
Solution 4
Something like:
let x="$RANDOM % ${#file[@]}"
echo "The randomly-selected file is ${file[$x]}"
$RANDOM in bash is a special variable that returns a random number. Modulus division turns it into a valid index, which is then used to reference the array. Here file is assumed to be a bash array holding the file names, and ${#file[@]} is its element count.
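The index is only valid if the modulus uses the number of elements in the array. A small sketch of the difference between ${#file} and ${#file[@]}, using made-up file names for illustration:

```shell
#!/usr/bin/env bash
# Sketch: ${#file} versus ${#file[@]} on a bash array (the names are illustrative).
file=("alpha.txt" "b" "c")
first_len=${#file}         # string length of the first element only
count=${#file[@]}          # number of elements in the array
x=$((RANDOM % count))      # always a valid index into the array
echo "first element length: $first_len, element count: $count"
echo "The randomly-selected file is ${file[$x]}"
```

Using the first form as the modulus would produce indexes based on the length of the first name, which can point past the end of the array.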
Solution 5
Here's a shell snippet that relies only on POSIX features and copes with arbitrary file names (but omits dot files from the selection). The random selection uses awk, because that's all you get in POSIX. It's a very poor random number generator, since awk's RNG is seeded with the current time in seconds (so it's easily predictable, and returns the same choice if you call it multiple times per second).
set -- *
n=$(echo $# | awk '{srand(); print int(rand()*$0) + 1}')
eval "file=\${$n}"
echo "Processing $file"
If you don't want to ignore dot files, the file name generation code (set -- *) needs to be replaced by something more complicated.
set -- *; [ -e "$1" ] || shift
set .[!.]* "$@"; [ -e "$1" ] || shift
set ..?* "$@"; [ -e "$1" ] || shift
if [ $# -eq 0 ]; then echo 1>&2 "empty directory"; exit 1; fi
If you have OpenSSL available, you can use it to generate random bytes. If you don't but your system has /dev/urandom, replace the call to openssl by dd if=/dev/urandom bs=3 count=1 2>/dev/null. Here's a snippet that sets n to a random value between 1 and $#, taking care not to introduce a bias. This snippet assumes that $# is at most 2^23-1.
while
n=$(($(openssl rand 3 | od -An -t u4) + 1))
[ $n -gt $((16777216 / $# * $#)) ]
do :; done
n=$((n % $# + 1))
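The same rejection-sampling idea can be sketched with /dev/urandom in place of openssl. Assumptions: /dev/urandom exists, random_index is a hypothetical helper name, and the variables max, limit and r are global because POSIX sh has no local. The three bytes are read individually here, so the value does not depend on the machine's byte order.

```shell
#!/bin/sh
# Sketch: unbiased index in 1..max from 3 random bytes, POSIX sh only.
# random_index is a hypothetical helper; /dev/urandom is assumed to exist.
random_index() {
    max=$1
    limit=$((16777216 / max * max))   # largest multiple of max not above 2^24
    while
        # od prints the three bytes as three unsigned decimal numbers.
        set -- $(dd if=/dev/urandom bs=3 count=1 2>/dev/null | od -An -t u1)
        r=$(( ($1 << 16) | ($2 << 8) | $3 ))
        [ "$r" -ge "$limit" ]         # reject values that would bias the result
    do :; done
    echo $((r % max + 1))
}
```

After set -- * has filled the positional parameters, n=$(random_index "$#") followed by eval "file=\${$n}" selects the file; the braces matter once n has two or more digits.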
Comments
-
JasonSmith almost 2 years
What is the best way to choose a random file from a directory in a shell script?
Here is my solution in Bash but I would be very interested for a more portable (non-GNU) version for use on Unix proper.
dir='some/directory'
file=`/bin/ls -1 "$dir" | sort --random-sort | head -1`
path=`readlink --canonicalize "$dir/$file"` # Converts to full path
echo "The randomly-selected file is: $path"
Anybody have any other ideas?
Edit: lhunath makes a good point about parsing ls. I guess it comes down to whether you want to be portable or not. If you have the GNU findutils and coreutils then you can do:
find "$dir" -maxdepth 1 -mindepth 1 -type f -print0 \
  | sort --zero-terminated --random-sort \
  | sed 's/\d000.*//g'
Whew, that was fun! Also it matches my question better since I said "random file". Honestly though, these days it's hard to imagine a Unix system deployed out there having GNU installed but not Perl 5.
-
JasonSmith about 15 years
That's a good point. Is ls -1 standard on Unix, or is that just GNU? Anyway, yes, the biggest problem is getting a random number. I would argue that Perl is pretty universal since it's been shipping standard since, IIRC, Solaris 2.6 and HP-UX 11i.
-
Chas. Owens about 15 years
-1 as an argument to ls is standard in SUS2 (opengroup.org/onlinepubs/007908799/xcu/ls.html). I don't know when it was added, but I believe it was available back in the POSIX days as well.
-
ashawley about 15 years
RANDOM and arrays are Bash features, and the OP is "interested [in] a more portable (non-GNU) version for use on Unix proper".
-
ashawley about 15 years
Poster wants a solution with no Bash-isms.
-
JasonSmith about 15 years
Thanks @lhunath. The point about ls is well-taken. I updated the question.
-
JasonSmith about 15 years
That's a great idea. You have to scan the directory twice and there is a race condition if the number of files changes in between scans, but in practice that's probably not a big deal.
-
JasonSmith about 15 years
@MGoDave don't feel too bad. I am always interested in a good Bash solution and a good GNU-free solution, for different situations and as a mental exercise.
-
JasonSmith about 15 years
@Chas thanks for the link. Still, Aaron has a point that filenames with newlines could cause problems. So that could be relevant depending on whether and how you let "civilians" create files directly on the filesystem.
-
ashawley about 15 years
Yeah, I'm convinced that traditional Bourne shell programming is fundamentally flawed for many situations regardless of one's best efforts. Enter Bash and GNU coreutils to save the day.
-
sente over 13 years
Your example doesn't actually work.
printf "%s\n" "${files[RANDOM % ${#files}]}"
should be
printf "%s\n" "${files[RANDOM % ${#files[@]}]}"
${#files} represents the length (strlen) of the first value in the files array. ${#files[@]} represents the number of elements in the files array, which is what we want.
-
Gilles 'SO- stop being evil' almost 13 years
Awk does get you a random number, and it's the only way offered by POSIX, but it's a very bad RNG (predictable, and the output only changes once per second). Also, don't parse the output of ls.
-
Gilles 'SO- stop being evil' almost 13 years
It's not much more difficult to handle arbitrary file names in portable sh than in bash. The only thing in bash that makes it easier is arrays, and that's only useful when you need to manipulate multiple lists of file names at the same time.
-
BBog over 11 years
It would be nice to post a little explanation along with the code.
-
harperville almost 11 years
And #file is what exactly?
-
Allison about 10 years
Really useful for people who need something to just work. Doesn't matter who, doesn't matter how hacky it is.
-
Matt Fletcher over 9 years
You can use gshuf (brew install gshuf) on Mac. Works with Mavericks for sure but not tested on any other versions!
-
Frizlab over 9 years
shuf is now in the coreutils formula, and is prefixed with g (type gshuf after installing the coreutils formula).
-
JW. over 9 years
brew install gshuf didn't work for me, but brew install coreutils did.
-
Emile Mercier over 8 years
The first set of curly braces, { line[NR]=$0 }, creates an associative array arbitrarily named 'line' that stores each line of output from ls, indexed by NR, which is a special awk variable that indicates the number of the record. After all lines of output have been stored in the array, awk moves on to the END section. NR at this point equals the total number of lines of output from ls. So, we pick a random number from NR and retrieve the line at that index. To better answer OP's question, ls could be replaced by 'find . -maxdepth 1 -type f'.
-
Peter Cordes over 7 years
Note that printf isn't part of the solution, unless you want the filename on stdout instead of as an arg to an arbitrary command.
-
Rapti over 6 years
If I understood your link correctly, parsing ls is only a problem if there are files containing newlines in them. In a lot of cases people can know for certain that the files in the directory they are parsing won't contain any of such files. Parsing ls is especially easy if you want to do stuff right on the command line instead of writing a script.
-
lhunath over 6 years
@Rapti no, parsing ls is always harder than not parsing ls. $(ls) is much harder than * and it causes bugs. There is no excuse for parsing ls. ls is a tool for humans, not for code. Every time you parse ls you're adding a potential for bugs at the same time as making life harder on yourself. All you need to do is learn what globs are, and you will never need ls again.
-
hoijui over 5 years
@harperville ${#file} is the number of elements in the bash array file
-
Chance Shaffer almost 3 years
In an environment restricted to just Unix Shell this worked perfectly. Thank you!
-
Emobe over 2 years
Single quotes aren't needed around the dir. Apart from that this works a treat!