Randomly copy a certain number of files of a certain type from one directory into another

Solution 1

You could use shuf:

shuf -zn8 -e *.jpg | xargs -0 cp -vt target/
  • shuf shuffles the list of *.jpg files in the current directory (the shell expands the glob before shuf runs).
  • -z terminates each output item with a null byte instead of a newline, so that file names containing special characters are handled correctly.
  • -n8 limits the output to at most 8 files.
  • xargs -0 reads the null-delimited input (from shuf -z) and runs cp.
  • -v makes cp print each file as it is copied.
  • -t specifies the target directory.
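
For comparison, the same idea can be sketched in Python. This is only an illustration, not part of the answer above: the function name copy_random_files and its parameters are hypothetical, and it assumes the source directory is small enough to list in memory.

```python
import random
import shutil
from pathlib import Path


def copy_random_files(src, dst, count=8, pattern="*.jpg"):
    """Copy up to `count` randomly chosen files matching `pattern` from src to dst."""
    # Path.glob looks inside src itself, so it does not depend on the current directory.
    candidates = [p for p in Path(src).glob(pattern) if p.is_file()]
    # random.sample raises ValueError if asked for more items than exist, so clamp.
    chosen = random.sample(candidates, min(count, len(candidates)))
    Path(dst).mkdir(parents=True, exist_ok=True)
    for path in chosen:
        # copy2 also preserves timestamps where the platform allows it.
        shutil.copy2(path, Path(dst) / path.name)
    return [p.name for p in chosen]
```

Calling copy_random_files("photos", "target") would copy 8 distinct .jpg files, or fewer if the directory holds fewer than 8; both directory names here are placeholders.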

Solution 2

The top answer did not work for me at all. The -e *.jpg argument does not make shuf search a directory: the shell expands the pattern against the current working directory, and if nothing matches there, shuf receives only the literal string *.jpg and has nothing to shuffle.

I found the following improvement based on what I learned in that post.

find /some/dir/ -type f -name "*.jpg" -print0 | xargs -0 shuf -e -n 8 -z | xargs -0 cp -vt /target/dir/
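
When the directory tree is very large, one alternative is to pick the files in a single pass without building the full list first. The sketch below is a Python illustration of that idea using reservoir sampling; it is not what the pipeline above does, and the names iter_files and reservoir_sample are hypothetical.

```python
import os
import random


def iter_files(root, suffix=".jpg"):
    """Yield paths of non-directory entries under root whose names end with suffix."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)


def reservoir_sample(items, k, rng=random):
    """Return up to k items chosen uniformly at random from an iterable."""
    reservoir = []
    for index, item in enumerate(items):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = item
    return reservoir
```

The result of reservoir_sample(iter_files("/some/dir/"), 8) could then be passed to shutil.copy one path at a time. Memory use stays proportional to the sample size rather than to the number of files.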

Solution 3

You can also do this with Python.

Here is a Python script I use to move a random percentage of images together with their associated label files, as is typically required for CV image datasets. Note that this moves the files rather than copying them, because I do not want my test data to remain in my training dataset.

I use the script below for YOLO training sets, where labels and images are in the same directory and the labels are .txt files.

import os
import random

# Set the source and target directories and the fraction of images to move.
directory = '/MauiData/maui_complete_sf_train'
target_directory = '/MauiData/maui_complete_sf_test'
data_set_percent_size = 0.07

# List all files in the directory that are images.
files = [f for f in os.listdir(directory) if f.endswith('.jpg')]

# Select a fraction of the files at random, without replacement.
random_files = random.sample(files, int(len(files) * data_set_percent_size))

# Move the randomly selected images by renaming them into the target directory.
for random_file_name in random_files:
    os.rename(os.path.join(directory, random_file_name),
              os.path.join(target_directory, random_file_name))

# Move the matching label file for each selected image.
for image_name in random_files:
    # Strip the image extension and add .txt to find the corresponding label file.
    label_name = os.path.splitext(image_name)[0] + '.txt'
    os.rename(os.path.join(directory, label_name),
              os.path.join(target_directory, label_name))
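
If some images have no label file, the second loop in the script above stops with an error after the images have already been moved. A possible variant, sketched here under that assumption, only considers images whose label exists and moves each pair together. The function name move_random_pairs and its parameters are hypothetical, and note that the fraction is taken over the paired images rather than over all images.

```python
import os
import random
import shutil


def move_random_pairs(src, dst, fraction, image_ext=".jpg", label_ext=".txt"):
    """Move a random fraction of image/label pairs from src to dst."""
    images = sorted(f for f in os.listdir(src) if f.endswith(image_ext))
    # Keep only images whose label file exists, so a pair is never split.
    paired = [
        f for f in images
        if os.path.isfile(os.path.join(src, os.path.splitext(f)[0] + label_ext))
    ]
    chosen = random.sample(paired, int(len(paired) * fraction))
    os.makedirs(dst, exist_ok=True)
    for image in chosen:
        label = os.path.splitext(image)[0] + label_ext
        # Move the image and its label one after the other.
        for name in (image, label):
            shutil.move(os.path.join(src, name), os.path.join(dst, name))
    return chosen
```

Unlike os.rename, shutil.move also works when the target directory is on a different filesystem.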

Solution 4

You could retrieve files in this way:

files=(/tmp/*.jpg)
n=${#files[@]}
file_to_retrieve="${files[RANDOM % n]}"
cp -- "$file_to_retrieve" <destination>

Run the last two lines in a loop 8 times. Each draw is independent, so the same file may be picked more than once.
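
The difference between independent draws and drawing without replacement can be illustrated with a short Python sketch. The file names are made up for the example, and the seed is there only to make the run repeatable.

```python
import random

files = [f"img_{i}.jpg" for i in range(10)]  # hypothetical file names

rng = random.Random(0)  # seeded only so the sketch is repeatable

# Eight independent draws, like running the loop 8 times: repeats are possible.
with_repeats = [files[rng.randrange(len(files))] for _ in range(8)]

# Eight draws without replacement: every selected name is distinct.
distinct = rng.sample(files, 8)

print(len(set(with_repeats)), "unique names from independent draws")
print(len(set(distinct)), "unique names from sampling without replacement")
```

If 8 distinct files are required, sampling without replacement (as shuf -n does) is the safer choice.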

Author by

Admin

Updated on September 18, 2022

Comments

  • Admin
    Admin almost 2 years

    Sometimes I have a folder full of JPGs and I need to randomly choose 8 or so of them. How could I automate this so that 8 JPGs are chosen at random from the folder and copied to another destination?

    My question is simple really: instead of using cp and giving it a file name and then a destination file name, I want to build a script that randomly chooses 8 of the .jpg files in the folder and copies them to another folder.

  • roaima
    roaima over 6 years
    The -e *.jpg expects a set of matching files in the current directory. If there are no matches it will (usually) return the single literal *.jpg to shuf, which then has only one element to consider.
  • gented
    gented over 5 years
    So essentially rather than an answer you provide a list of variable names.
  • havakok
    havakok over 4 years
    What if some of the file names start with -? I tried shuf -zn8 -e *.jpg | xargs -0 cp -vt -- {} target/ to no avail.
  • Asad Aizaz
    Asad Aizaz over 4 years
    Thank you for this solution; it works with a large number of files, as opposed to the accepted solution.
  • Jake Ireland
    Jake Ireland over 3 years
    For anyone who has found this answer and has seen @havakok's question, they also asked the question here and obtained an answer: unix.stackexchange.com/a/544902/372726
  • Manu CJ
    Manu CJ over 3 years
    This solution does not work with a large number of files. Halavus's answer solves the problem in that case.
  • Phlogi
    Phlogi over 3 years
    If you are on MacOS, first install coreutils for the shuf command (brew install coreutils), then use: find /some/dir/ -type f -name "*.jpg" -print0 | xargs -0 shuf -e -n 8 -z | xargs -0 -J % cp -v % /your/target/dir
  • Admin
    Admin about 2 years
    For me this is the best answer, because for a huge directory with thousands of images, @chaos's answer fails.