Remove files but exclude all files in a list


Solution 1

The rm command is commented out so that you can first verify that the script lists the right files. Once it does, uncomment that line.

The directory check ensures you don't accidentally run the script from the wrong directory and clobber the wrong files.

You can remove the echo "Deleting: ..." line to run silently.

#!/bin/bash

cd /home/me/myfolder2tocleanup/

# Exit if the directory isn't found.
if (($?>0)); then
    echo "Can't find work dir... exiting"
    exit 1
fi

for i in *; do
    if ! grep -qxFe "$i" filelist.txt; then
        echo "Deleting: $i"
        # The next line is commented out. Test the script, then uncomment it to remove the files.
        # rm "$i"
    fi
done

Solution 2

This Python script can do this:

#!/usr/bin/env python3
import os
no_remove = set()
with open('./dont-delete.txt') as f:
    for line in f:
        no_remove.add(line.strip())

for f in os.listdir('.'):
    if f not in no_remove:
        print('unlink:' + f)
        #os.unlink(f)

The script only prints what it would remove. Once the output looks right, uncomment the os.unlink() call to actually delete the files.

NOTE: add this script and dont-delete.txt to your dont-delete.txt so that they both are on the list, and keep them in the same directory.
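
A slightly more defensive variant of the script above could look like the following. This is only a sketch under stated assumptions: the function name plan_cleanup, its parameters and the dry_run flag are hypothetical and not part of the original answer. It skips directories, always protects the keep-list file when it lives in the cleaned directory, and deletes nothing unless dry_run=False is passed.

```python
from pathlib import Path


def plan_cleanup(directory, keep_list, dry_run=True):
    """Return the sorted names of regular files not on the keep list.

    The files are only deleted when dry_run is False (hypothetical flag).
    """
    directory = Path(directory)
    keep_list = Path(keep_list)

    # One filename per line; blank lines are ignored.
    keep = {
        line.strip()
        for line in keep_list.read_text().splitlines()
        if line.strip()
    }
    # Protect the keep list itself if it lives in the directory being cleaned.
    if keep_list.resolve().parent == directory.resolve():
        keep.add(keep_list.name)

    # Directories are skipped; only regular files are candidates.
    doomed = sorted(
        p.name for p in directory.iterdir()
        if p.is_file() and p.name not in keep
    )
    for name in doomed:
        if dry_run:
            print("would unlink:", name)
        else:
            (directory / name).unlink()
    return doomed
```

Calling plan_cleanup("/home/me/myfolder2tocleanup", "/home/me/myfolder2tocleanup/dont-delete.txt") would only print the candidates; passing dry_run=False as a third argument performs the deletion.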

Solution 3

Here's a one-liner:

comm -2 -3 <(ls) <(sort dont_delete) | xargs -p rm
  1. ls prints all files in the current directory (in sorted order)
  2. sort dont_delete prints all the files we don't want to delete in sorted order
  3. the <() operator (process substitution) runs a command and makes its output available as a file that another command can read
  4. The comm command compares two pre-sorted files and prints three columns: lines only in the first file, lines only in the second, and lines in both
  5. using the -2 -3 flags suppresses the second and third columns, so comm only prints lines contained in the first file but not the second, which will be the list of files that are safe to delete
  6. comm prints no heading, so its output needs no trimming; a tail +2 step here would silently drop the first filename from the list
  7. Now we get a list of files to delete on standard out. We pipe this output to xargs which will turn the output stream into a list of arguments for rm. The -p option forces xargs to ask for confirmation before executing.
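
The comparison that comm -2 -3 performs on two sorted inputs can be sketched in Python as a single merge pass. This is an illustration only, not how comm is implemented: the function name only_in_first is hypothetical, names are assumed to be unique within each list, and Python's plain string ordering is used, which corresponds to sorting with LC_ALL=C rather than a locale-aware collation.

```python
def only_in_first(first_sorted, second_sorted):
    """Yield lines found in first_sorted but not in second_sorted.

    Both inputs must already be sorted, as comm requires.
    """
    second = iter(second_sorted)
    current = next(second, None)
    for line in first_sorted:
        # Advance the second list until it catches up with `line`.
        while current is not None and current < line:
            current = next(second, None)
        if current is None or current != line:
            yield line


# Sample names borrowed from the question.
directory_listing = sorted([
    "also-waste.txt",
    "important.txt",
    "never-used-it.txt",
    "this-can-be-deleted.txt",
])
keep = sorted(["important.txt", "neverdeletethis.txt"])
print(list(only_in_first(directory_listing, keep)))
```

Because each list is walked once, the comparison itself is linear; the cost is dominated by sorting the keep list.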

Solution 4

Unless the output of ls /home/me/myfolder2tocleanup/ exceeds the maximum shell argument limit ARG_MAX, which is around 2MB on Ubuntu, I would suggest the following.

A one-line command that does the job:

  1. Copy the dont-delete.txt file to the directory containing the files to be deleted like so:
cp dont-delete.txt /home/me/myfolder2tocleanup/
  2. cd to the directory containing the files to be deleted like so:
cd /home/me/myfolder2tocleanup/
  3. Do a dry run that prints the names of the files that would be deleted without actually deleting them, like so:
ls -p | grep -v / | sed 's/\<dont-delete.txt\>//g' | sort | comm -3 - <(sort dont-delete.txt) | xargs echo | tr " " "\n"
  4. If you are satisfied with the output, delete the files by running the command like so:
ls -p | grep -v / | sed 's/\<dont-delete.txt\>//g' | sort | comm -3 - <(sort dont-delete.txt) | xargs rm

Explanation:

  • ls -p will list all the files and directories in the current directory and the option -p will add a / to the directory names.
  • grep -v / will exclude directories by removing all items containing a / in their names.
  • sed 's/\<dont-delete.txt\>//g' will blank out the dont-delete.txt entry, so the list file itself does not get deleted in the process.
  • sort will, just to make sure, sort the remaining output of ls.
  • comm -3 - <(sort dont-delete.txt) will sort the dont-delete.txt file, compare it to the sorted output of ls and exclude filenames that exist in both. Names that appear only in dont-delete.txt are still printed (indented with a tab); rm merely reports them as missing.
  • xargs rm will remove all the remaining filenames in the already processed output of ls. This means all the items in the current directory will be removed except for directories, files listed in the dont-delete.txt file and the dont-delete.txt file itself.

In the dry-run part:

  • xargs echo will print the files that should be removed.
  • tr " " "\n" will translate spaces into new lines for easier readability.

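Notice:

Parsing the output of ls is fragile: filenames containing spaces, newlines or other unusual characters will break the pipeline above, so it is better avoided when such names may occur.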
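
One way to sidestep ls parsing entirely is to ask the operating system for directory entries directly. The sketch below is an assumption-laden illustration, not part of the original answer: the helper name regular_files is hypothetical, and its is_file() filter only roughly corresponds to the ls -p | grep -v / step (it follows symbolic links, for instance).

```python
import os


def regular_files(directory="."):
    """Return the names of regular files in `directory`, one list entry per file."""
    with os.scandir(directory) as entries:
        # is_file() is False for directories, so they are never candidates.
        return sorted(entry.name for entry in entries if entry.is_file())


# Splitting on whitespace, as xargs does by default, breaks a name with a space apart.
print("my report.txt".split())
```

Each name returned by regular_files stays intact regardless of the characters it contains, so it can be compared against the keep list and passed to os.unlink without any quoting concerns.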

Solution 5

FWIW it looks like you can do this natively in zsh, using the (+cmd) glob qualifier.

To illustrate, let's start with some files

 % ls
bar  baz  bazfoo  keepfiles.txt  foo  kazoo

and a whitelist file

 % cat keepfiles.txt
foo
kazoo
bar

First, read the whitelist into an array:

 % keepfiles=( "${(f)$(< keepfiles.txt)}" )

or perhaps better

 % zmodload zsh/mapfile
 % keepfiles=( ${(f)mapfile[./keepfiles.txt]} )

(the equivalent of bash's mapfile builtin - or its synonym readarray). Now we can check whether a key (filename) exists in the array using ${keepfiles[(I)filename]} which returns 0 if no match is found:

 % print ${keepfiles[(I)foo]}
1
 % print ${keepfiles[(I)baz]}
0
 %

We can use this to make a function that returns true if there are no matches for $REPLY in the array:

 % nokeep() { (( ${keepfiles[(I)$REPLY]} == 0 )); }

Finally, we use this function as a qualifier in our command:

 % ls *(+nokeep)
baz  bazfoo  keepfiles.txt

or, in your case

 % rm -- *(+nokeep)

(You'll likely want to add the name of the whitelist file itself to the whitelist.)

Author: stefan83

Updated on September 18, 2022

Comments

  • stefan83
    stefan83 over 1 year

    I need to clean up a folder periodically. I get a text file listing the files that are allowed to stay. Now I have to delete all files that are not listed in it.

    Example:

    dont-delete.txt:

    dontdeletethisfile.txt
    reallyimportantfile.txt
    neverdeletethis.txt
    important.txt
    

    The folder to clean up contains, for example:

    ls /home/me/myfolder2tocleanup/:

    dontdeletethisfile.txt
    reallyimportantfile.txt
    neverdeletethis.txt
    important.txt
    this-can-be-deleted.txt
    also-waste.txt
    never-used-it.txt
    

    So these files should be deleted:

    this-can-be-deleted.txt
    also-waste.txt
    never-used-it.txt
    

    I am looking for a delete command with an option to exclude the files listed in a file.

    • mook765
      mook765 over 7 years
      Is this a homework?
    • Gujarat Santana
      Gujarat Santana over 7 years
      I hope you're not his teacher. lol
    • Sergiy Kolodyazhnyy
      Sergiy Kolodyazhnyy over 7 years
      @gujarat We're not a free homework service, so the comment is justified. As for the question itself, it may be useful to others, so it's open so far.
    • Gujarat Santana
      Gujarat Santana over 7 years
      @Serg I totally agree with you
  • David Foerster
    David Foerster over 7 years
    I changed your code to use a set instead of a list for O(1) instead of O(n) look-up in the second part.
  • stefan83
    stefan83 over 7 years
    Thanks for your help. I'm normally a Windows guy, but Python seems to be cool =)
  • stefan83
    stefan83 over 7 years
    thx for your help, now I have my solution !
  • David Foerster
    David Foerster over 7 years
    @stefan83: Python runs just as well on Windows.
  • David Foerster
    David Foerster over 7 years
    I edited your code to avoid useless use of ls and the useless capturing of the output of grep if all you want to know is whether there was a match or not. I also used fixed-string patterns to avoid escaping issues.
  • Apologician
    Apologician over 7 years
    @DavidFoerster Thanks for the contribution. However, when you changed the while loop to a for loop you inadvertently changed the iteration key from i to f in the declaration, which broke the code. I fixed it.
  • David Foerster
    David Foerster over 7 years
    Oops, force of habit. I tend to abbreviate shell variable names for file names as f. ;-P (…and +1 for your answer which I forgot earlier.)
  • Jacques MALAPRADE
    Jacques MALAPRADE almost 6 years
    I tried this with a text file of the file names separated by a newline. It ended up deleting all the files in the directory.
  • nyxz
    nyxz almost 6 years
    I guess your "keep list" was wrong.
  • nyxz
    nyxz almost 6 years
    I've added example usage.
  • Negar
    Negar almost 5 years
    @gardenhead, I tried your code, but it removes all files in the directory and keeps only the first and the last file in the dont-delete list. Do you have any idea about this problem? Thanks in advance.
  • Tex
    Tex over 2 years
    This is better than the accepted answer, as if the keep list is of length M and you have N files to filter, this solution is O(MlgM + N)
  • PesKchan
    PesKchan over 2 years
    I tried this, but instead of files I have folders and sub-folders with files inside. It didn't work. Why is that?