Undo tar file extraction mess

Solution 1

tar tf archive.tar

will list the contents line by line.

This can be piped to xargs directly, but beware: do the deletion very carefully. You don't want to just rm -r everything that tar tf tells you, since it might include directories that were not empty before unpacking!

You could do

tar tf archive.tar | xargs -d'\n' rm -v
tar tf archive.tar | sort -r | xargs -d'\n' rmdir -v

to first remove all files that were in the archive, and then the directories that are left empty.

sort -r is needed so that the deepest directories are deleted first. Otherwise, if dir1 contains only an empty directory dir2, rmdir reaches dir1 while it still holds dir2, fails, and dir1 is left behind after the rmdir pass. (glennjackman suggested tac instead of sort -r in the comments to the accepted answer; that also works, since tar normally lists a directory before its contents.)
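
To see the effect of the ordering, here is a small sandbox sketch. It assumes GNU tar and GNU xargs and works entirely inside a throwaway directory from mktemp; the names dir1, dir2 and demo.tar are made up for the illustration.

```shell
#!/bin/sh
# Illustration only: everything happens in a throwaway directory.
# dir1, dir2 and demo.tar are hypothetical names.
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1

# Archive containing dir1 with a single empty directory dir2 inside it.
mkdir -p src/dir1/dir2
tar cf demo.tar -C src dir1

# Listing order (parent first): rmdir fails on dir1, so dir1 survives.
mkdir unsorted && tar xf demo.tar -C unsorted
(cd unsorted && tar tf ../demo.tar | xargs -d'\n' rmdir 2>/dev/null)

# Reversed order: dir2 goes first, then dir1 is empty and can go too.
mkdir reversed && tar xf demo.tar -C reversed
(cd reversed && tar tf ../demo.tar | sort -r | xargs -d'\n' rmdir)

# "unsorted" still holds dir1; "reversed" is empty.
ls unsorted reversed
```

Running it in a scratch shell is harmless, and the final ls shows that only the reversed pass cleaned up completely.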

This will generate a lot of

rm: cannot remove `dir/': Is a directory

and

rmdir: failed to remove `dir/': Directory not empty
rmdir: failed to remove `file': Not a directory

Shut this up with 2>/dev/null if it annoys you, but I'd prefer to keep as much information on the process as possible.

And don't do it until you are sure that you match the right files. And perhaps try rm -i to confirm everything. And have backups, eat your breakfast, brush your teeth, etc.
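
The expected "Is a directory" and "Not a directory" messages can also be avoided by testing each entry's type before acting on it, so that whatever rmdir still prints is worth reading. What follows is a minimal sketch, not a finished tool. It assumes a POSIX shell and that the archive's member names are relative paths; the function name undo_untar and its -n (dry-run) argument are invented for this example.

```shell
#!/bin/sh
# Sketch only. undo_untar and its "-n" dry-run argument are hypothetical.
# Assumes member names in the archive are relative paths.
# Run it from the directory the archive was unpacked into:
#   undo_untar ../archive.tar -n    # print what would be done
#   undo_untar ../archive.tar       # actually do it
undo_untar() {
    archive=$1
    run=
    if [ "${2-}" = "-n" ]; then run=echo; fi

    # Pass 1: remove regular files and symlinks named in the archive.
    tar tf "$archive" | while IFS= read -r entry; do
        if [ -L "$entry" ] || [ -f "$entry" ]; then
            $run rm -- "$entry"
        fi
    done

    # Pass 2: try directories, deepest first. rmdir refuses non-empty ones,
    # so a message from it usually points at content the archive did not create.
    tar tf "$archive" | sort -r | while IFS= read -r entry; do
        if [ -d "$entry" ] && [ ! -L "$entry" ]; then
            $run rmdir -- "$entry"
        fi
    done
}
```

The while read loop is used instead of xargs -d so the sketch does not depend on GNU xargs. Entries of other types (FIFOs, device nodes) are deliberately left alone.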

Solution 2

List the contents of the tar file like so:

tar tzf myarchive.tar.gz

Then, delete those file names by iterating over that list:

while IFS= read -r file; do echo "$file"; done < <(tar tzf myarchive.tar.gz)

This will still just list the files that would be deleted. Replace echo with rm if you're really sure these are the ones you want to remove. And maybe make a backup to be sure.

In a second pass, remove the directories that are left over:

while IFS= read -r file; do rmdir "$file"; done < <(tar tzf myarchive.tar.gz)

This prevents directories with files in them from being deleted, which protects directories that already existed before unpacking.


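Another nice trick by @glennjackman reverses the order of the listing, so the deepest entries come first. Use tar tf rather than tar tvf here, because the verbose listing adds permission, owner, and date columns that rm would treat as file names. Again, remove echo when done.

tar tf myarchive.tar | tac | xargs -d'\n' echo rm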

This could then be followed by the normal rmdir cleanup.

Solution 3

Here's a possibility that will take the extracted files and move them to a subdirectory, cleaning up your main folder.

    #!/usr/bin/perl -w

    use strict;
    use Getopt::Long;

    my $clean_folder = "clean";
    my $DRY_RUN;
    die "Usage: $0 [--dry] [--clean=dir-name]\n"
        if ( !GetOptions("dry!" => \$DRY_RUN,
                         "clean=s" => \$clean_folder));

    # Protect the 'clean_folder' string from shell substitution
    $clean_folder =~ s/'/'\\''/g;

    # Process the "tar tv" listing and output a shell script.
    print "#!/bin/sh\n" if ( !$DRY_RUN );
    while (<>)
    {
        chomp;

        # Strip out permissions string and the directory entry from the 'tar' list
        my $perms = substr($_, 0, 10);
        my $dirent = substr($_, 48);

        # Drop entries that are in subdirectories
        next if ( $dirent =~ m:/.: );

        # If we're in "dry run" mode, just list the permissions and the directory
        # entries.
        #
        if ( $DRY_RUN )
        {
            print "$perms|$dirent\n";
            next;
        }

        # Emit the shell code to clean up the folder
        $dirent =~ s/'/'\\''/g;
        print "mv -i '$dirent' '$clean_folder'/.\n";
    }

Save this to the file fix-tar.pl and then execute it like this:

$ tar tvf myarchive.tar | perl fix-tar.pl --dry

This will confirm that your tar list is like mine. You should get output like:

-rw-rw-r--|batch
-rw-rw-r--|book-report.png
-rwx------|CaseReports.png
-rw-rw-r--|caseTree.png
-rw-rw-r--|tree.png
drwxrwxr-x|sample/

If that looks good, then run it again like this:

$ mkdir cleanup
$ tar tvf myarchive.tar | perl fix-tar.pl --clean=cleanup > fixup.sh

The fixup.sh script will contain the shell commands that move the top-level files and directories into a "clean" folder (in this instance, the folder called cleanup). Have a peek through this script to confirm that it's all kosher. If it is, you can now clean up your mess with:

$ sh fixup.sh

I prefer this kind of cleanup because it doesn't destroy anything that isn't already destroyed by being overwritten by that initial tar xv.

Note: if that initial dry run output doesn't look right, you should be able to fiddle with the numbers in the two substr function calls until they look proper. The $perms variable is used only for the dry run so really only the $dirent substring needs to be proper.

One other thing: you may need to use the tar option --numeric-owner if the user names and/or group names in the tar listing make the names start in an unpredictable column.
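
If the column offsets prove too fiddly, the same "move it aside" idea can be sketched without the verbose listing at all, since plain tar tf prints only the member names. This is an alternative sketch, not the Perl script above. The function name move_top_level and its arguments are hypothetical, and it uses mv -n (no overwrite) rather than mv -i because the loop's standard input is the tar listing, not the terminal.

```shell
#!/bin/sh
# Alternative sketch, not the Perl script: it reads the plain "tar tf"
# listing, so no column offsets are involved.
# move_top_level and its arguments are hypothetical names.
# Usage (from the directory the archive was unpacked into):
#   move_top_level myarchive.tar cleanup
move_top_level() {
    archive=$1
    dest=$2
    mkdir -p -- "$dest" || return 1

    # Reduce every member name to its first path component, once each.
    tar tf "$archive" | sed -e 's:^\./::' -e 's:/.*$::' | sort -u |
    while IFS= read -r top; do
        # Skip empty names and anything that is not actually present.
        [ -n "$top" ] || continue
        [ -e "$top" ] || [ -L "$top" ] || continue
        # -n: never overwrite something already sitting in the destination.
        mv -n -- "$top" "$dest"/
    done
}
```

As with the Perl version, a top-level directory is moved as a whole, including anything in it that was there before the unpack, so it is worth checking the cleanup folder afterwards.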

Author: Mike T

Updated on September 18, 2022

Comments

  • Mike T
    Mike T over 1 year

    I just untar'd an archive that produced a mess of files into my tidy directory. For example:

    user@comp:~/tidy$ tar xvf myarchive.tar
    file1
    file2
    dir1/
    dir1/file1
    dir1/subdir1/
    dir1/subdir1/file1
    dir2/
    dir2/file1
    ...
    

    I was expecting that the tar file would have been organized in a single folder (i.e., myarchive/), but it wasn't! Now I have some 190 files and directories that have digitally barfed in what was an organized directory. These untar'd files need to be cleaned up.

    Is there any way to "undo" this and delete the files and directories that were extracted from this archive?


    Thanks for the excellent answers below. In summary, here is what works, in two steps: (1) delete the files, and (2) delete the empty directory structure in reverse packing order (so that inner directories are deleted first):

    tar tf myarchive.tar | xargs -d'\n' rm
    tar tf myarchive.tar | tac | xargs -d'\n' rmdir
    

    Safer yet, preview the commands with a dry run by inserting echo right before rm or rmdir (for example, xargs -d'\n' echo rm).

  • slhck
    slhck about 12 years
    It's not a pipe. It's process substitution and I prefer this over simple piping when used in combination with while to loop over a set of records. Just got used to it. @sté
  • Daniel Andersson
    Daniel Andersson about 12 years
    @slhck and Stéphane: Ah, yes, I'll update. I just did a small test case, but the files had no spaces.
  • slhck
    slhck about 12 years
    Should be noted that BSD xargs has no -d, so you need the GNU variant if you're a poor soul like me.
  • Mike T
    Mike T about 12 years
    Ah yes, I needed to chmod those directories in order to remove them. This archive file was pretty messed up .. also it had an extension .tar.zip .. thanks!
  • Stéphane Gimenez
    Stéphane Gimenez about 12 years
    Sorry for the little delay, I noticed that using rm -rf could delete files that were not from the archive but inside a directory that has the same name as one from the archive. Better be careful here and use rmdir in a second pass.
  • Mike T
    Mike T about 12 years
    Actually the second pass with rmdir needs to be run for each level of nesting of directories. So it will clean out subdir1 on the first pass, but leave dir1 since it tried to delete this first when it wasn't empty at the time. This command could be done once if the file list can be reverse sorted.
  • slhck
    slhck about 12 years
    Tricky problem indeed, @Mike. I don't know if there's an easy solution to do this other than trying to sort the paths by their nesting level, then remove the files, and then start with removing the innermost directories.
  • glenn jackman
    glenn jackman about 12 years
    If you want to delete them in reverse order: tar tvf arch.tar | tac | xargs echo rm (remove the echo when you're confident)
  • slhck
    slhck about 12 years
    @glennjackman That will only reverse the order of the listing, but the problem is reversing the order by depth. So basically, you'd need to delete the files in a backwards breadth-first order.
  • glenn jackman
    glenn jackman about 12 years
    @slhck, based on the example in the question, same thing.
  • slhck
    slhck about 12 years
    @glennjackman Ah, correct, didn't see the updated example. Added your line to the answer so it's easier to see. Seems like a great trick.
  • Mike T
    Mike T about 12 years
    the order from tar tf preserves depth, so tac is working correctly
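
For archives whose listing order cannot be trusted, the idea raised in the comments of sorting paths by nesting level can be sketched as follows. This is only an illustration: it assumes the archive stores explicit directory entries (listed with a trailing slash, as GNU tar does), and dirs_deepest_first is a made-up helper name.

```shell
#!/bin/sh
# Sketch: order the archive's directories by nesting depth, deepest first,
# without relying on the order in which tar lists them.
# dirs_deepest_first is a hypothetical helper name.
dirs_deepest_first() {
    # Directory members end in "/" in the listing. Prefix each one with its
    # number of path components, sort on that number, then drop the prefix.
    tar tf "$1" | grep '/$' | sed 's:/$::' |
        awk -F/ '{ print NF "\t" $0 }' | sort -rn | cut -f2-
}

# Example (run in the directory the archive was unpacked into):
#   dirs_deepest_first myarchive.tar | xargs -d'\n' rmdir
```

Because rmdir only removes empty directories, feeding it this list after the files have been deleted keeps the same safety property as the tac and sort -r variants.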