Undo tar file extraction mess
Solution 1
tar tf archive.tar will list the contents line by line.
This can be piped to xargs directly, but beware: do the deletion very carefully. You don't want to just rm -r everything that tar tf tells you, since it might include directories that were not empty before unpacking!
You could do
tar tf archive.tar | xargs -d'\n' rm -v
tar tf archive.tar | sort -r | xargs -d'\n' rmdir -v
to first remove all files that were in the archive, and then the directories that are left empty.
sort -r is needed to delete the deepest directories first (glennjackman suggested tac instead of sort -r in the comments to the accepted answer, which also works since tar's output is regular enough). Otherwise, if dir1 contains a single empty directory dir2, dir1 will be left behind after the rmdir pass, because rmdir reached it while dir2 was still inside it.
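A small check of that ordering argument, on a hypothetical listing where dir1/ holds only dir2/. It assumes GNU xargs (for -d) and coreutils tac; LC_ALL=C is added here so that sort -r works byte-wise, where a path always sorts after its own parent. Forward order leaves dir1 behind, while the reversed order removes both.

```shell
#!/usr/bin/env bash
# Hypothetical tar listing: a parent directory followed by its only child.
listing() { printf '%s\n' 'dir1/' 'dir1/dir2/'; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

# Forward order: rmdir dir1/ fails (dir2/ is still inside), then dir2/ goes.
mkdir -p dir1/dir2
listing | xargs -d'\n' rmdir 2>/dev/null || true
forward_left=$(ls)

# Reverse byte-wise sort: dir1/dir2/ comes first, so dir1/ is empty in time.
mkdir -p dir1/dir2
listing | LC_ALL=C sort -r | xargs -d'\n' rmdir
reverse_left=$(ls)

# tac gives the same order here only because tar lists parents before children.
by_sort=$(listing | LC_ALL=C sort -r)
by_tac=$(listing | tac)
echo "forward left: ${forward_left:-nothing}; reverse left: ${reverse_left:-nothing}"
```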
The rm and rmdir passes will generate a lot of messages like
rm: cannot remove `dir/': Is a directory
and
rmdir: failed to remove `dir/': Directory not empty
rmdir: failed to remove `file': Not a directory
Shut this up with 2>/dev/null if it annoys you, but I'd prefer to keep as much information on the process as possible.
And don't do it until you are sure that you match the right files. And perhaps try rm -i to confirm everything. And have backups, eat your breakfast, brush your teeth, etc.
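The whole procedure can be rehearsed in a scratch directory before running it for real. This is a minimal sketch assuming GNU tar, GNU xargs (for -d) and coreutils; demo.tar, tidy/, notes.txt and dir1/keep.txt are hypothetical stand-ins for your archive and for files that existed before the accidental extraction. LC_ALL=C is an addition that makes sort -r order byte-wise.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the mess in a scratch directory, then undo it.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# A hypothetical archive with no single top-level folder.
mkdir -p "$work/src/dir1/subdir1" "$work/src/dir2"
touch "$work/src/file1" "$work/src/dir1/file1" \
      "$work/src/dir1/subdir1/file1" "$work/src/dir2/file1"
tar cf "$work/demo.tar" -C "$work/src" file1 dir1 dir2

# The "tidy" directory already has a file, and a dir1/ with its own content.
mkdir -p "$work/tidy/dir1"
echo mine > "$work/tidy/notes.txt"
echo mine > "$work/tidy/dir1/keep.txt"

(
    cd "$work/tidy" || exit 1
    tar xf ../demo.tar    # the accidental extraction
    # Pass 1: remove listed paths; rm refuses the directories, as expected.
    tar tf ../demo.tar | xargs -d'\n' rm -v || true
    # Pass 2: deepest directories first; rmdir keeps non-empty ones (dir1/).
    tar tf ../demo.tar | LC_ALL=C sort -r | xargs -d'\n' rmdir -v || true
)
```

Afterwards tidy/ should hold only notes.txt and dir1/keep.txt, which is the point of using rmdir rather than rm -r.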
Solution 2
List the contents of the tar file like so:
tar tzf myarchive.tar.gz
Then, delete those file names by iterating over that list:
while IFS= read -r file; do echo "$file"; done < <(tar tzf myarchive.tar.gz)
This will still just list the files that would be deleted. Replace echo with rm if you're really sure these are the ones you want to remove. And maybe make a backup to be sure.
In a second pass, remove the directories that are left over:
while IFS= read -r file; do rmdir "$file"; done < <(tar tzf myarchive.tar.gz | sort -r)
Because rmdir only removes empty directories, this keeps directories that already existed before the extraction, and still hold other files, from being deleted.
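Both while-read passes can be wrapped into one bash function that needs no GNU xargs. This is a sketch, not the answer's exact commands: undo_extract and DRY_RUN are hypothetical names, the file pass skips directories, and the rmdir pass is ordered with LC_ALL=C sort -r as in Solution 1. By default it only prints the commands, in the spirit of the echo-first advice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: undo_extract ARCHIVE [DIR]
# Prints the rm/rmdir commands; set DRY_RUN=0 to actually run them.
undo_extract() {
    local archive=$1 dir=${2:-.} run=echo path
    [ "${DRY_RUN:-1}" = 0 ] && run=
    # Pass 1: files and symlinks only; directories are left for pass 2.
    while IFS= read -r path; do
        if [ -d "$dir/$path" ] && [ ! -L "$dir/$path" ]; then
            continue
        fi
        if [ -e "$dir/$path" ] || [ -L "$dir/$path" ]; then
            $run rm -- "$dir/$path"
        fi
    done < <(tar tf "$archive")
    # Pass 2: deepest directories first; rmdir leaves non-empty ones alone.
    while IFS= read -r path; do
        if [ -d "$dir/$path" ]; then
            $run rmdir -- "$dir/$path" 2>/dev/null || true
        fi
    done < <(tar tf "$archive" | LC_ALL=C sort -r)
}

# Demo on a hypothetical archive extracted over existing content.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/a/b" "$work/tidy/a"
touch "$work/src/top.txt" "$work/src/a/b/deep.txt"
tar cf "$work/demo.tar" -C "$work/src" top.txt a
echo mine > "$work/tidy/a/mine.txt"    # existed before the extraction
tar xf "$work/demo.tar" -C "$work/tidy"

preview=$(undo_extract "$work/demo.tar" "$work/tidy")   # dry run
printf '%s\n' "$preview"
DRY_RUN=0 undo_extract "$work/demo.tar" "$work/tidy"    # the real thing
```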
Another nice trick by @glennjackman reverses tar's listing so that the deepest entries come first. Again, remove echo when you're done.
tar tf myarchive.tar | tac | xargs -d'\n' echo rm
This could then be followed by the normal rmdir cleanup.
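The tac variant can also be rehearsed on a hypothetical archive before touching real files. This sketch assumes GNU tar (its --no-recursion option is used only to pin the member order), GNU xargs and coreutils tac; it captures the echo dry run, then performs the rm pass and the rmdir cleanup in the same reversed order.

```shell
#!/usr/bin/env bash
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical archive; --no-recursion stores members in exactly this order.
mkdir -p "$work/src/dir1/subdir1" "$work/tidy"
touch "$work/src/file1" "$work/src/dir1/file1" "$work/src/dir1/subdir1/file1"
tar cf "$work/demo.tar" --no-recursion -C "$work/src" \
    file1 dir1 dir1/file1 dir1/subdir1 dir1/subdir1/file1

cd "$work/tidy" || exit 1
tar xf ../demo.tar

# Dry run: the reversed listing starts with the deepest entry.
preview=$(tar tf ../demo.tar | tac | xargs -d'\n' echo rm)
echo "$preview"

# Real run: files first, then the now-empty directories, deepest first.
tar tf ../demo.tar | tac | xargs -d'\n' rm 2>/dev/null || true
tar tf ../demo.tar | tac | xargs -d'\n' rmdir 2>/dev/null || true
```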
Solution 3
Here's a possibility that will take the extracted files and move them to a subdirectory, cleaning up your main folder.
#!/usr/bin/perl -w
use strict;
use Getopt::Long;

my $clean_folder = "clean";
my $DRY_RUN;
die "Usage: $0 [--dry] [--clean=dir-name]\n"
    if ( !GetOptions("dry!" => \$DRY_RUN,
                     "clean=s" => \$clean_folder));

# Protect the 'clean_folder' string from shell substitution
$clean_folder =~ s/'/'\\''/g;

# Process the "tar tv" listing and output a shell script.
print "#!/bin/sh\n" if ( !$DRY_RUN );

while (<>)
{
    chomp;

    # Strip out permissions string and the directory entry from the 'tar' list
    my $perms = substr($_, 0, 10);
    my $dirent = substr($_, 48);

    # Drop entries that are in subdirectories
    next if ( $dirent =~ m:/.: );

    # If we're in "dry run" mode, just list the permissions and the directory
    # entries.
    #
    if ( $DRY_RUN )
    {
        print "$perms|$dirent\n";
        next;
    }

    # Emit the shell code to clean up the folder
    $dirent =~ s/'/'\\''/g;
    print "mv -i '$dirent' '$clean_folder'/.\n";
}
Save this to the file fix-tar.pl and then execute it like this:
$ tar tvf myarchive.tar | perl fix-tar.pl --dry
This will confirm that your tar list is like mine. You should get output like:
-rw-rw-r--|batch
-rw-rw-r--|book-report.png
-rwx------|CaseReports.png
-rw-rw-r--|caseTree.png
-rw-rw-r--|tree.png
drwxrwxr-x|sample/
If that looks good, then run it again like this:
$ mkdir cleanup
$ tar tvf myarchive.tar | perl fix-tar.pl --clean=cleanup > fixup.sh
The fixup.sh script contains the shell commands that move the top-level files and directories into a "clean" folder (in this instance, the folder called cleanup). Have a peek through this script to confirm that it's all kosher. If it is, you can now clean up your mess with:
$ sh fixup.sh
I prefer this kind of cleanup because it doesn't destroy anything that wasn't already destroyed by being overwritten by that initial tar xv.
Note: if that initial dry-run output doesn't look right, you should be able to fiddle with the numbers in the two substr function calls until it does. The $perms variable is used only for the dry run, so really only the $dirent substring needs to be right.
One other thing: you may need to use the tar option --numeric-owner if the user names and/or group names in the tar listing make the names start in an unpredictable column.
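If the column offsets keep shifting, a different technique (swapped in here, not the Perl script above) sidesteps them: read the plain tar tf names, keep only the first path component, and emit the same kind of quoted mv -i lines. gen_fixup and shq are hypothetical bash helpers; the quoting mirrors the script's s/'/'\\''/g substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wrap a string in single quotes for /bin/sh.
shq() { printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"; }

# Hypothetical helper: gen_fixup ARCHIVE CLEAN_DIR > fixup.sh
gen_fixup() {
    local archive=$1 clean=$2 name
    echo '#!/bin/sh'
    # First path component of every member, with any leading ./ removed.
    tar tf "$archive" | sed 's|^\./||' | cut -d/ -f1 | LC_ALL=C sort -u |
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        printf 'mv -i %s %s/.\n' "$(shq "$name")" "$(shq "$clean")"
    done
}

# Demo on a hypothetical archive with top-level files and a directory.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/sample" "$work/tidy/cleanup"
touch "$work/src/batch" "$work/src/tree.png" "$work/src/sample/x.txt"
tar cf "$work/demo.tar" -C "$work/src" batch tree.png sample
gen_fixup "$work/demo.tar" cleanup > "$work/fixup.sh"
cat "$work/fixup.sh"                      # review it before running
(cd "$work/tidy" && tar xf ../demo.tar && sh ../fixup.sh)
```

The trade-off is that it no longer shows permissions in a dry run, but it does not depend on how wide the owner and group columns are.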
Mike T
Updated on September 18, 2022
Comments
-
Mike T over 1 year
I just untar'd an archive that produced a mess of files into my tidy directory. For example:
user@comp:~/tidy$ tar xvf myarchive.tar
file1
file2
dir1/
dir1/file1
dir1/subdir1/
dir1/subdir1/file1
dir2/
dir2/file1
...
I was expecting that the tar file would have been organized in a single folder (i.e., myarchive/), but it wasn't! Now I have some 190 files and directories that have digitally barfed in what was an organized directory. These untar'd files need to be cleaned up. Is there any way to "undo" this and delete the files and directories that were extracted from this archive?
Thanks for the excellent answers below. In summary, here is what works, in two steps: (1) delete the files, and (2) delete the empty directory structure in reverse packing order (so that the innermost directories are deleted first):
tar tf myarchive.tar | xargs -d'\n' rm
tar tf myarchive.tar | tac | xargs -d'\n' rmdir
And safer yet, preview the commands as a dry run by inserting echo after xargs -d'\n'.
-
slhck about 12 years
It's not a pipe. It's process substitution, and I prefer this over simple piping when used in combination with while to loop over a set of records. Just got used to it. @sté
-
Daniel Andersson about 12 years
@slhck and Stéphane: Ah, yes, I'll update. I just did a small test case, but the files had no spaces.
-
slhck about 12 years
Should be noted that BSD xargs has no -d, so you need the GNU variant if you're a poor soul like me.
-
Mike T about 12 years
Ah yes, I needed to chmod those directories in order to remove them. This archive file was pretty messed up... also it had an extension .tar.zip... thanks!
-
Stéphane Gimenez about 12 years
Sorry for the little delay. I noticed that using rm -rf could delete files that were not from the archive but inside a directory that has the same name as one from the archive. Better be careful here and use rmdir in a second pass.
-
Mike T about 12 years
Actually, the second pass with rmdir needs to be run once for each level of directory nesting. It will clean out subdir1 on the first pass but leave dir1, since it tried to delete dir1 first, when it was not yet empty. This could be done in a single pass if the file list were reverse sorted.
-
slhck about 12 years
Tricky problem indeed, @Mike. I don't know if there's an easy solution other than sorting the paths by nesting level, removing the files, and then removing the directories starting with the innermost ones.
-
glenn jackman about 12 years
If you want to delete them in reverse order:
tar tf arch.tar | tac | xargs echo rm
(remove the echo when you're confident)
-
slhck about 12 years
@glennjackman That will only reverse the order of the listing, but the problem is reversing the order by depth. So basically, you'd need to delete the files in backwards breadth-first order.
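A depth-based ordering along those lines can be sketched without relying on tar's listing order at all: count the /-separated fields of each path and sort on that count, largest first. deepest_first is a hypothetical helper; it relies on tar printing directory names with a trailing /, so a subdirectory always has more fields than the directory containing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read paths on stdin, print them deepest first.
deepest_first() {
    # Prefix each path with its number of '/'-separated fields, sort
    # numerically in reverse, then strip the count again.
    awk -F/ '{ print NF "\t" $0 }' | sort -rn | cut -f2-
}

# A deliberately shuffled listing, where plain tac would not help.
ordered=$(printf '%s\n' 'a/' 'a/b/c/' 'x.txt' 'a/b/' | deepest_first)
printf '%s\n' "$ordered"
```

With a real archive it would take the place of tac, e.g. tar tf myarchive.tar | deepest_first | xargs -d'\n' rmdir.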
-
glenn jackman about 12 years
@slhck, based on the example in the question, same thing.
-
slhck about 12 years
@glennjackman Ah, correct, didn't see the updated example. Added your line to the answer so it's easier to see. Seems like a great trick.
-
Mike T about 12 years
The order from tar tf preserves depth, so tac is working correctly.