Unrecognized archive format when trying to extract content from a TAR archive


Solution 1

You can try to salvage whatever is still recoverable with the cpio utility.
(GNU cpio copies files into or out of a cpio or tar archive...)

The command below should do the job, extracting the files and recreating the directory tree:

      cpio -ivd -H tar < myfile.tar 

P.S. Work in a safe place (a scratch directory).

In detail:

  • -i or --extract Run in copy-in mode.
  • -d or --make-directories Create leading directories where needed.
    (by default it does not create directories)
  • -v or --verbose As usual, print more information; here it lists the files processed.
  • -H format or --format=format Use archive format format.

Summary:
The file is almost a complete tar archive. The header lacks the magic number, but the end-of-archive signature is present. You can also read the text files inside it (with less), so it is not compressed. In short, it is corrupted. Whether that happened because the server did not finish the job or because the transfer damaged parts of it is secondary. If you can generate the archive again, do that. If not, try the command above or the alternatives below...
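
To make "the header lacks the magic number" concrete, here is a minimal Python sketch that decodes the first 512-byte header of an archive. It assumes the classic ustar field layout (name at offset 0, size at 124, checksum at 148, magic at 257). The names `describe_header` and `sample_header` are made up for this illustration, and sizes stored in the GNU base-256 encoding are not handled.

```python
import io
import tarfile


def _octal(field: bytes):
    """Parse a NUL/space padded octal field; return None if it is not octal."""
    text = field.strip(b"\0 ")
    try:
        return int(text, 8)
    except ValueError:
        return None


def describe_header(block: bytes) -> dict:
    """Decode the fields of one tar header that matter for recognition."""
    if len(block) != 512:
        raise ValueError("a tar header is exactly 512 bytes")
    # The checksum is the byte sum of the header with the checksum
    # field itself (offsets 148..155) counted as eight spaces.
    computed = sum(block[:148]) + 8 * 0x20 + sum(block[156:])
    stored = _octal(block[148:156])
    return {
        "name": block[0:100].split(b"\0", 1)[0].decode("utf-8", "replace"),
        "size": _octal(block[124:136]),
        "magic": block[257:263],
        "has_ustar_magic": block[257:262] == b"ustar",
        "checksum_ok": stored is not None and stored == computed,
    }


def sample_header() -> bytes:
    """Build a one-file archive in memory and return its first header (demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo("hello.txt")
        info.size = 5
        tar.addfile(info, io.BytesIO(b"hello"))
    return buf.getvalue()[:512]


good = sample_header()
# Simulate damage: wipe the 8 magic/version bytes at offset 257.
damaged = good[:257] + b"\0" * 8 + good[265:]

print(describe_header(good))
print(describe_header(damaged))
```

On a real file you would pass the first 512 bytes, for example `describe_header(open("myfile.tar", "rb").read(512))`. If the name and size look sane but the magic is missing, the archive is probably worth trying to recover.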

References:
Help is available on the cpio site and on the Mac site too.
See also the full option list.


Alternatives:

  • Following this post, download this Perl script, decompress it (with bunzip2 find_tar_headers.pl.bz2), and then run it:

    perl find_tar_headers.pl yourfile.tar
    

    It prints a series of lines such as

    yourfile.tar:12345:dir/subdir/yourfile:126344
    yourfile.tar:20578:dir/subdir/yourfile:123453 ...

    Take the first number (12345 in this case); the following commands should then solve your problem:

    tail -c +12345 yourfile.tar > extracted_tail_yourfile.tar
    tar xf extracted_tail_yourfile.tar
    
  • 75 73 74 61 72 00 30 30 or 75 73 74 61 72 20 20 00 are the signatures of a tar file (according to this list). You can write one of them at offset 257 in a copy of your file, with a hexadecimal editor or with dd, and it will be recognized as a tar file. You will probably be able to extract something too, but the file will still be corrupted: it is better to generate it again if you can.
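
The header search in the first alternative can be approximated in Python. This is only a sketch and not the logic of find_tar_headers.pl, which is not reproduced here. It assumes that any 512-byte block whose stored checksum matches its contents is a header; `header_checksum_ok` and `find_headers` are hypothetical names, and the demo archive is built in memory.

```python
import io
import tarfile


def header_checksum_ok(block: bytes) -> bool:
    """True if the block's stored checksum matches its computed checksum."""
    if len(block) != 512:
        return False
    try:
        stored = int(block[148:156].strip(b"\0 "), 8)
    except ValueError:
        return False  # empty or non-octal checksum field
    # The checksum field itself is counted as eight spaces.
    computed = sum(block[:148]) + 8 * 0x20 + sum(block[156:])
    return stored == computed


def find_headers(data: bytes, step: int = 512):
    """Yield (offset, name) for every block that looks like a tar header."""
    for offset in range(0, len(data) - 511, step):
        block = data[offset:offset + 512]
        if header_checksum_ok(block):
            name = block[:100].split(b"\0", 1)[0].decode("utf-8", "replace")
            yield offset, name


# Demo: a two-file archive with 1024 junk bytes in front of it.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    for name, body in (("a.txt", b"hello"), ("dir/b.txt", b"world!")):
        info = tarfile.TarInfo(name)
        info.size = len(body)
        tar.addfile(info, io.BytesIO(body))
damaged = b"x" * 1024 + buf.getvalue()

hits = list(find_headers(damaged))
first_offset = hits[0][0]

# Carve everything from the first valid header and read it as a tar archive.
with tarfile.open(fileobj=io.BytesIO(damaged[first_offset:])) as tar:
    recovered = tar.getnames()

print(hits, recovered)
```

The offsets here are zero-based, whereas `tail -c +N` counts bytes from 1, so a zero-based offset k corresponds to `tail -c +$((k + 1))`. If the damage shifted the data by something other than a multiple of 512 bytes, `step=1` scans every byte position, at the cost of being much slower.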

Solution 2

Just because a file has a tar or tgz suffix does not make it a tar file.

You can check the magic number in the file header with the file command:

file myfile

It reports the file type, which could be zip, bzip2 or even cpio.

If it returns "data", the file is either not a known archive format or not an archive at all.
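
As a rough illustration of what such a magic-number check involves, the Python sketch below looks for a handful of well-known signatures. It is a simplification under stated assumptions and not how file is implemented: the real tool consults a much larger database and would also identify plain text. The function names `sniff_archive` and `sniff_file` are made up for this example.

```python
def sniff_archive(data: bytes) -> str:
    """Guess the container type from the leading bytes of a file."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:3] == b"BZh":
        return "bzip2"
    if data[:6] == b"\xfd7zXZ\x00":
        return "xz"
    if data[:4] == b"PK\x03\x04":
        return "zip"
    if data[:6] in (b"070707", b"070701", b"070702"):
        return "cpio"  # ASCII cpio variants only
    # tar keeps its magic inside the first header, at offset 257.
    if data[257:262] == b"ustar":
        return "tar"
    return "data"  # nothing recognized, like the asker's file


def sniff_file(path: str) -> str:
    """Read the first 512 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_archive(handle.read(512))
```

A tar file whose first header has lost its magic falls through to "data" here, which matches the result the asker reports for thetarfile.tar.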


Author: Jens

Updated on September 18, 2022

Comments

  • Jens
    Jens over 1 year

    I have an uncompressed TAR file created on Linux.

    If I use less to view the file I can see all the data; mostly code files. However if I try to decompress it I get the following error:

    tar: Unrecognized archive format error.

    If I do a file thetarfile.tar I get the type data.

    How can I extract the file?

    • Paul
      Paul over 9 years
      What command are you using to decompress it? Note that tar is not a compression format, it is an archive format: the data is stored at the same size as the original.
    • Jens
      Jens over 9 years
      I am trying with this tar -xvf tarfile.tar
    • Paul
      Paul over 9 years
      Does tar tf tarfile.tar list the files expected in the tar?
    • Jens
      Jens over 9 years
      No it does not. Same error I am afraid
    • Paul
      Paul over 9 years
      When you say you can see the data, do you mean you can see php script files in plain text? My guess would be that the backup failed from Drupal and so it isn't recognised as a proper tar.
    • Jens
      Jens over 9 years
      Yes, that is what I can see. If I use tail I can see the end of the tar file is correct too. I think that it combined all the files and then failed, then sent me the entire combined file. But I am wondering if there is a way to reverse it, so the files are split into proper files again?
    • Paul
      Paul over 9 years
      A tar file includes details of each file - including filenames and lengths - before the data. If those headers are intact, then tar should work. Some probably are, so it is a case of examining the file and seeing what parts can be recovered. Or doing the backup again.
  • Jens
    Jens over 9 years
    Yeah, it does return data as I wrote in my question. I can see all the content of the files if I view the file in less.
  • AngelaS
    AngelaS over 9 years
    tar and cpio are just standardized formats for combining files into a single archive. A programmer is free to create their own format, and this may be what you are seeing. A custom format archive. Unless you can id which program created the archive, you may be out of luck. Ask for another archive in a standard format.
  • Jens
    Jens over 9 years
    The program is the Backup and Migrate module in Drupal. I have 2 other tar files of two other sites on the same server. These are working normally.