Break a large file into smaller pieces


Solution 1

You can use split and cat.

For example:

$ split --bytes 500M --numeric-suffixes --suffix-length=3 foo foo.

Here foo is the input file and the last argument, foo., is the output prefix. This creates files named foo.000, foo.001, ...

The same command with short options:

$ split -b 500M -d -a 3 foo foo.

To split on line boundaries rather than at an exact byte count, use --line-bytes instead of --bytes. Each piece then holds at most the given number of bytes, made up of complete lines where possible.
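As a rough illustration of --line-bytes, here is a small sketch. It assumes GNU split; the file name foo.txt, the temporary directory and the 1000-byte limit are made up for the example.

```shell
# Sketch only: GNU split assumed; foo.txt is a hypothetical sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input: 1000 short numbered lines.
seq 1 1000 > foo.txt

# Each piece gets at most 1000 bytes, cut only at line ends.
split --line-bytes=1000 --numeric-suffixes --suffix-length=3 foo.txt foo.txt.

# Show how many lines and bytes landed in each piece.
wc -l -c foo.txt.[0-9][0-9][0-9]

# The pieces still concatenate back to the original.
cat foo.txt.[0-9][0-9][0-9] | cmp - foo.txt && echo "round trip ok"
```

Because every piece ends at a line end, each one can be opened or processed on its own as a normal text file.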

To reassemble the pieces, you can use, for example:

$ cat foo.* > foo_2

(This assumes that the shell sorts the results of globbing, and that the number of pieces does not exceed the system-dependent limit on command-line arguments.)

You can compare the result via:

$ cmp foo foo_2
$ echo $?

(This should print 0, meaning the two files are identical.)
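The whole round trip can be tried on a small scale before touching a multi-gigabyte file. The following is a minimal sketch, assuming GNU coreutils; the 1 MiB random file and the 300K piece size are stand-ins chosen for the example, not values from the question.

```shell
# Round-trip sketch: split, reassemble, verify (GNU coreutils assumed).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the large file: 1 MiB of random bytes.
head -c 1048576 /dev/urandom > foo

# 300 KiB pieces named foo.000, foo.001, ...
split --bytes=300K --numeric-suffixes --suffix-length=3 foo foo.

# Reassemble; the glob expands in sorted order.
cat foo.[0-9][0-9][0-9] > foo_2

# cmp -s prints nothing and only sets the exit status.
if cmp -s foo foo_2; then
    echo "identical"
else
    echo "files differ" >&2
fi

# Checksums are handy when the original and the copy are on different machines.
sha256sum foo foo_2
```

The glob foo.[0-9][0-9][0-9] is a little stricter than foo.* and only picks up the numbered pieces.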

Alternatively, you can use a combination of find/sort/xargs to re-assemble the pieces:

$ find . -maxdepth 1 -type f -name 'foo.*' | sort | xargs cat > foo_3
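If the file names might contain spaces or other unusual characters, a NUL-delimited variant of the same pipeline is more robust. This is a sketch assuming GNU find, sort and xargs (for -print0, -z and -0); the tiny 26-byte input and 5-byte pieces are only there to keep the demo fast.

```shell
# Sketch: NUL-delimited find/sort/xargs reassembly (GNU tools assumed).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical tiny input so the demo runs instantly.
printf 'abcdefghijklmnopqrstuvwxyz' > foo

# 5-byte pieces: foo.000, foo.001, ...
split -b 5 -d -a 3 foo foo.

# -print0 / -z / -0 pass names separated by NUL bytes, so odd characters are safe.
# The shell opens foo_3 once for xargs, so it does not matter how many times
# xargs runs cat; all output lands in the same file, in order.
find . -maxdepth 1 -type f -name 'foo.[0-9][0-9][0-9]' -print0 \
    | sort -z \
    | xargs -0 cat > foo_3

cmp foo foo_3 && echo "foo_3 matches foo"
```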

Solution 2

You can also do this with Archive Manager if you prefer a GUI. Look under 'Save->Other Options->Split into volumes of'.

Author by

Stefan

I like code, beer, rock climbing and travel.

Updated on September 17, 2022

Comments

  • Stefan
    Stefan over 1 year

    How do I break a large (4 GB+) file into smaller files of about 500 MB each?

    And how do I re-assemble them again to get the original file?

  • mm2001
    mm2001 over 13 years
    Try this command: man split cat md5sum
  • Stefan
    Stefan over 13 years
    i tagged it 'command-line', but thanks for the answer :)
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' over 13 years
    When assembling, I recommend cat foo.{000..NNN} where NNN is the last expected piece. That way you get an error message if one of the pieces is missing. But note that -d to get numeric suffixes is specific to GNU split; on other platforms you have to make do with foo.aaa, foo.aab, etc.
  • Zorawar
    Zorawar over 11 years
    And bear in mind that, for split, KB = 1000, K = 1024, MB = 1000*1000, M = 1024*1024 etc.
  • Manu Kanthan
    Manu Kanthan almost 9 years
    Shouldn't this ... cat > foo_3 be ... cat >>foo_3?
  • maxschlepzig
    maxschlepzig almost 9 years
    @alk, no, it should not. The part that xargs sees as arguments (and thus potentially forks/execs multiple times) is cat. The part > foo_3 is interpreted by the shell (the shell creates the output redirection for the xargs process). Thus, everything is ok.
  • Manu Kanthan
    Manu Kanthan almost 9 years
    Ah yes, sure. Temporary brain lapse ... sorry.
  • infixed
    infixed almost 8 years
    If you decide to ease the pain by using a utility, rar and 7zip are often used to make such splits easier to reassemble cross-platform.