Splitting a large txt file every 100 lines and including the original header (on a Mac)


Solution 1

  1. Remove the header and put it into a separate file header.txt.
  2. Split the data using split --lines=100 data.txt (on a Mac, whose BSD split has no long options, use split -l 100 data.txt instead). This generates lots of files with 100 lines each, named xaa, xab, xac and so on.
  3. Then prepend the header to each file: for a in x??; do cat header.txt "$a" > "$a.txt"; done This results in your finished data files (with headers) being called xaa.txt, xab.txt, xac.txt ...

With the default two-letter suffixes, split can name at most 676 files (xaa through xzz). If you have so much data (or split on so few lines) that this is not enough, pass -a 3 to split to get three-letter suffixes, and insert an extra ? in the for statement above.
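As a rough illustration of the longer-suffix case, the sketch below splits a throwaway file into 1000 two-line chunks, which is more than two-letter suffixes can name. The sample data, the one-line header and the temporary directory are assumptions made up for the demo; only split -l, split -a and the for loop come from the answer itself.

```shell
#!/bin/bash
# Demo in a scratch directory so nothing in the current folder is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'HEADER\n' > header.txt     # stand-in for the real header file
seq 1 2000 > data.txt              # 2000 made-up data lines

# -a 3 asks for three-letter suffixes: xaaa, xaab, xaac, ...
split -l 2 -a 3 data.txt

# One extra ? in the pattern to match the longer chunk names.
for a in x???; do
    cat header.txt "$a" > "$a.txt"
done

ls x???.txt | wc -l                # number of finished files
```

The glob x??? only matches the four-character chunk names, so the finished x???.txt files are not picked up again by the loop.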

Edit:
To automate the extraction of the header, use head -n 4 origdata.txt > header.txt to extract the first four lines. Use tail -n +5 origdata.txt > data.txt to extract everything after them (tail -n +K starts output at line K, so +5 skips the four header lines). You now have two files, one with the header and one with the data. It should not be too hard to combine these commands into a script. (I have no access to bash today.)
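A minimal sketch of that header/data separation, assuming a four-line header. The function name extract_parts and the ten-line sample file are hypothetical and exist only for the demo.

```shell
#!/bin/bash
# extract_parts is a hypothetical helper name, not part of the original answer.
extract_parts() {
    local src=$1
    head -n 4 "$src" > header.txt   # lines 1-4: the header
    tail -n +5 "$src" > data.txt    # line 5 to the end: the data
}

# Demo on a throwaway file: 4 header lines followed by 6 data lines.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'h%s\n' 1 2 3 4 > origdata.txt
printf 'd%s\n' 1 2 3 4 5 6 >> origdata.txt

extract_parts origdata.txt
wc -l < header.txt                  # line count of the header part
wc -l < data.txt                    # line count of the data part
```

Neither head nor tail alters line endings, so a file with Windows (CR LF) line breaks passes through unchanged.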

Solution 2

Based on the answer provided by Nifle I made a script that executes his suggested commands, adds the original filename to the output and cleans up the temporary files.

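#!/bin/bash

# Name of the file to split; change this to match your input file.
FILE="filename.txt"
NAME="${FILE%.txt}"

# The first four lines are the header; the data starts on line 5.
head -n 4 "$FILE" > header.txt
tail -n +5 "$FILE" > data.txt

# Split the data into 100-line chunks named xaa, xab, ...
split -l 100 data.txt

# Prepend the header to each chunk and put the original name in the output name.
for a in x??
    do
        cat header.txt "$a" > "$NAME.$a.txt"
    done

# Keep the original under a new name and remove the temporary files.
mv "$FILE" "$NAME.orig.txt"
rm header.txt data.txt x??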

Et voila!
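For coworkers who would rather run a single command, here is an alternative sketch that does the whole job in one awk pass and names the output File_01.txt, File_02.txt, ... as in the question. This swaps in awk for the head/tail/split approach above. The function name split_with_header, its optional arguments and the sample File.txt are assumptions for illustration only.

```shell
#!/bin/bash
# Usage: split_with_header FILE [HEADER_LINES] [CHUNK_LINES]
# split_with_header is a hypothetical name; defaults match the question (4 and 100).
split_with_header() {
    local file=$1 header_lines=${2:-4} chunk=${3:-100}
    local base=${file%.txt}
    awk -v h="$header_lines" -v n="$chunk" -v base="$base" '
        NR <= h { header = header $0 "\n"; next }    # collect the header lines
        (NR - h - 1) % n == 0 {                      # first row of a new chunk
            if (out != "") close(out)
            out = sprintf("%s_%02d.txt", base, ++part)
            printf "%s", header > out                # start the chunk with the header
        }
        { print > out }                              # copy the data row
    ' "$file"
}

# Demo on a throwaway file: 4 header lines plus 261 data lines.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
{ printf 'header %s\n' 1 2 3 4; seq 1 261; } > File.txt
split_with_header File.txt
wc -l File_0?.txt
```

With this sample input the function writes File_01.txt and File_02.txt with 104 lines each and File_03.txt with 65 lines, and it leaves no temporary files behind. The original file is not renamed or modified.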

Author: Dan

Updated on September 17, 2022

Comments

  • Dan
    Dan almost 2 years

    I am looking for a tool or script (Textwrangler or Terminal) that can split a larger text file every 100 lines counting from line 5 (the first 4 are header lines) and output individual .txt files which include the original header.

    For instance

    input:

    File.txt
    line1 / line4   HEADER
    ...
    line5 / line265 DATA
    

    output:

    File_01.txt
    line1/line4   HEADER
    line5/line104 DATA
    
    File_02.txt
    line1/line4   HEADER
    line5/line104 DATA
    
    File_03.txt
    line1/line4   HEADER
    line5/line65  DATA
    

    The text file uses Windows line breaks (CR LF) in case that matters.

    I am currently doing this manually so any suggestions that can make this process more efficient are very welcome.

  • Dan
    Dan almost 14 years
    Thanks! I had to replace "--lines=100" with "-l 100", but apart from that it works like a charm. Ideally, however, I would prefer a script or a single-line command to do the job, so it is easier for (less computer-savvy) coworkers to take over these tasks in my absence.
  • Julian
    Julian almost 14 years
    @Dan - You could put it all in a script with a bit of fiddling. See my edit to automate the first part.
  • Dan
    Dan almost 14 years
    I managed to compile your suggestions into a script. I also defined a few variables to incorporate the original file name in the output. It probably contains a few scripting faux pas since I am fairly new to this but it does the trick quite nicely. Thanks again!