Splitting a large txt file every 100 lines and including the original header (on a Mac)
Solution 1
- Remove the header and put it into a separate file, header.txt.
- Split the data using
split --lines=100 data.txt
(this generates lots of files with 100 lines each, named xaa, xab, xac and so on; with the split that ships with macOS you may need split -l 100 data.txt instead).
- Then prepend the header to each file:
for a in x??; do cat header.txt "$a" > "$a.txt"; done
This results in your finished data files (with headers) being called xaa.txt, xab.txt, xac.txt, ...
If there is so much data (or you split on fewer lines) that three-letter names like xaa are not enough, split switches to longer file names. In that case insert an extra ? (one per extra letter) in the for statement above.
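Another way to avoid counting ? characters is to fix the suffix length up front with split's -a option, which both the macOS (BSD) and GNU versions of split accept. The sketch below is a self-contained demo on assumed sample data (a 4-line header.txt and 250 numbered data lines in data.txt), run in a scratch directory so no real files are touched:

```bash
#!/bin/bash
# Work in a throwaway directory.
cd "$(mktemp -d)" || exit 1

# Sample data (assumption): 4 header lines and 250 data lines.
printf 'header line %d\n' 1 2 3 4 > header.txt
seq 1 250 > data.txt

# -a 3 gives every piece exactly three suffix letters: xaaa, xaab, xaac, ...
split -a 3 -l 100 data.txt

# Exactly three ? characters now match every piece.
for a in x???; do
  cat header.txt "$a" > "$a.txt"
done
```

With a fixed suffix length the glob in the loop no longer depends on how many pieces split happened to produce.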
Edit:
To automate the extraction of the header, use head -n 4 origdata.txt > header.txt
to extract the first four lines. Use tail -n +5 origdata.txt > data.txt
to extract everything except the first four lines (tail -n +K starts printing at line K, so +4 would repeat the last header line). Now you have two files, one with the header and one with the data. It should not be too hard to combine this into a script. (I have no access to bash today.)
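To see why the data has to start at +5 when the header is four lines long, here is a toy check, assuming an eight-line file whose lines are simply L1 to L8:

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Toy input (assumption): lines L1..L8, the first four being the header.
printf 'L%d\n' 1 2 3 4 5 6 7 8 > origdata.txt

head -n 4 origdata.txt > header.txt   # L1..L4
tail -n +5 origdata.txt > data.txt    # L5..L8: +K means "start at line K"

# With +4 the last header line would leak into the data as well:
tail -n +4 origdata.txt | head -n 1   # prints L4

# Header and data glued back together should equal the original file.
if cat header.txt data.txt | cmp -s - origdata.txt; then
  echo "split at the header boundary is lossless"
fi
```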
Solution 2
Based on the answer provided by Nifle I made a script that executes his suggested commands, adds the original filename to the output and cleans up the temporary files.
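#!/bin/bash
# Pass the file name as the first argument; falls back to filename.txt.
FILE="${1:-filename.txt}"
NAME="${FILE%.txt}"              # input name without the .txt extension
head -n 4 "$FILE" > header.txt   # the four header lines
tail -n +5 "$FILE" > data.txt    # everything from line 5 on
split -l 100 data.txt            # pieces named xaa, xab, xac, ...
for a in x??
do
  cat header.txt "$a" > "$NAME.$a.txt"
done
mv "$FILE" "$NAME.orig.txt"      # keep the original under a new name
rm header.txt data.txt x??       # clean up the temporary files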
Et voila!
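For coworkers, the same steps can be wrapped in a reusable function that takes the file name, the chunk size and the header length as arguments. This is a sketch rather than the script above: split_with_header is a hypothetical name, the defaults (100 data lines, 4 header lines) come from the question, temporary files go to a mktemp directory instead of the working directory, and the pieces are named File.aa.txt, File.ab.txt, ... instead of File.xaa.txt.

```bash
#!/bin/bash
# Hypothetical helper: split_with_header FILE [DATA_LINES] [HEADER_LINES]
# Writes FILE's data in chunks of DATA_LINES (default 100), each chunk
# preceded by FILE's first HEADER_LINES lines (default 4).
split_with_header() {
  local file="$1" lines="${2:-100}" hdr="${3:-4}"
  local name="${file%.txt}" tmp chunk
  tmp="$(mktemp -d)" || return 1

  head -n "$hdr" "$file" > "$tmp/header"
  # "-" makes split read the data from stdin; chunk_ is the piece prefix.
  tail -n +"$((hdr + 1))" "$file" | split -l "$lines" - "$tmp/chunk_"

  for chunk in "$tmp"/chunk_*; do
    [ -e "$chunk" ] || continue     # no data lines at all: nothing to write
    cat "$tmp/header" "$chunk" > "$name.${chunk##*/chunk_}.txt"
  done
  rm -r "$tmp"                      # temp files never touch the working dir
}

# Usage on a sample shaped like the question (4 header lines, lines 5..265):
cd "$(mktemp -d)" || exit 1
{ printf 'HEADER %d\n' 1 2 3 4; seq 5 265; } > File.txt
split_with_header File.txt
ls File.*.txt
```

Reading the data from a pipe (split's - operand) avoids the intermediate data.txt, and the explicit prefix keeps the pieces apart from any unrelated x?? files in the working directory.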
Dan
Updated on September 17, 2022
Comments
-
Dan almost 2 years
I am looking for a tool or script (TextWrangler or Terminal) that can split a large text file every 100 lines, counting from line 5 (the first 4 are header lines), and output individual .txt files that include the original header.
For instance:
input:
File.txt: lines 1-4 HEADER, lines 5-265 DATA
output:
File_01.txt: lines 1-4 HEADER, lines 5-104 DATA
File_02.txt: lines 1-4 HEADER, lines 5-104 DATA
File_03.txt: lines 1-4 HEADER, lines 5-65 DATA
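The number of output files in the example follows from simple arithmetic: 265 lines minus the 4 header lines leaves 261 data lines, which at 100 per file gives 100 + 100 + 61. A small sketch that computes this for any input, assuming the same 4-line header and 100-line chunks (the sample File.txt is recreated from the example):

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Recreate the example (assumption): 4 header lines, then lines 5..265.
{ printf 'HEADER %d\n' 1 2 3 4; seq 5 265; } > File.txt

header=4
per_file=100
total=$(wc -l < File.txt)
data=$(( total - header ))                       # data lines after the header
files=$(( (data + per_file - 1) / per_file ))    # integer division rounded up
last=$(( data - (files - 1) * per_file ))        # data lines in the last file
echo "$data data lines -> $files files, last one holds $last data lines"
```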
The text file uses Windows line breaks (CR LF) in case that matters.
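head, tail and split all count lines by the LF character, so the CR of each CR LF pair simply stays at the end of its line and the Windows line breaks pass through unchanged. A quick check on an assumed 10-line CRLF file (4 header lines, 6 data lines, split every 3 lines to keep it small):

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Build a file with Windows (CR LF) line endings.
for i in 1 2 3 4 5 6 7 8 9 10; do printf 'line %d\r\n' "$i"; done > crlf.txt

head -n 4 crlf.txt > header.txt
tail -n +5 crlf.txt > data.txt
split -l 3 data.txt

# First piece with the header prepended: 4 + 3 = 7 lines.
cat header.txt xaa > xaa.txt

# Count the lines that still end in CR; every one of the 7 should.
grep -c $'\r$' xaa.txt
```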
I am currently doing this manually so any suggestions that can make this process more efficient are very welcome.
-
Dan almost 14 years
Thanks! I had to replace "--lines=100" with "-l 100", but apart from that it works like a charm. Ideally, though, I would prefer a script or a single-line command to do the job, so it is easier for (less computer-savvy) coworkers to take over these tasks in my absence.
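If a single command is preferred, awk (a different tool from the split-based answers above) can do everything in one pass and also produce the File_01.txt-style names from the question. A sketch, assuming a 4-line header, 100 data lines per file and the hard-coded output prefix File_; the sample input is recreated from the question's example:

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Sample input (assumption): the question's layout, 4 header lines + lines 5..265.
{ printf 'HEADER %d\n' 1 2 3 4; seq 5 265; } > File.txt

# h = header lines, n = data lines per output file.
# Header lines are collected first; every n-th data line opens a new
# File_NN.txt, which gets the saved header before its data.
awk -v h=4 -v n=100 '
  NR <= h { hdr = hdr $0 "\n"; next }
  (NR - h - 1) % n == 0 {
    if (out) close(out)
    out = sprintf("File_%02d.txt", ++i)
    printf "%s", hdr > out
  }
  { print > out }
' File.txt
```

Because awk splits records on LF and writes each line back with a trailing LF, any CR at the end of a line is copied through as-is.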
-
Julian almost 14 years
@Dan - You could put it all in a script with a bit of fiddling. See my edit to automate the first part.
-
Dan almost 14 years
I managed to compile your suggestions into a script. I also defined a few variables to incorporate the original file name in the output. It probably contains a few scripting faux pas, since I am fairly new to this, but it does the trick quite nicely. Thanks again!