Split files based on file content and pattern matching

Solution 1

This might work for you:

csplit -z -f 'temp' -b '%02d.txt' file /Rate/ {*}

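This will produce files temp00.txt, temp01.txt, ..., each starting with its Rate line. -z drops the empty piece before the first match, and {*} repeats the /Rate/ pattern until the end of the file (-z and {*} are GNU csplit extensions).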

If you only want to keep the Rate line in each file, then:

sed -i '/Rate/!d' temp*.txt
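To see what csplit produces, here is a minimal sketch run against a small sample in the question's format. It assumes GNU coreutils csplit (for -z and '{*}') and an input file literally named file without the line-number column; the sample data and the temporary directory are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU csplit and hypothetical sample data.
set -euo pipefail
cd "$(mktemp -d)"   # work in a scratch directory

# Hypothetical sample in the question's format (no line numbers).
cat > file <<'EOF'
Rate: GBP
12/01/1999,90.5911501,Validated
18/01/1999,90.954996,Validated
Rate: RMB
24/04/2008,132.2542,Validated
Rate: USD
21/11/11,-0.004419534,Validated
EOF

# Split before every line matching /Rate/; -s suppresses the byte counts
# csplit normally prints, -z drops the empty piece before line 1.
csplit -s -z -f 'temp' -b '%02d.txt' file '/Rate/' '{*}'

# The first line of each piece should be that piece's Rate: header.
head -n 1 temp*.txt
```

Because the split happens before each matching line, every piece keeps its own Rate: header, which is what the question asks for.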

Solution 2

I'd do this in Perl:

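#!/usr/bin/perl

use strict;
use warnings;

# Start out writing to STDOUT, so lines before the first "Rate:" header are not lost.
open(my $out, '>-') or die "Cannot open STDOUT: $!";

while (<>)
{
    # A "Rate: XXX" header starts a new output file named after the rate (e.g. GBP).
    if (m/^Rate: (\w+)/)
    {
        my $name = $1;
        close $out;
        open($out, '>', $name) or die "Cannot open $name: $!";
        next;    # the header line itself is not copied
    }

    print $out $_;
}

close $out;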

Use it like

perl ./test.pl input.txt

Solution 3

(g)awk to the rescue:

awk '/^Rate:/ {output_file_name=$2; getline } 
     { print $0 >> ( output_file_name ) }' INPUT_FILE

The first rule runs only for lines that start with Rate:. It sets the output file name to the second field (the rate, e.g. GBP) and then uses getline to read the next input line into $0. That next line falls through to the second rule and is appended to the output file. Every following line is handled by the second rule alone until the next Rate: line matches the first rule again. Note that the Rate: line itself is never written.

NOTE: The above solution can fail if the input file contains two consecutive Rate: lines, like this:

... DATA ...
Rate: GBP
Rate: CHF
... DATA ...

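In that case getline consumes the Rate: CHF line, so it is written into the GBP file and the CHF data ends up there too. Otherwise it should do the job (assuming that the line numbers are not part of the original file).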
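One way around the consecutive-header problem is to replace getline with next, so a Rate: line only switches the current output file. This is a sketch of that variant, not the answer's original code: lines before the first Rate: are skipped, files are named after the second field (e.g. GBP), and > truncates each file on its first write in a run instead of appending across runs as >> does. The sample data is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a getline-free variant; sample data is hypothetical.
set -euo pipefail
cd "$(mktemp -d)"

printf '%s\n' 'Rate: GBP' 'Rate: CHF' '01/02/2003,1.5,Validated' \
              'Rate: USD' '21/11/11,-0.004419534,Validated' > input.txt

awk '
  /^Rate:/   { out = $2; next }   # remember the rate, skip the header line
  out != ""  { print > out }      # copy data lines into the current rate file
' input.txt
```

With two consecutive headers, the empty GBP section simply produces no file, and the data after Rate: CHF lands in CHF.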

HTH

Solution 4

A one-liner inspired by sehe's answer:

perl -pwe '
  if (/^Rate: (.+)/) {
      open $out, ">", "Rate_$1.txt" or die $!;
      select $out;
  }' gasdata.txt

The -p option wraps the -e code in a loop that reads each input line into $_ and prints it after the code has run. select sets the default filehandle that print writes to. So all we are doing is switching the output filehandle whenever a new Rate: line appears; since the switch happens before the header line itself is printed, each output file starts with its own Rate: line.

Here's the code deparsed (this run writes into an output/ subdirectory instead of the current directory):

>perl -MO=Deparse -pwe 'if (/^Rate: (.+)/) { open $out, ">", "output/Rate_$1.txt" or die $!; select $out; }' gasdata.txt
BEGIN { $^W = 1; }
LINE: while (defined($_ = <ARGV>)) {
    if (/^Rate: (.+)/) {
        die $! unless open $out, '>', "output/Rate_$1.txt";
        select $out;
    }
}
continue {
    die "-p destination: $!\n" unless print $_;
}
-e syntax OK

Solution 5

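Another solution: it turns your input file into a shell script and then runs it. Each Rate: line becomes the start of a here-document (cat <<EOF > GBP) that writes the following lines to a file named after the rate; an EOF line is inserted before every later header and appended at the end, and the result is piped to bash: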

sed 's/^Rate:/cat <<EOF >/; 1!s/^cat <<EOF/EOF\n&/; $aEOF' input.txt | bash

I assumed the line numbers are not part of the file.
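To inspect the generated script before running it, drop the | bash. This sketch assumes GNU sed (the \n in the replacement and the one-line $aEOF append are GNU extensions) and uses hypothetical sample data; as potong notes below, the unquoted EOF means $ or backticks in the data would be expanded by bash.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU sed; sample data is hypothetical.
set -euo pipefail
cd "$(mktemp -d)"

printf '%s\n' 'Rate: GBP' '12/01/1999,90.5911501,Validated' \
              'Rate: USD' '21/11/11,-0.004419534,Validated' > input.txt

# Step 1: print the generated script. For this sample it is:
#   cat <<EOF > GBP
#   12/01/1999,90.5911501,Validated
#   EOF
#   cat <<EOF > USD
#   21/11/11,-0.004419534,Validated
#   EOF
sed 's/^Rate:/cat <<EOF >/; 1!s/^cat <<EOF/EOF\n&/; $aEOF' input.txt

# Step 2: run the same script with bash, creating the files GBP and USD.
sed 's/^Rate:/cat <<EOF >/; 1!s/^cat <<EOF/EOF\n&/; $aEOF' input.txt | bash
```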


Author: Dean

Updated on June 04, 2022

Comments

  • Dean (almost 2 years ago)

    I need your help formatting a txt file using bash/Linux. The file looks like the following: it always has a line Rate: <something>, followed by details in a very specific format. I'd like to split the file up with one rate per file. In this example, I'd like to end up with 3 files, each containing the corresponding line that says what the Rate value was.

    How would you approach this?

    line No. Main Text
    1    Rate: GBP
    2    12/01/1999,90.5911501,Validated
         .....
         .....
    210  18/01/1999,90.954996,Validated
    211  Rate: RMB
    212  24/04/2008,132.2542,Validated
         .....
    1000 25/04/2008,132.2279,Validated
    1001 28/04/2008,131.69915,Validated
    1002 Rate: USD
    1003 21/11/11,-0.004419534,Validated
    
  • JRFerguson (over 12 years ago)
    Clever first open to allow succinct loop. Very nice.
  • jaypal singh (over 12 years ago)
    Won't this only get one line after your matched pattern?
  • TLP (over 12 years ago)
    +1 for inspiring answer. See my answer for the one-liner version of your idea.
  • jaypal singh (over 12 years ago)
    Thanks Zsolt for the explanation. I don't know why, but I am still having issues running the one-liner. Shouldn't the print $0 >> output_file_name have quotes (") around output_file_name?
  • potong (over 12 years ago)
    I like this solution! Especially the way you use the rate's text to name the file. A small quibble, but it may save a few hairs: here documents interpolate variables etc. by default, and s/^Rate:/cat <<\\EOF >/ will turn that off.
  • potong (over 12 years ago)
    A small tweak and you can have the Rate... line too. /^Rate:/{h;s//.../;G};