'Merging' 2 files into a third using Perl


Solution 1

You're not getting what you want from while(($line1 = <file1>)||($line2 = <file2>)){ because || short-circuits: as long as ($line1 = <file1>) is true, ($line2 = <file2>) is never evaluated, so file2 is not read until file1 is exhausted.
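The short-circuit behaviour can be seen in isolation with a small sketch. The subs left_side and right_side are hypothetical stand-ins for the two reads; they only record that they were called.

```perl
use strict;
use warnings;

my @calls;    # records which side of '||' was actually evaluated

# Hypothetical stand-ins for ($line1 = <file1>) and ($line2 = <file2>).
sub left_side  { push @calls, "left";  return $_[0] }
sub right_side { push @calls, "right"; return $_[0] }

# Left side is true: '||' short-circuits and right_side() is never called.
my $first = left_side("line from file1") || right_side("line from file2");

# Left side is false (like a read at end of file): now right_side() runs.
my $second = left_side(undef) || right_side("line from file2");

print join(",", @calls), "\n";    # prints "left,left,right"
```

The right-hand side only runs once the left-hand side is false, which is why the original loop drains file1 before it ever touches file2.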

Try something like this instead:

open my $file1, "<", $ARGV[0] or die "Cannot open $ARGV[0]: $!";
open my $file2, "<", $ARGV[1] or die "Cannot open $ARGV[1]: $!";
open my $file3, ">", $ARGV[2] or die "Cannot open $ARGV[2]: $!";

while (my $f1 = readline ($file1)) {
  print $file3 $f1;  #line from file1

  if (defined(my $f2 = readline ($file2))) {  #if there are any lines left in file2
    print $file3 $f2;
  }
}

while (my $f2 = readline ($file2)) {   #print whatever is still left in file2
  print $file3 $f2;
}

close $file1;
close $file2;
close $file3;

Solution 2

Here's another option that uses List::MoreUtils's zip to interleave arrays and File::Slurp to read and write files:

use strict;
use warnings;
use List::MoreUtils qw/zip/;
use File::Slurp qw/read_file write_file/;

chomp( my @file1 = read_file shift );
chomp( my @file2 = read_file shift );

write_file shift, join "\n", grep defined $_, zip @file1, @file2;

Solution 3

You'd think if they're teaching you Perl, they'd use the modern Perl syntax. Please don't take this personally. After all, this is how you were taught. However, you should know the new Perl programming style because it helps eliminate all sorts of programming mistakes and makes your code easier to understand.

  • Use the pragmas use strict; and use warnings;. The warnings pragma replaces the need for the -w flag on the command line. It's actually more flexible: for example, I can turn off particular warnings in a block where I know they'll be an issue. The use strict; pragma requires me to declare my variables with either my or our. (NOTE: Don't declare Perl's built-in variables.) 99% of the time, you'll use my. These variables are called lexically scoped, but you can think of them as true local variables. Lexically scoped variables don't exist outside of their scope. For example, if you declare a variable using my inside a while loop, that variable will disappear once the loop exits.
  • Use the three parameter syntax for the open statement: In the example below, I use the three parameter syntax. Because the mode is a separate argument, the file name is never parsed for mode characters. This way, if a file is called >myfile, I'll be able to read from it.
  • Use locally defined file handles: Note that I use a lexical variable such as my $output_fh instead of a bareword like FILE_1_HANDLE. The old way, FILE_1_HANDLE, is globally scoped, plus it's very difficult to pass the file handle to a function. Using lexically scoped file handles just works better.
  • Use or and and instead of || and &&: They're easier to read, and their very low precedence means a statement like open ... or die is parsed the way you expect without extra parentheses. They're less likely to cause problems.
  • Always check whether your open statement worked: You need to make sure your open statement actually opened a file. Or use the use autodie; pragma, which will kill your program if the open statements fail (which is probably what you want to do anyway).
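To make the precedence point in the list above concrete, here is a minimal sketch comparing || and or after an open without parentheses. The path in $missing is hypothetical and is assumed not to exist; each attempt is wrapped in eval so the script can report what happened.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on this machine.
my $missing = "/no/such/dir/no_such_file.txt";

# '||' binds tighter than the comma, so this parses as
#   open(my $fh1, "<", ($missing || die ...))
# $missing is a true string, so die is never reached even though open fails.
my $ok_with_pipes = eval { open my $fh1, "<", $missing || die "unreachable\n"; 1 };

# 'or' binds more loosely than the argument list, so this parses as
#   open(my $fh2, "<", $missing) or die ...
my $ok_with_or = eval { open my $fh2, "<", $missing or die "Cannot open: $!\n"; 1 };

print "with ||: ", ($ok_with_pipes ? "no error raised" : "died"), "\n";
print "with or: ", ($ok_with_or    ? "no error raised" : "died"), "\n";
```

With ||, the failed open goes unnoticed; with or, the die fires. Writing open(...) || die with explicit parentheses also works, which is why this is partly a style choice.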

And, here's your program:

#! /usr/bin/env perl

use strict;
use warnings;
use autodie;

open my $file_1, "<", shift;
open my $file_2, "<", shift;
open my $output_fh, ">", shift;

for (;;) {
    my $line_1 = <$file_1>;
    my $line_2 = <$file_2>;
    last if not defined $line_1 and not defined $line_2;
    no warnings qw(uninitialized);
    print {$output_fh} $line_1 . $line_2;
    use warnings;
}

In the above example, I read from both files on every pass, even if one of them is already exhausted. If there's nothing left to read, then $line_1 or $line_2 is simply undefined. After I do my read, I check whether both $line_1 and $line_2 are undefined. If so, I use last to end my loop.

Because my file handle is a scalar variable, I like putting it in curly braces, so people know it's a file handle and not a variable I want to print out. I don't need it, but it improves clarity.

Notice the no warnings qw(uninitialized);. This turns off the uninitialized warning I'll get. I know that either $line_1 or $line_2 might be uninitialized, so I don't want the warning. I turn it back on right below my print statement because it is a valuable warning.
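A small sketch of how far no warnings reaches: the pragma is lexically scoped, so it only applies up to the end of the enclosing block. The @caught array and the $SIG{__WARN__} handler are only there to count warnings for this illustration.

```perl
use strict;
use warnings;

my @caught;    # warnings captured here instead of being printed to STDERR
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $undef;
{
    no warnings qw(uninitialized);    # in effect only inside this block
    my $quiet = "a" . $undef;         # no warning is raised here
}
my $noisy = "b" . $undef;             # warns: the block above has ended

print "warnings caught: ", scalar(@caught), "\n";    # prints "warnings caught: 1"
```

Because of this scoping, wrapping just the print in its own bare block is another way to keep the warning disabled for as short a stretch as possible.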

Here's another way to do that for loop:

while ( 1 ) {
    my $line_1 = <$file_1>;
    my $line_2 = <$file_2>;
    last if not defined $line_1 and not defined $line_2;
    print {$output_fh} $line_1 if defined $line_1;
    print {$output_fh} $line_2 if defined $line_2;
}

The infinite loop is a while loop instead of a for loop. Some people don't like the C style of for loop and have banned it from their coding practices. Thus, if you have an infinite loop, you use while ( 1 ) {. To me, maybe because I came from a C background, for (;;) { means infinite loop, and while ( 1 ) { takes a few extra milliseconds to digest.

Also, I check whether $line_1 or $line_2 is defined before I print them out. I guess it's better than using no warnings and use warnings, but I need two separate print statements instead of combining them into one.
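If a single print without disabling warnings is wanted, one possible alternative (not part of the original answer) is the defined-or operator //, available since Perl 5.10. The helper name pair_text is hypothetical and only used for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: joins two lines, treating an undefined line
# (a file that has run out) as the empty string.
sub pair_text {
    my ($line_1, $line_2) = @_;
    return ($line_1 // '') . ($line_2 // '');    # '//' needs Perl 5.10 or later
}

print pair_text("a1\n", "b1\n");    # both files still have lines
print pair_text("a2\n", undef);     # second file is exhausted
```

Inside the loop, this would correspond to something like print {$output_fh} ($line_1 // '') . ($line_2 // ''); in place of the two conditional prints.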

Solution 4

Just noticed Tim A has a nice solution already posted. This solution is a bit wordier, but might illustrate exactly what is going on a bit more.

The method I went with reads all of the lines from both files into two arrays, then loops through them using a counter.

#!/usr/bin/perl -w
use strict;

open(IN1, "<", $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
open(IN2, "<", $ARGV[1]) or die "Cannot open $ARGV[1]: $!";

my @file1_lines;
my @file2_lines;

# Read every line of each input file into its own array.
while (<IN1>) {
    push (@file1_lines, $_);
}
close IN1;
while (<IN2>) {
    push (@file2_lines, $_);
}
close IN2;

my $file1_items = @file1_lines;
my $file2_items = @file2_lines;

open(OUT, ">", $ARGV[2]) or die "Cannot open $ARGV[2]: $!";
my $i = 0;
# Walk both arrays by index until the longer one is used up.
while (($i < $file1_items) || ($i < $file2_items)) {
    if (defined($file1_lines[$i])) {
        print OUT $file1_lines[$i];
    }
    if (defined($file2_lines[$i])) {
        print OUT $file2_lines[$i];
    }
    $i++;
}
close OUT;


Updated on September 14, 2022


  • Larry
    Larry 8 months

    I am reviewing for a test and I can't seem to get this example to code out right.

    Problem: Write a Perl script, called ileaf, which will interleave the lines of a file with those of another file, writing the result to a third file. If the files are of different lengths, then the excess lines are written at the end.

    A sample invocation: ileaf file1 file2 outfile

    This is what I have:

    #!/usr/bin/perl -w
    open(file1, "$ARGV[0]");
    open(file2, "$ARGV[1]");
    open(file3, ">$ARGV[2]");
    while(($line1 = <file1>)||($line2 = <file2>)){
                print $line1;
                print $line2;
    }

    This sends the information to the screen so I can immediately see the result. The final version should "print file3 $line1;". I am getting all of file1, then all of file2, without any interleaving of the lines.

    If I understand correctly, this is a function of the use of the "||" in my while loop. The while checks the first comparison and, if it's true, drops into the loop, which only reads file1. Once file1 is false, the while checks file2 and again drops into the loop.

    What can I do to interleave the lines?

  • Larry
    Larry over 10 years
    That is working. I think I was trying to combine too many steps into one loop. Thanks Tim.
  • ikegami
    ikegami over 10 years
    Of course, readline($file1) is much more commonly written as <$file1>. It's also much more common to print to STDOUT and simply redirect the output to a file if desired (because it's far more flexible).
  • Lawrence Hutton
    Lawrence Hutton over 10 years
    In this particular case, I would argue that || rather than or is the correct choice. My general rule of thumb is || for logical operations (if ($a || $b)) and or for flow control (open or die), because the expectations that way tend to match the operator precedence rules.
  • David W.
    David W. over 10 years
    I guess it's a style preference. The main thing is that the order of precedence is lower with or and and, and usually works out better when you don't parenthesize. As an old C programmer, I too thought || and && look better in logic, but if you're supposed to downplay the C style for loop, I guess the other C stuff should be deprecated too.