How can I merge files on a line by line basis?

bash text-processing sed perl awk

Solution 1

The right tool for this job is probably paste:

paste -d '' file1 file2

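See man paste for details. GNU paste accepts the empty delimiter list -d ''; some other implementations (BSD/macOS paste, for example) reject it with "no delimiters specified". The POSIX way to ask for an empty delimiter is -d '\0'.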
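
As a rough illustration of what paste -d '' does, here is a minimal Python sketch; the function name paste_empty_delim is made up for this example. It follows the POSIX description of paste: corresponding lines are joined with the delimiter (here, nothing), and when one file ends early, paste behaves as though that file supplied empty lines, so the rest of the longer file still appears in the output.

```python
from itertools import zip_longest


def paste_empty_delim(lines1, lines2):
    """Rough equivalent of `paste -d '' file1 file2` (hypothetical helper).

    lines1 and lines2 can be any iterables of lines, e.g. open files.
    When one input runs out it contributes empty strings, so the rest
    of the longer input is still emitted, as paste does.
    """
    merged = []
    for a, b in zip_longest(lines1, lines2, fillvalue=""):
        # Remove only the line terminator; keep other trailing whitespace
        merged.append(a.rstrip("\n") + b.rstrip("\n"))
    return merged


if __name__ == "__main__":
    file1 = ["foo\n", "ice\n", "two\n"]
    file2 = ["bar\n", "cream\n", "hundred\n"]
    for line in paste_empty_delim(file1, file2):
        print(line)
```

For the question's sample data this prints foobar, icecream and twohundred; with real files you would pass open file objects instead of the lists.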

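You could also use the pr command (-T, -J and -S are GNU pr extensions):

pr -TmJS"" file1 file2

where

-T omits page headers and trailers, i.e. turns off pagination
-mJ merges the files side by side (-m), joining full lines (-J)
-S"" separates the columns with an empty string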

If you really want to do it in pure bash (not recommended), this is what I'd suggest:

while IFS= read -u3 -r a && IFS= read -u4 -r b; do 
  printf '%s%s\n' "$a" "$b"
done 3<file1 4<file2

(Only including this because the subject came up in comments to another proposed pure-bash solution.)
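
The pure-bash loop above stops as soon as either read fails, so unlike paste it ends at the shorter file. A subtle consequence is that read also reports failure on a last line that has no trailing newline, so such a line is silently dropped. The following Python sketch (merge_lockstep is a hypothetical name) mimics that control flow with one readline() per file per iteration; IFS= and -r in the bash version make read take each line verbatim, which readline() already does.

```python
import io


def merge_lockstep(f1, f2):
    """Imitate `read -u3 -r a && read -u4 -r b` (hypothetical helper).

    f1 and f2 are text streams. Like bash's read, a line only counts if
    it ends with a newline: read fails at end of file and also on a final
    line without a trailing newline, and either failure ends the loop.
    """
    merged = []
    while True:
        a = f1.readline()
        if not a.endswith("\n"):  # first read failed: stop
            break
        b = f2.readline()
        if not b.endswith("\n"):  # second read failed: stop
            break
        merged.append(a[:-1] + b[:-1])
    return merged


if __name__ == "__main__":
    file1 = io.StringIO("foo\nice\ntwo\n")
    file2 = io.StringIO("bar\ncream\nhundred\n")
    print("\n".join(merge_lockstep(file1, file2)))
```

If that dropped last line matters, a common bash idiom is to also test whether the variable is non-empty after a failed read, but that makes the loop condition noticeably more complex.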

Solution 2

Using awk:

awk '{getline x<"file2"; print $0 x}' file1

getline x<"file2" reads the next line of file2 into the variable x.
print $0 x prints the current line of file1 ($0) immediately followed by x, the line just read from file2 (awk concatenates adjacent expressions).
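
One behaviour of getline is worth knowing here: getline var < file returns 1 after reading a line, 0 at end of file and -1 on error, and at end of file the variable is left unchanged. So if file2 is shorter than file1, the last line of file2 is appended again to every remaining line of file1, and if file2 is longer, its extra lines are never read. This does not matter for the question (both files have the same number of lines), but the minimal Python sketch below, using the made-up helper name awk_getline_merge, imitates that behaviour to make it visible.

```python
def awk_getline_merge(lines1, lines2):
    """Imitate awk '{getline x<"file2"; print $0 x}' file1 (hypothetical helper).

    lines1 plays the role of file1 (the main input), lines2 of file2.
    """
    file2 = iter(lines2)
    x = ""  # an awk variable that was never assigned acts as ""
    merged = []
    for line in lines1:
        # getline x < "file2": on success x gets the next line of file2;
        # at end of file getline returns 0 and x keeps its previous value
        nxt = next(file2, None)
        if nxt is not None:
            x = nxt.rstrip("\n")
        merged.append(line.rstrip("\n") + x)
    return merged


if __name__ == "__main__":
    print("\n".join(awk_getline_merge(["a\n", "b\n", "c\n"], ["1\n"])))
```

The call in the main block prints a1, b1 and c1, whereas paste -d '' on the same input would print a1, b and c.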

Solution 3

paste is the way to go. If you want to see some other methods, here is a Python solution:

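#!/usr/bin/env python3
import itertools

with open('/path/to/file1') as f1, open('/path/to/file2') as f2:
    # zip_longest pads the shorter file with None once it runs out
    lines = itertools.zip_longest(f1, f2)
    for a, b in lines:
        # Strip only the newline so other trailing whitespace is kept
        if a is not None and b is not None:
            print(a.rstrip('\n') + b.rstrip('\n'))
        elif a is not None:
            print(a.rstrip('\n'))
        else:
            print(b.rstrip('\n'))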

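If the files are small, here is a shorter version that builds the whole output in memory before printing it:

#!/usr/bin/env python3
with open('/path/to/file1') as f1, open('/path/to/file2') as f2:
    print('\n'.join(a.rstrip('\n') + b.rstrip('\n') for a, b in zip(f1, f2)))

Note that if the files have different numbers of lines, this version stops at the end of the shorter file, because zip stops as soon as any of its inputs is exhausted.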

Solution 4

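Also, with pure bash (note that this version skips empty lines entirely, so after an empty line the remaining lines can be paired with the wrong lines of the other file; the result is appended to a file named out):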

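#!/bin/bash

# Split command substitutions on newlines only, and turn off pathname
# expansion (set -f) so lines containing '*' or '?' are kept literally.
# Runs of newlines count as a single separator, which is why empty
# lines are lost.
IFS=$'\n'
set -f
f1=($(< file1))
f2=($(< file2))
i=0
# Pair up lines while both arrays still have one at index i
while [ "${f1[${i}]}" ] && [ "${f2[${i}]}" ]
do
    printf '%s%s\n' "${f1[${i}]}" "${f2[${i}]}" >> out
    ((i++))
done
# Append any remaining lines of file1
while [ "${f1[${i}]}" ]
do
    printf '%s\n' "${f1[${i}]}" >> out
    ((i++))
done
# Append any remaining lines of file2
while [ "${f2[${i}]}" ]
do
    printf '%s\n' "${f2[${i}]}" >> out
    ((i++))
done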
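
The empty-line caveat comes from how bash splits $(< file1) into array elements: with IFS=$'\n', newline counts as IFS whitespace, so consecutive newlines collapse into one separator and no empty element is ever created. The following Python sketch (both function names are hypothetical) models that splitting and the three loops, assuming no line contains glob characters, to show how a single empty line shifts the pairing.

```python
def split_like_ifs_newline(text):
    """Imitate arr=($(< file)) with IFS=$'\\n' (hypothetical helper).

    Newline is an IFS whitespace character, so a run of newlines acts
    as a single separator and leading/trailing newlines produce no
    field: empty lines never become array elements.
    """
    return [field for field in text.split("\n") if field != ""]


def bash_array_merge(text1, text2):
    """Imitate the three while loops of the pure-bash array version."""
    f1 = split_like_ifs_newline(text1)
    f2 = split_like_ifs_newline(text2)
    n = min(len(f1), len(f2))
    # First loop: pair elements while both arrays have one
    merged = [a + b for a, b in zip(f1, f2)]
    # Second and third loops: copy what is left of the longer array
    merged += f1[n:] + f2[n:]
    return merged


if __name__ == "__main__":
    # file1 has an empty second line; file2 has three lines
    print("\n".join(bash_array_merge("a\n\nc\n", "1\n2\n3\n")))
```

Because file1's empty line is dropped, c ends up next to 2 and 3 is printed on its own (a1, c2, 3), whereas paste -d '' would give a1, 2 and c3.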

Solution 5

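The Perl way, easy to understand:

#!/usr/bin/perl
use strict;
use warnings;

my ($filename1, $filename2) = @ARGV;

open(my $fh1, "<", $filename1) or die "cannot open < $filename1: $!";
open(my $fh2, "<", $filename2) or die "cannot open < $filename2: $!";

my @array1;
my @array2;

# Read each file into an array, removing the trailing newline from each line
while (my $line = <$fh1>) {
  chomp $line;
  push @array1, $line;
}
while (my $line = <$fh2>) {
  chomp $line;
  push @array2, $line;
}

# Print line i of file1 followed by line i of file2; if file2 is shorter,
# its missing lines are treated as empty (extra lines of file2 are ignored)
for my $i (0 .. $#array1) {
  print $array1[$i] . ($array2[$i] // '') . "\n";
}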

Run it with (after saving the script as merge and making it executable with chmod +x merge):

./merge file1 file2

Output:

foobar
icecream
twohundred

TuxForLife

Updated on September 18, 2022

Comments

TuxForLife over 1 year
cat file1
```
foo
ice
two
```
cat file2
```
bar
cream
hundred
```
Desired output:
```
foobar
icecream
twohundred
```
file1 and file2 will always have the same number of lines in my scenario, in case that makes things easier.
TuxForLife about 9 years

Awesome, thank you for the very simple solution. Should I ever worry about portability when it comes to using paste?
nettux about 9 years

@user264974 paste is in GNU Coreutils (and is specified by POSIX), so you're probably fairly safe.
geirha about 9 years

This is just plain wrong. It doesn't work at all. Either use mapfile to read the files into arrays, or use a while loop with two read commands, each reading from its own fd.
kos about 9 years

@geirha You're right, I messed up the syntax; it's OK now.
geirha about 9 years

Not quite. With the updated code, empty lines will be ignored, and if any line contains glob characters, the line might be replaced with matching filenames. So never use array=( $(cmd) ) or array=( $var ). Use mapfile instead.
kos about 9 years

@geirha You're right of course. I took care of the glob characters, but I left empty lines ignored, because handling them properly and making this a decent solution would require a rewrite. I noted this and I'll leave this version in case it's useful to somebody in the meantime. Thanks for your points so far.