A better paste command

text-processing columns paste

Solution 1

Assuming you don't have any tab characters in your files,

paste file1 file2 | expand -t 13

with the argument to -t chosen to be larger than the longest line in file1.

OP has added a more flexible solution:

I did this so it works without the magic number 13:

paste file1 file2 | expand -t $(( $(wc -L <file1) + 2 ))

It's not easy to type, but it can be used in a script. Note that wc -L is not specified by POSIX, so it may not be available on every system.
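For use in a script, the one-liner could be wrapped in a small shell function. The following is only a sketch: the name ppaste is made up for illustration, and it measures the longest line with awk instead of wc -L, on the assumption that wc -L may be missing on some systems. Like the original, it assumes the inputs contain no tab characters. Depending on the awk implementation, length() may count bytes rather than characters.

```shell
#!/bin/sh
# ppaste: hypothetical helper name, not a standard tool.
# Pastes two files side by side; the first column is padded to the
# longest line of the first file plus a two-space gutter.
ppaste() {
    # Longest line of the first file (0 if the file is empty).
    width=$(awk '{ if (length($0) > m) m = length($0) } END { print m + 0 }' "$1")
    # A single number given to -t sets tab stops every that many columns.
    paste "$1" "$2" | expand -t "$((width + 2))"
}

# Demo with throwaway files
dir=$(mktemp -d)
printf '%s\n' ETIAM SED MAECENAS > "$dir/file1"
printf '%s\n' Lorem Proin Nunc Quisque > "$dir/file2"
ppaste "$dir/file1" "$dir/file2"
```

With these demo files the second column should start at column 11 on every line, including the last one, where file1 has already run out.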

Solution 2

I thought awk might do it nicely, so I googled "awk reading input from two files" and found a question on Stack Overflow to use as a starting point.

First is the condensed version, then a fully commented one below it. This took more than a few minutes to work out. I'd be glad of some refinements from smarter folks.

awk '{if(length($0)>max)max=length($0)}
FNR==NR{s1[FNR]=$0;next}{s2[FNR]=$0}
END { format = "%-" max "s\t%-" max "s\n";
  numlines=(NR-FNR)>FNR?NR-FNR:FNR;
  for (i=1; i<=numlines; i++) { printf format, s1[i], s2[i] }
}' file1 file2

And here is the fully documented version of the above.

# 2013-11-05
# Invoke thus:
#   awk -f this_file file1 file2
# The result is what you asked for and the columns will be
# determined by input file order.
#----------------------------------------------------------
# No matter which file we're reading,
# keep track of max line length for use
# in the printf format.
#
{ if ( length($0) > max ) max=length($0) }

# FNR is record number in current file
# NR is record number over all
# while they are equal, we're reading the first file
#   and we load the strings into array "s1"
#   and then go to the "next" line in the file we're reading.
FNR==NR { s1[FNR]=$0; next }

# and when they aren't, we're reading the
#   second file and we put the strings into
#   array s2
{s2[FNR]=$0}

# At the end, after all lines from both files have
# been read,
END {
  # use the max line length to create a printf format
  # with the right widths
  format = "%-" max "s\t%-" max "s\n"
  # and figure the number of array elements we need
  # to cycle through in a for loop.
  numlines=(NR-FNR)>FNR?NR-FNR:FNR;
  for (i=1; i<=numlines; i++) {
     # unset array elements print as empty strings
     printf format, s1[i], s2[i]
  }
}
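As one possible refinement, the same two-array idea can pad only the first column, using the width of the longest line in the first file, and separate the columns with spaces instead of a tab. This is a sketch, not the answer's own code: the function name paste_aligned and the two-space gutter are choices made here for illustration. It keeps the FNR==NR trick, so it still misbehaves if the first file is empty.

```shell
#!/bin/sh
# paste_aligned: hypothetical helper name used only for this sketch.
paste_aligned() {
    awk '
    # First file: remember each line and the longest length seen.
    FNR == NR { a[FNR] = $0; if (length($0) > w) w = length($0); n1 = FNR; next }
    # Second file: remember each line.
    { b[FNR] = $0; n2 = FNR }
    END {
        n = (n1 > n2) ? n1 : n2          # number of rows to print
        fmt = "%-" w "s  %s\n"           # pad column 1 only, two-space gutter
        for (i = 1; i <= n; i++) printf fmt, a[i], b[i]
    }' "$1" "$2"
}

# Demo with throwaway files
dir=$(mktemp -d)
printf '%s\n' ETIAM SED > "$dir/file1"
printf '%s\n' Lorem Proin Nunc > "$dir/file2"
paste_aligned "$dir/file1" "$dir/file2"
```

Because the lines are stored and printed as whole strings, tabs inside a line do not change which column a line belongs to, although they would still disturb the visual alignment.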

Solution 3

On Debian and derivatives, column has a -n nomerge option that allows column to do the right thing with empty fields. Internally, column uses the wcstok(wcs, delim, ptr) function, which splits a wide character string into tokens delimited by the wide characters in the delim argument.

wcstok starts by skipping wide characters in delim, before recognizing the token. The -n option uses an algorithm that doesn't skip initial wide characters in delim.

Unfortunately, this isn't very portable: -n is Debian-specific, and column is not in POSIX; it apparently comes from BSD.
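The effect of skipping leading delimiters can be seen with awk, used here purely as an analogy and not as a description of how column is implemented. awk's default field splitting discards leading whitespace, so the empty first field that paste emits disappears; with an explicit tab separator it is kept.

```shell
#!/bin/sh
# A line as paste emits it once file1 has run out:
# an empty first field, a tab, then the text from file2.
line=$(printf '\tNam')

# Default splitting skips the leading tab: one field, and it is "Nam".
printf '%s\n' "$line" | awk '{ print NF ": first field is [" $1 "]" }'

# With a tab as the only separator the empty first field survives: two fields.
printf '%s\n' "$line" | awk -F'\t' '{ print NF ": first field is [" $1 "]" }'
```

The first command reports one field and the second reports two, which is the same distinction as between column's default behavior and the -n behavior described above.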

Solution 4

Taking out the dots that you used for padding:

file1:

ETIAM
SED
MAECENAS
DONEC
SUSPENDISSE

file2:

Lorem
Proin
Nunc
Quisque
Aenean
Nam
Vivamus
Curabitur
Nullam

Try this:

$ ( echo ".TS"; echo "l l."; paste file1 file2; echo ".TE" ) | tbl | nroff | more

And you will get:

ETIAM         Lorem
SED           Proin
MAECENAS      Nunc
DONEC         Quisque
SUSPENDISSE   Aenean
              Nam
              Vivamus
              Curabitur
              Nullam

Solution 5

Not a very good solution but I was able to do it using

paste file1 file2 | sed 's/^TAB/&&/'

where TAB is replaced with a literal tab character.
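To avoid typing a literal tab, the character can be produced with printf and stored in a variable. The sketch below assumes the dot-padded files from the question and a terminal with the usual tab stops every 8 columns; the variable names are arbitrary. The trick only lines up when every line of file1 is between 8 and 15 characters wide.

```shell
#!/bin/sh
tab=$(printf '\t')    # a literal tab character

# Demo with throwaway files, padded as in the question
dir=$(mktemp -d)
printf '%s\n' 'ETIAM......' 'SED........' > "$dir/file1"
printf '%s\n' 'Lorem....' 'Proin....' 'Nunc.....' > "$dir/file2"

# Lines that start with a tab (empty first column) get a second tab,
# which pushes the text to the same tab stop as the other lines.
paste "$dir/file1" "$dir/file2" | sed "s/^$tab/&&/"
```

Piping the result through expand is a convenient way to check the alignment without relying on the terminal's tab settings.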

Tulains Córdova

Updated on September 18, 2022

Comments

Tulains Córdova over 1 year
I have the following two files (I padded the lines with dots so every line in a file is the same width, and made file1 all caps to make it clearer).
```
contents of file1:

ETIAM......
SED........
MAECENAS...
DONEC......
SUSPENDISSE

contents of file2

Lorem....
Proin....
Nunc.....
Quisque..
Aenean...
Nam......
Vivamus..
Curabitur
Nullam...
```
Notice that file2 is longer than file1.

When I run this command:
```
paste file1 file2
```
I get this output:
```
ETIAM...... Lorem....
SED........ Proin....
MAECENAS... Nunc.....
DONEC...... Quisque..
SUSPENDISSE Aenean...
    Nam......
    Vivamus..
    Curabitur
    Nullam...
```
What can I do for the output to be as follows?
```
ETIAM...... Lorem....
SED........ Proin....
MAECENAS... Nunc.....
DONEC...... Quisque..
SUSPENDISSE Aenean...
            Nam......
            Vivamus..
            Curabitur
            Nullam...
```
I tried
```
paste file1 file2 | column -t
```
but it does this:
```
ETIAM......  Lorem....
SED........  Proin....
MAECENAS...  Nunc.....
DONEC......  Quisque..
SUSPENDISSE  Aenean...
Nam......
Vivamus..
Curabitur
Nullam...
```
Not as ugly as the original output, but wrong column-wise anyway.
- unxnut over 10 years
  
  paste is using tabs in front of the lines from second file. You may have to use a postprocessor to align the columns appropriately.
- ninjalj over 10 years
  
  paste file1 file2 | column -tn ?
- RSFalcon7 over 10 years
  
  does file1 always have fixed size columns?
- Tulains Córdova over 10 years
  
  @RSFalcon7 Yes, it does.
- bac0n over 3 years
  
  paste file[12] | column -s $'\t' -t -o ' ' or have I missed something?
tuzion over 10 years

What is the role of && in the sed command?
unxnut over 10 years

A single & stands for the text that was matched (a tab in this case). This command simply replaces the tab at the beginning of the line with two tabs.
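A quick way to see what & does, using an ordinary letter instead of a tab so the effect is visible:

```shell
#!/bin/sh
# & in the replacement is the whole matched text.
printf 'abc\n' | sed 's/b/&&/'     # the matched "b" is written back twice: abbc
printf 'abc\n' | sed 's/b/[&]/'    # the matched "b" is wrapped in brackets: a[b]c
```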
rubo77 over 10 years

I had to change TAB to \t to make this work in zsh on Ubuntu Debian. And it only works if the lines in file1 have fewer than 15 characters.
rubo77 over 10 years

How do you use this on file1 and file2? I called the script paste-awk and tried paste file1 file2|paste-awk and I tried awk paste-awk file1 file2 but none worked.
rubo77 over 10 years

I get awk: Line:1: (FILENAME=file1 FNR=1) Fatal: Division by zero
ninjalj over 10 years

@rubo77: awk -f paste-awk file1 file2 should work, at least for GNU awk and mawk.
ninjalj over 10 years

@rubo77: the field separator can be set with -F\\t
don_crissti over 7 years

This, like the other solutions using paste will fail to print the proper output if there are any lines containing tabs. +1 for being different though
Tulains Córdova over 7 years

+1. Would you please explain how the solution works?
TabeaKischka over 5 years

nice! I didn't know about expand before I read your answer :)