A better paste command


Solution 1

Assuming you don't have any tab characters in your files,

paste file1 file2 | expand -t 13

with the arg to -t suitably chosen to cover the desired max line width in file1.

OP has added a more flexible solution:

I did this so it works without the magic number 13:

paste file1 file2 | expand -t $(( $(wc -L <file1) + 2 ))

It's not easy to type but can be used in a script.
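Note that wc -L is a GNU extension, so the one-liner above may not run on BSD or macOS. Here is a minimal sketch of the same idea as a script function, using awk to measure the longest line instead. The name paste_aligned is made up for illustration, and it still assumes neither file contains tab characters.

```shell
#!/bin/sh
# paste_aligned is a hypothetical helper name, not a standard tool.
# Usage: paste_aligned file1 file2
paste_aligned() {
    # Longest line in the first file, measured with awk's length().
    width=$(awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1")
    # Place the tab stops two columns past the longest line of file1.
    paste "$1" "$2" | expand -t "$((width + 2))"
}
```

Called as paste_aligned file1 file2, it should behave like the expand one-liner, with the width computed for you.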

Solution 2

I thought awk might do it nicely, so I googled "awk reading input from two files" and found an article on Stack Overflow to use as a starting point.

First is the condensed version, then the fully commented one below that. This took more than a few minutes to work out. I'd be glad of some refinements from smarter folks.

awk '{if(length($0)>max)max=length($0)}
FNR==NR{s1[FNR]=$0;next}{s2[FNR]=$0}
END { format = "%-" max "s\t%-" max "s\n";
  numlines=(NR-FNR)>FNR?NR-FNR:FNR;
  for (i=1; i<=numlines; i++) { printf format, s1[i], s2[i] }
}' file1 file2

And here is the fully documented version of the above.

# 2013-11-05 [email protected]
# Invoke thus:
#   awk -f this_file file1 file2
# The result is what you asked for and the columns will be
# determined by input file order.
#----------------------------------------------------------
# No matter which file we're reading,
# keep track of max line length for use
# in the printf format.
#
{ if ( length($0) > max ) max=length($0) }

# FNR is record number in current file
# NR is record number over all
# while they are equal, we're reading the first file
#   and we load the strings into array "s1"
#   and then go to the "next" line in the file we're reading.
FNR==NR { s1[FNR]=$0; next }

# and when they aren't, we're reading the
#   second file and we put the strings into
#   array s2
{s2[FNR]=$0}

# At the end, after all lines from both files have
# been read,
END {
  # use the max line length to create a printf format
  # with the right widths
  format = "%-" max "s\t%-" max "s\n"
  # and figure the number of array elements we need
  # to cycle through in a for loop.
  numlines=(NR-FNR)>FNR?NR-FNR:FNR;
  for (i=1; i<=numlines; i++) {
     # unset array elements are already empty strings
     printf format, s1[i], s2[i]
  }
}
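For comparison, here is a rough variation on the same two-array idea, wrapped in a shell function so it can be called directly. It is not the script above: it pads only the first column, to the longest line of file1, and separates the columns with a single space. The name paste_awk is hypothetical, and like the script above it assumes file1 is not empty (otherwise FNR==NR also holds while reading file2).

```shell
#!/bin/sh
# paste_awk is a hypothetical name; this is a variation on the idea above.
# Usage: paste_awk file1 file2
paste_awk() {
    awk '
        # First file: remember each line and track the longest one.
        FNR == NR {
            left[FNR] = $0; n1 = FNR
            if (length($0) > width) width = length($0)
            next
        }
        # Second file: remember each line.
        { right[FNR] = $0; n2 = FNR }
        END {
            fmt = "%-" width "s %s\n"
            n = (n1 > n2) ? n1 : n2
            for (i = 1; i <= n; i++) {
                if (i <= n2) printf fmt, left[i], right[i]
                else print left[i]      # nothing left in the second file
            }
        }
    ' "$1" "$2"
}
```

Called as paste_awk file1 file2, it prints the lines of file2 that have no partner in file1 indented to the start of the second column.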

Solution 3

On Debian and derivatives, column has a -n nomerge option that allows column to do the right thing with empty fields. Internally, column uses the wcstok(wcs, delim, ptr) function, which splits a wide character string into tokens delimited by the wide characters in the delim argument.

wcstok starts by skipping wide characters in delim, before recognizing the token. The -n option uses an algorithm that doesn't skip initial wide characters in delim.

Unfortunately, this isn't very portable: -n is Debian-specific, and column is not in POSIX; it's apparently a BSD thing.
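Where column -n is not available, the same "don't skip a leading delimiter" behaviour can be approximated with awk, which keeps empty fields when the separator is a single tab. This is only a sketch under a few assumptions: the name align_tabs is invented here, the two-space gutter is an arbitrary choice, and widths are counted with awk's length(), so wide or combining characters may misalign.

```shell
#!/bin/sh
# align_tabs is a hypothetical helper: it reads tab-separated text on stdin
# and prints padded columns, keeping empty leading fields in place.
align_tabs() {
    awk -F '\t' '
        {
            rows[NR] = $0
            # Track the widest value seen in each column.
            for (i = 1; i <= NF; i++)
                if (length($i) > w[i]) w[i] = length($i)
        }
        END {
            for (r = 1; r <= NR; r++) {
                # A single-character separator keeps empty fields.
                n = split(rows[r], f, "\t")
                line = ""
                for (i = 1; i < n; i++)
                    line = line sprintf("%-" (w[i] + 2) "s", f[i])
                print line f[n]     # last field is printed unpadded
            }
        }
    '
}
```

Usage would be paste file1 file2 | align_tabs.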

Solution 4

Taking out the dots that you used for padding:

file1:

ETIAM
SED
MAECENAS
DONEC
SUSPENDISSE

file2:

Lorem
Proin
Nunc
Quisque
Aenean
Nam
Vivamus
Curabitur
Nullam

Try this:

$ ( echo ".TS"; echo "l l."; paste file1 file2; echo ".TE" ) | tbl | nroff | more

And you will get:

ETIAM         Lorem
SED           Proin
MAECENAS      Nunc
DONEC         Quisque
SUSPENDISSE   Aenean
              Nam
              Vivamus
              Curabitur
              Nullam

Solution 5

Not a very good solution but I was able to do it using

paste file1 file2 | sed 's/^TAB/&&/'

where TAB is replaced with the tab character.
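If typing a literal tab is awkward, it can be produced with printf. Here is a small sketch of the same sed step as a function; the name double_leading_tab is made up for illustration. With the usual 8-column tab stops, the result only lines up when every line of file1 is between 8 and 15 characters wide, as in the padded example from the question.

```shell
#!/bin/sh
# double_leading_tab is a hypothetical name for the sed step in this answer.
# It reads the output of paste on stdin.
double_leading_tab() {
    tab=$(printf '\t')      # a literal tab character, without typing one
    sed "s/^$tab/&&/"       # & is the matched text, so && doubles the tab
}
```

Usage would be paste file1 file2 | double_leading_tab.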

Author by

Tulains Córdova


Updated on September 18, 2022

Comments

  • Tulains Córdova
    Tulains Córdova over 1 year

    I have the following two files (I padded the lines with dots so every line in a file is the same width, and made file1 all caps to make it clearer).

    contents of file1:
    
    ETIAM......
    SED........
    MAECENAS...
    DONEC......
    SUSPENDISSE
    
    contents of file2
    
    Lorem....
    Proin....
    Nunc.....
    Quisque..
    Aenean...
    Nam......
    Vivamus..
    Curabitur
    Nullam...
    

    Notice that file2 is longer than file1.

    When I run this command:

    paste file1 file2
    

    I get this output

    ETIAM...... Lorem....
    SED........ Proin....
    MAECENAS... Nunc.....
    DONEC...... Quisque..
    SUSPENDISSE Aenean...
        Nam......
        Vivamus..
        Curabitur
        Nullam...
    

    What can I do to get the following output?

    ETIAM...... Lorem....
    SED........ Proin....
    MAECENAS... Nunc.....
    DONEC...... Quisque..
    SUSPENDISSE Aenean...
                Nam......
                Vivamus..
                Curabitur
                Nullam...
    

    I tried

    paste file1 file2 | column -t
    

    but it does this:

    ETIAM......  Lorem....
    SED........  Proin....
    MAECENAS...  Nunc.....
    DONEC......  Quisque..
    SUSPENDISSE  Aenean...
    Nam......
    Vivamus..
    Curabitur
    Nullam...
    

    Not as ugly as the original output, but still wrong column-wise.

    • unxnut
      unxnut over 10 years
      paste is using tabs in front of the lines from second file. You may have to use a postprocessor to align the columns appropriately.
    • ninjalj
      ninjalj over 10 years
      paste file1 file2 | column -tn ?
    • RSFalcon7
      RSFalcon7 over 10 years
      does file1 always have fixed size columns?
    • Tulains Córdova
      Tulains Córdova over 10 years
      @RSFalcon7 Yes, it does.
    • bac0n
      bac0n over 3 years
      paste file[12] | column -s $'\t' -t -o ' ' or have I missed something?
  • tuzion
    tuzion over 10 years
    What is the role of && in the sed command?
  • unxnut
    unxnut over 10 years
    A single & puts what is being searched for (a tab in this case). This command simply replaces the tab at the beginning with two tabs.
  • rubo77
    rubo77 over 10 years
    I had to change TAB to \t to make this work in zsh on Ubuntu debian. And it does only work if file1 has less than 15 chars
  • rubo77
    rubo77 over 10 years
    How do you use this on file1 and file2? I called the script paste-awk and tried paste file1 file2|paste-awk and I tried awk paste-awk file1 file2 but none worked.
  • rubo77
    rubo77 over 10 years
    I get awk: Line:1: (FILENAME=file1 FNR=1) Fatal: Division by zero
  • ninjalj
    ninjalj over 10 years
    @rubo77: awk -f paste-awk file1 file2 should work, at least for GNU awk and mawk.
  • ninjalj
    ninjalj over 10 years
    @rubo77: the field separator can be set with -F\\t
  • don_crissti
    don_crissti over 7 years
    This, like the other solutions using paste will fail to print the proper output if there are any lines containing tabs. +1 for being different though
  • Tulains Córdova
    Tulains Córdova over 7 years
    +1. Would you please explain how the solution works?
  • TabeaKischka
    TabeaKischka over 5 years
    nice! I didn't know about expand before I read your answer :)