Linux command to concatenate a file to itself n times


Solution 1

There are two parts to this. First, use cat to write the text file to standard output and use the append redirection (>>) to add it to another file, e.g. cat foo.txt >> bar.txt will append foo.txt to bar.txt.

Then run it n times with:

for i in {1..n};do cat foo.txt >> bar.txt; done

Replace n in that command with your number.
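In bash, brace expansion runs before variable expansion, so {1..$n} does not expand when the count lives in a variable. Here is a minimal sketch that uses a plain counter loop instead and redirects the whole loop once. The function name repeat_file and its argument order are made up for illustration:

```shell
# repeat_file SRC N DEST: write N copies of SRC to DEST (hypothetical helper).
# The body runs in a subshell, so src, n, dest and i stay local to the call.
repeat_file() (
    src=$1 n=$2 dest=$3
    i=0
    while [ "$i" -lt "$n" ]; do
        cat -- "$src"
        i=$((i + 1))
    done > "$dest"    # DEST is opened (and truncated) once for the whole loop
)
```

For example, repeat_file foo.txt 12 bar.txt leaves 12 copies of foo.txt in bar.txt. Note that this overwrites bar.txt rather than appending to it.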

If you use csh, there's the 'repeat' command.

The repeat-related parts of this answer are copied from here, and I tested it on an Ubuntu 11.04 system in the default bash shell.

Solution 2

You certainly can use cat for this:

$ cat /tmp/f
foo
$ cat /tmp/f /tmp/f
foo
foo

To get $n copies, you could use yes piped into head -n $n:

$ yes /tmp/f | head -n 10
/tmp/f
/tmp/f
/tmp/f
/tmp/f
/tmp/f
/tmp/f
/tmp/f
/tmp/f
/tmp/f
/tmp/f

Putting that together gives

yes /tmp/f | head -n $n | xargs cat >/tmp/output
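That pipeline lets xargs split its input on whitespace, so it breaks if the path contains a space. Here is a sketch that hands xargs NUL-separated names instead, assuming an xargs that supports -0 (GNU and BSD xargs do). The directory and file names are placeholders:

```shell
dir=$(mktemp -d)            # scratch directory for the demonstration
src="$dir/my file.txt"      # placeholder path containing a space
out="$dir/output"
n=5

printf 'foo\n' > "$src"

# One path per line, converted to NUL-terminated records for xargs -0.
yes "$src" | head -n "$n" | tr '\n' '\0' | xargs -0 cat > "$out"

wc -l < "$out"
```

For a very large n, xargs may run cat more than once, but every run writes to the same redirected output in order, so the result is the same.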

Solution 3

I am bored so here are a few more methods on how to concatenate a file to itself, mostly with head as a crutch. Pardon me if I overexplain myself, I just like saying things :P


Assume N is the number of self-concatenations you want to do and that your file is named file.

Variables:

linecount=$(<file wc -l)

total_repeats=$(echo "2^$N - 1" | bc) # obtained through the power of MATH

total_lines=$((linecount*(total_repeats+1)))

tmp=$(mktemp --suffix .concat.self)

Given a copy of file called file2, total_repeats is the number of times file would need to be added to file2 to make it the same as if file was concatenated to itself N times.

Said MATH is here, more or less: MATH (gist)

It's first-semester computer science stuff, but it's been a while since I did an induction proof so I can't get over it... (also, this class of recursion is pretty well known to be 2^Loops, so there is that too....)
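The arithmetic can be checked without touching any file. This is only a sketch, with N=5 picked arbitrarily. Each self-concatenation doubles the number of copies, so N rounds give 2^N copies, of which 2^N - 1 were appended:

```shell
N=5          # arbitrary example value
copies=1     # the file starts out as one copy of itself
i=0
while [ "$i" -lt "$N" ]; do
    copies=$((copies * 2))    # appending the file to itself doubles it
    i=$((i + 1))
done
total_repeats=$((copies - 1))    # copies added on top of the original
echo "$N self-concatenations: $copies copies in total, $total_repeats appended"
```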


POSIX

I use a few non-posix things but they are not essential. For my purposes:

 yes() { while true; do echo "$1"; done; }

Oh, I only used that. Oh well, the section is already here...


Methods


head with linecount tracking.

ln=$linecount
for i in $(seq 1 $N); do
    <file head -n $ln >> file;
    ln=$((ln*2))
done

No temp file, no cat, not even too much math yet, all joy.
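To try the loop above without risking a real file, here is a sketch that runs it on a three-line scratch file. The path comes from mktemp and N=3 is an arbitrary choice, so the file should end up with 24 lines (3 × 2^3):

```shell
dir=$(mktemp -d)
f="$dir/file"                 # scratch stand-in for "file"
printf 'one\ntwo\nthree\n' > "$f"

N=3
linecount=$(wc -l < "$f")
ln=$linecount
i=0
while [ "$i" -lt "$N" ]; do
    # head stops after $ln lines, i.e. the file as it was before this pass.
    head -n "$ln" < "$f" >> "$f"
    ln=$((ln * 2))
    i=$((i + 1))
done

wc -l < "$f"
```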


tee with MATH

<file tee -a file | head -n $total_lines > $tmp
cat $tmp > file

Here tee is reading from file but perpetually appending to it, so it will keep reading the file on repeat until head stops it. And we know when to stop it because of MATH. The appending goes overboard though, so I used a temp file. You could trim the excess lines from file too.


eval, the lord of darkness!

eval "cat $(yes file | head -n $((total_repeats+1)) | tr '\n' ' ')" > $tmp
cat $tmp > file

This just expands to cat file file file ... and evals it. You can do it without the $tmp file, too:

eval "cat $(yes file | head -n $total_repeats | tr '\n' ' ')" |
  head -n $((total_lines-linecount)) >> file

The second head "tricks" cat by putting a middle man between it and the write operation. You could trick cat with another cat as well but that has inconsistent behavior. Try this:

test_double_cat() {
    local Expected=0
    local Got=0
    local R=0
    local file="$(mktemp --suffix .double.cat)"
    for i in $(seq 1 100); do

        printf "" > "$file"
        echo "1" >> "$file"
        echo "2" >> "$file"
        echo "3" >> "$file"

        # Appending two more copies should triple the line count.
        Expected=$((3*$(<"$file" wc -l)))

        cat "$file" "$file" | cat >> "$file"

        Got=$(<"$file" wc -l)

        [ "$Expected" = "$Got" ] && R="$((R+1))"
    done
    echo "Got it right $R/100"
    rm "$file"
}
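Going back to the eval method: the same cat file file file ... command line can be built without eval by collecting the names in the positional parameters. This is a swapped-in technique, not what the answer above does. It is a sketch that runs on a scratch file, with total_repeats=3 chosen arbitrarily:

```shell
dir=$(mktemp -d)
f="$dir/file"                 # scratch stand-in for "file"
printf 'one\ntwo\n' > "$f"
total_repeats=3               # arbitrary example value

set --                        # clear the positional parameters
i=0
while [ "$i" -le "$total_repeats" ]; do    # total_repeats + 1 names in all
    set -- "$@" "$f"
    i=$((i + 1))
done

tmp=$(mktemp)
cat "$@" > "$tmp"             # same effect as: cat file file file file
cat "$tmp" > "$f"
rm "$tmp"
```

Because each name is passed as its own quoted argument, this also copes with file names that contain spaces.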

sed:

<file tr '\n' '\0' |
    sed -e "s/.*/$(yes '\0' | head -n $total_repeats | tr -d '\n')/g" |
        tr '\0' '\n' >> file

This forces sed to read the entire file as a single line, captures all of it, then pastes it $total_repeats times.

This will fail, of course, if your file contains any null characters. In that case, pick a character that you know isn't in the file.

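find_missing_char() {
  local file="${1:-/dev/stdin}"
  local used byte

  # Decimal value of every distinct byte in the input, one per line.
  used="$(od -An -v -tuC "$file" | tr -s ' ' '\n' | grep . | sort -un)"
  # Print the first byte value in 0..255 that never occurs, as an octal
  # escape (e.g. \000) that can be handed to tr.
  for byte in $(seq 0 255); do
    if ! printf '%s\n' "$used" | grep -qx "$byte"; then
      printf '\\%03o\n' "$byte"
      return 0
    fi
  done
  return 1  # every byte value is in use
}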

That's all for now lads, I hope this arbitrary answer didn't bother anyone. I tested all of them many times but I am only a two-year shell user so keep that in mind I guess. Now to sleep...

rm $tmp

Author: Bryce Thomas

https://www.linkedin.com/in/brycethomas/

Updated on September 18, 2022

Comments

  • Bryce Thomas over 1 year

    I've taken a plain text file book from Project Gutenberg (around 0.5MB) which I want to concatenate to itself n times in order to generate a large text file that I can benchmark some algorithms on. Is there a linux command I can use to achieve this? cat sounds ideal, but doesn't seem to play too nice with concatenating a file onto itself, plus does not directly address the n times part of the question.

    • Thalys over 12 years
      use some kind of loop, and appending? so repeat foo.txt>>bar.txt and wrap that up in something that will run the command that many times?
  • Arnout Engelen about 8 years
    Fun fact: this actually works without replacing 'n', in which case it'll execute the body once for each character between ASCII '1' and ASCII 'n' (so 62 times). But {1..12} will correctly run the body 12 times.
  • Toby Speight about 7 years
    You might want to just redirect the whole pipeline, rather than appending in each iteration: for i in {1..n};do cat foo.txt; done > bar.txt
  • rogerdpack over 2 years
    This doesn't give you the exponential growth size... :)