redirect output to specific line number


Solution 1

If you time the steps you take correctly, this can be pretty easy. The most important thing is to get a buffer of your source file that is not going to implode if overworked. The only real way to do that is with another file - which the shell makes very easy to do.

{   head -n "$((num_lines_before_insert))"
    grep key temp_file; sed \$d
}   <<SOURCE_FILE >desired.txt
$(  cat <source_file;echo .)
SOURCE_FILE

So, for most shells (including bash and zsh, but not dash or yash), when you use a <<here_document the shell creates a uniquely named temp file in ${TMPDIR:-/tmp}, execs it on the input file descriptor you specify (or, by default, just 0) and promptly deletes it. (bash 5.1 and later use a pipe instead when the document is small enough to fit in the pipe buffer.) By the time it is served as input to your command, it is an unnamed file - it has no remaining links to any filesystem and is just waiting for the kernel to clean it up before it disappears completely. It is a proper file - its data exists somewhere on disk (or, at least, within VFS in the likely case of tmpfs) and the kernel will ensure it continues to do so at least until you release the file descriptor.

In that way - for so long as your shell gets an actual backing file for the heredoc - they represent very secure and simple means of handling temporary file needs because they are fully written and all filesystem names are already removed from them before ever you read them. So their data cannot be tampered with while you work.
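Whether a given shell really backs its here-documents with a file can be checked directly. The following is only a minimal sketch: heredoc_kind is a hypothetical helper name, and it assumes a Linux-style /dev/stdin that resolves to whatever is open on file descriptor 0.

```shell
# Hypothetical helper: report what kind of object stdin currently is.
# Assumes /dev/stdin resolves to fd 0 (true on Linux; other systems may differ).
heredoc_kind() {
    if   [ -f /dev/stdin ]; then echo file    # regular (possibly unlinked) temp file
    elif [ -p /dev/stdin ]; then echo pipe    # pipe: no seeking back after a partial read
    else                         echo other
    fi
}

# Feed the helper a here-document and record what the shell handed over.
kind=$(heredoc_kind <<PROBE
probe
PROBE
)
echo "this shell backs here-documents with: $kind"
```

If this reports pipe, a head that stops early cannot hand the unread remainder back to the next command, so the heredoc approach above should not be relied on in that shell.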

The above block first writes the temp file with cat, and echo preserves any/all trailing blank lines from the command substitution by adding a single line to the tail of the file (sed \$d deletes that extra line again at the end). The output of the three commands in the { compound command } is written to desired.txt - two of them read, in their turn, the head and tail of the source file from the here-document, and between them the grep command inserts your key match.
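The reason for the extra echo can be seen in isolation. This is a small sketch using a throwaway file in a scratch directory; the variable names are made up for illustration, and the sentinel is removed here with a parameter expansion rather than with sed \$d as in the block above.

```shell
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
printf 'last line\n\n\n' >"$dir/source_file"   # ends in two blank lines

stripped=$(cat <"$dir/source_file")            # command substitution drops trailing newlines
guarded=$(cat <"$dir/source_file"; echo .)     # the "." line shields them
guarded=${guarded%.}                           # remove the sentinel again

echo "without sentinel: ${#stripped} characters"
echo "with sentinel:    ${#guarded} characters"
rm -r -- "$dir"
```

Without the sentinel the two blank lines at the end of the file are silently lost; with it, every newline of the original survives.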

I'm not certain if you needed this - but I thought it was relevant to show that you can simply and safely fully overwrite a source file with a sequence like this.

If your shell doesn't get an actual file for heredocs, you can emulate what it does like...

{   set "$$" "${TMPDIR:-/tmp}" "$@"
    exec <"$2/$(  set  -C
         >"$2/$1" cat  &&
         echo "$1")"  >&1 
    rm -- "$2/$1";shift 2
    head "-n$((before))"
    grep ... keyfile; cat
} <source_file 1<>source_file

...which will ensure all files are writable and safely assigned to file-descriptors before taking any irreversible action, but also does all filesystem cleanup before doing same.
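For readers who find the compact form hard to follow, here is a more verbose sketch of the same idea. It is not the exact code above: insert_in_place is a hypothetical helper, mktemp stands in for the noclobber trick, and it assumes head leaves a seekable stdin positioned just past the last line it printed (POSIX requires this, and GNU head does it).

```shell
# Hypothetical helper: insert the output of COMMAND after line N of FILE,
# rewriting FILE in place without ever truncating it.
insert_in_place() (        # usage: insert_in_place N FILE COMMAND [ARG...]
    n=$1 file=$2; shift 2
    tmp=$(mktemp) || exit                 # mktemp replaces the set -C trick (assumption)
    cat <"$file" >"$tmp" || { rm -f -- "$tmp"; exit 1; }
    {   rm -f -- "$tmp"                   # unlink now; fd 0 keeps the copy alive
        head -n "$n"                      # rewrite lines 1..N over themselves
        "$@" </dev/null                   # inserted text lands right after line N
        cat                               # the remainder of the copy follows
    } <"$tmp" 1<>"$file" || { rm -f -- "$tmp"; exit 1; }
)

# Usage sketch with throwaway files:
dir=$(mktemp -d)
printf '%s\n' 1 2 3 4 >"$dir/source_file"
printf 'line %s\n' 1 2 3 >"$dir/keyfile"
insert_in_place 2 "$dir/source_file" grep 3 "$dir/keyfile"
cat "$dir/source_file"
```

Both redirections are opened before anything inside the braces runs, so the temp copy is only unlinked once the source file is known to be writable.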

Here is a test I ran to demonstrate this:

cd /tmp
set "$$" "${TMPDIR:-/tmp}" "$@"
seq 5000000 >test
printf line\ %s\\n 1 2 3 4 5 >test2
{   exec <"$2/$(  set  -C
         >"$2/$1" cat  &&
         echo "$1")"  >&1 
    rm -- "$2/$1";shift 2
    head -n2500000
    grep 3 test2;cat
} <test 1<>test

This first created two files - one called /tmp/test which was just 5 million numbered lines as written by seq and a second called /tmp/test2 which was just 5 lines like...

line 1
line 2
line 3
line 4
line 5

I next ran the above block, then I did...

sed -n '1p;$p;2499999,2500002l' <test
wc -l test

...which, interestingly, took practically the same amount of time to perform as the insert operation, and printed:

1
2499999$
2500000$
line 3$
2500001$
5000000
5000001 test

So here's how this works:

  1. The 1<> redirection is important - it opens the file on stdout for both reading and writing (O_RDWR) without truncating it, and so ensures that as each process writes into the file it writes over the file's previous contents. In other words, at no point is the source/destination file ever truncated; it is rather rewritten head to tail.
  2. The command substitution for exec gets the racy part done as soon as is possible (or as soon as I know it can be). Within the command sub noclobber is set so if "${TMPDIR:-/tmp}/$$" already exists the expansion results in exec <"${TMPDIR:-/tmp}/" which, in an interactive shell will cease the whole process right away, or, in a script, will cause the script to exit with a meaningful error as the shell cannot exec a directory as stdin.
  3. Within the command sub cat copies source_file to a temp file that doesn't already exist and echo writes the name to stdout.
  4. As soon as all file handles are execed rm unlink()s the new temp file so its only fleeting claim to existence now is < the redirect it was just assigned.
  5. head seeks through 2.5mil lines and writes over source_file's first 2.5mil lines. The point is to seek through both files to equal offsets.
    • With that in mind, this portion could be more i/o efficient when the newly created tmp file is on a tmpfs and the source file is on a disk: the i/o would be reversed here, so that head read from the on-disk file and wrote to the file in RAM.
    • If you wanted to do that, though, you'd need to do exec <>"$(... head ... <&1 >&0 to make the tmp file read/writable, and maybe use head/tail with a specified number of lines for the tail end. In that case the number need not even be exact - you can loop over input in similar fashion, advancing the offset only a little at a time. The shell's builtin read can be used to test for EOF - or wc can be used at loop open.
    • The explicit line count is needed because cat will probably just hang on a <>stdin, as it will never see EOF.
  6. grep reads some data from some other file and writes it into source_file overwriting only as many bytes as it read from elsewhere.
  7. cat corrects whatever discrepancy grep may just have caused by writing what remains of its stdin out to its stdout 1<>source_file.
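The two shell features the steps above lean on, 1<> and noclobber, can be tried on their own. This is a small sketch on a throwaway file; the variable names are made up for illustration.

```shell
dir=$(mktemp -d)

# 1<> opens the file read/write at offset 0 and does not truncate it;
# successive writers on the same descriptor continue at the shared offset.
printf 'abcdef\n' >"$dir/test"
{ printf 'X'; printf 'Y'; } 1<>"$dir/test"
overwritten=$(cat "$dir/test")          # first two bytes replaced, rest intact

# With noclobber (set -C), a plain > refuses to overwrite an existing file.
if ( set -C; : >"$dir/test" ) 2>/dev/null; then
    clobbered=yes
else
    clobbered=no
fi

echo "$overwritten $clobbered"
rm -r -- "$dir"
```

The first part is why head, grep and cat can take turns writing into source_file, and the second is why the command substitution fails safely when the temp name is already taken.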

Solution 2

Not really suitable for huge files but ed can read a command output and insert it after the addressed line, e.g.:

ed -s desired.txt <<IN
4r !grep "key" temp_file
w
q
IN

or, in one line:

printf '%s\n' '4r !grep "key" temp_file' w q | ed -s desired.txt

You can insert the output from different commands at different line numbers. Just keep in mind that you have to work backwards when editing via line number addresses, because every insert shifts the numbers of the lines below it:

ed -s desired.txt <<IN
48r !grep "another_key" another_temp_file
4r !grep "key" temp_file
w
q
IN
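
To see why the order matters, here is a small sketch on a four-line file. The function name and file names are hypothetical, and it assumes an ed that supports -s and r !command as used above.

```shell
# Hypothetical demo: two inserts into a scratch copy, highest address first.
demo_ed_backwards() (
    dir=$(mktemp -d) && cd "$dir" || exit
    printf '%s\n' one two three four >desired.txt
    printf '%s\n' 'key A' >first
    printf '%s\n' 'key B' >second
    # Inserting after line 3 leaves lines 1-3 where they were,
    # so address 1 still refers to the original first line.
    printf '%s\n' '3r !cat second' '1r !cat first' w q | ed -s desired.txt
    cat desired.txt
)
```

Calling demo_ed_backwards prints the edited file. Had the insert after line 1 been issued first, the original third line would have moved down to line 4 and the second address would have needed adjusting.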

Author: JigarGandhi

VLSI Engg. currently working as VLSI RnD Engg. Enjoy TCL scripting & shell scripting. loves rubik cubing & cycling too

Updated on September 18, 2022

Comments

  • JigarGandhi
    JigarGandhi over 1 year

    I want to grep certain lines by keyword and redirect the output to specific line number of an existing file.

    Command

    grep "key" temp_file >> desired.txt
    

    What I need is that I can append the grepped lines to particular line number say x of file desired.txt

    • orion
      orion over 9 years
      You have a few good solutions below, but my question is simply, is this the best way to solve your problem? It makes sense, if this happens once. If you replace several lines in a longer script, it would make more sense to read the file (if it isn't too big) into an array, modify the array, and write it back at the end. That's because replacing a line in the middle is expensive - the entire file has to be overwritten from that point forward (unless you had fixed-width lines, then you could use dd and simply write over the chosen line in-place).
    • orion
      orion over 9 years
      @mikeserv As I said, if this line happens once, just overwrite it, no problem. But if this grep line happens 200 times in a script, writing once from an array is obviously faster than overwriting 200 times.
  • mikeserv
    mikeserv over 9 years
    This breaks for backslashes and - depending on the sed - other special characters.
  • darnir
    darnir over 9 years
    Why not split this into two commands. Let grep do its job and output to a temp file and then at NR==3, simply print the entire file at that location?
  • mikeserv
    mikeserv over 9 years
    This is practically an exact duplicate of Costas's answer.