How to read the user input line by line until Ctrl+D and include the line where Ctrl+D was typed
Solution 1
To do that, you'd have to read character by character, not line by line.
Why? The shell very likely uses the standard C library function read()
to read the data that the user is typing in, and that function returns
the number of bytes actually read. If it returns zero, that means it has
encountered EOF (see the read(2) manual; man 2 read). Note that EOF
isn't a character but a condition, i.e. the condition "there is nothing
more to be read", end-of-file.
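A quick way to see that EOF is a condition rather than a character is to look at the exit status of the read built-in on piped input. This is only a sketch; the helper name show_read_status is made up for the illustration.

```bash
# Hypothetical helper: read one line from standard input, then report
# the exit status of read and whatever it stored in the variable.
show_read_status() {
    IFS= read -r line
    rc=$?
    printf 'status=%d line=<%s>\n' "$rc" "$line"
}

printf 'complete\n' | show_read_status   # newline seen before end of input
printf 'partial' | show_read_status      # end of input reached first
printf '' | show_read_status             # nothing to read at all
```

In the second case read reports failure (end of input) and still stores the partial data, which is why a script has to check both the status and the variable.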
Ctrl+D sends an end-of-transmission character (EOT, ASCII character
code 4, $'\04' in bash) to the terminal driver. This has the effect of
sending whatever there is to send to the waiting read() call of the
shell.
When you press Ctrl+D halfway through entering the text on a line,
whatever you have typed so far is sent to the shell [1]. This means that
if you press Ctrl+D twice after having typed something on a line, the
first one will send some data, and the second one will send nothing, so
the read() call will return zero and the shell will interpret that as
EOF. Likewise, if you press Enter followed by Ctrl+D, the shell gets EOF
at once as there wasn't any data to send.
So how to avoid having to type Ctrl+D twice?
As I said, read single characters. When you use the read shell built-in
command, it probably has an input buffer and asks read() to read a
maximum of that many characters from the input stream (maybe 16 KiB or
so). This means that the shell will get a bunch of 16 KiB chunks of
input, followed by a chunk that may be less than 16 KiB, followed by
zero bytes (EOF). Once it encounters the end of input (or a newline, or
a specified delimiter), control is returned to the script.
If you use read -n 1 to read a single character, the shell will use a
buffer of a single byte in its call to read(), i.e. it will sit in a
tight loop reading character by character, returning control to the
shell script after each one.
The only issue with read -n is that it sets the terminal to "raw mode",
which means that characters are sent as they are without any
interpretation. For example, if you press Ctrl+D, you'll get a literal
EOT character in your string. So we have to check for that. This also
has the side-effect that the user will be unable to edit the line before
submitting it to the script, for example by pressing Backspace, or by
using Ctrl+W (to delete the previous word) or Ctrl+U (to delete to the
beginning of the line).
To make a long story short: The following is the final loop that your
bash script needs to do to read a line of input, while at the same time
allowing the user to interrupt the input at any time by pressing Ctrl+D:
while true; do
    line=''
    while IFS= read -r -N 1 ch; do
        case "$ch" in
            $'\04') got_eot=1 ;&
            $'\n')  break ;;
            *)      line="$line$ch" ;;
        esac
    done
    printf 'line: "%s"\n' "$line"
    if (( got_eot )); then
        break
    fi
done
Without going into too much detail about this:
IFS= clears the IFS variable. Without this, we would not be able to read spaces. I use read -N instead of read -n, otherwise we wouldn't be able to detect newlines. The -r option to read enables us to read backslashes properly.
The case statement acts on each read character ($ch). If an EOT ($'\04') is detected, it sets got_eot to 1 and then falls through to the break statement which gets it out of the inner loop. If a newline ($'\n') is detected, it just breaks out of the inner loop. Otherwise it adds the character to the end of the line variable.
After the loop, the line is printed to standard output. This would be where you call your script or function that uses "$line". If we got here by detecting an EOT, we exit the outermost loop.
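The effect of IFS= and -r described above is easiest to see with an ordinary line-oriented read rather than the single-character loop. The following is a small sketch of that; the helper names plain_read and raw_read are invented for the illustration.

```bash
# Hypothetical helpers contrasting a plain read with IFS= read -r.
plain_read() { read line; printf '<%s>\n' "$line"; }
raw_read()   { IFS= read -r line; printf '<%s>\n' "$line"; }

# The input line is: two spaces, a, backslash, b, two spaces.
printf '  a\\b  \n' | plain_read   # whitespace trimmed, backslash consumed
printf '  a\\b  \n' | raw_read     # line stored exactly as it was entered
```

The first call prints <ab>, the second prints the line unchanged between the angle brackets.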
[1] You may test this by running cat >file in one terminal and tail -f file in another, and then enter a partial line into the cat and press Ctrl+D to see what happens in the output of tail.
For ksh93 users: The loop above will read a carriage return character rather than a newline character in ksh93, which means that the test for $'\n' will need to change to a test for $'\r'. The shell will also display these as ^M.
To work around this:
stty_saved="$( stty -g )"
stty -echoctl
# the loop goes here, with $'\n' replaced by $'\r'
stty "$stty_saved"
You might also want to output a newline explicitly just before the break to get exactly the same behaviour as in bash.
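The bash version of the loop can also be exercised without a terminal by piping in data that contains a literal EOT byte. The sketch below wraps the loop in a function for that purpose. The function name read_lines_until_eot, the saw_terminator variable and the handling of a real end of input (which a pipe produces, but the original loop does not deal with) are additions made for this illustration, and a reasonably recent bash (4.x or later, for read -N and ;&) is assumed.

```bash
#!/usr/bin/env bash
# Hypothetical harness around the bash loop from Solution 1.
read_lines_until_eot() {
    local line ch got_eot=0 saw_terminator
    while true; do
        line=''
        saw_terminator=0
        while IFS= read -r -N 1 ch; do
            case "$ch" in
                $'\04') got_eot=1 ;&                  # EOT: fall through
                $'\n')  saw_terminator=1; break ;;    # end of this line
                *)      line="$line$ch" ;;
            esac
        done
        # The inner loop ended without a newline or EOT: read hit a real
        # end of input (this only happens with pipes and files).
        if (( ! saw_terminator )) && [[ -z $line ]]; then
            break
        fi
        printf 'line: "%s"\n' "$line"
        if (( got_eot || ! saw_terminator )); then
            break
        fi
    done
}

# Two lines; the second one is cut short by a literal EOT byte (\004).
printf 'one\ntwo\004never read\n' | read_lines_until_eot
```

This prints line: "one" followed by line: "two"; everything after the EOT byte is left unread, which mirrors what happens when Ctrl+D is pressed in the middle of a line.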
Solution 2
In the default mode of the terminal device, the read() system call (when called with a large enough buffer) returns full lines. The only time the data read would not end in a newline character is when you press Ctrl+D.
In my tests (on Linux, FreeBSD and Solaris), a single read() only ever yields one single line even if the user has entered more by the time read() is called. The only case where the data read could contain more than one line is when the user enters a newline as Ctrl+V Ctrl+J (the literal-next character followed by a literal newline character, as opposed to a carriage return converted to newline when you press Enter).
The read shell builtin, however, reads the input one byte at a time until it sees a newline character or end of file. That end of file would be when read(0, buf, 1) returns 0, which can only happen when you press Ctrl+D on an empty line.
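One visible consequence of this byte-at-a-time behaviour is that, on a pipe, read consumes input only up to and including the first newline and leaves the rest for whatever reads next. A minimal sketch of that, with the made-up function name first_then_rest:

```bash
# Hypothetical helper: take one line with read, then let cat consume
# whatever is still waiting on standard input.
first_then_rest() {
    IFS= read -r first
    printf 'first=<%s>\n' "$first"
    printf 'rest:\n'
    cat
}

printf 'a\nb\nc\n' | first_then_rest
```

Here read stores only a, and cat still sees the lines b and c.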
Here, you'd want to do large reads and detect the Ctrl+D when the input doesn't end in a newline character.
You can't do that with the read builtin, but you could do it with the sysread builtin of zsh.
If you want to account for the user typing ^V^J:
#! /bin/zsh -
zmodload zsh/system # for sysread
myfunction() printf 'Got: <%s>\n' "$1"
lines=('')
while (($#lines)); do
  if (($#lines == 1)) && [[ $lines[1] == '' ]]; then
    sysread
    lines=("${(@f)REPLY}") # split on newline
    continue
  fi
  # pop one line
  line=$lines[1]
  lines[1]=()
  myfunction "$line"
done
If you want to consider foo^V^Jbar as a single record (with an embedded newline), that is, assume each read() returns one record:
#! /bin/zsh -
zmodload zsh/system # for sysread
myfunction() printf 'Got: <%s>\n' "$1"
finished=false
while ! $finished && sysread line; do
  if [[ $line = *$'\n' ]]; then
    line=${line%?} # strip the newline
  else
    finished=true
  fi
  myfunction "$line"
done
Alternatively, with zsh, you could use zsh's own advanced line editor to input the data and map ^D there to a widget that signals the end of input:
#! /bin/zsh -
myfunction() printf 'Got: <%s>\n' "$1"
finished=false
finish() {
  finished=true
  zle .accept-line
}
zle -N finish
bindkey '^D' finish
while ! $finished && line= && vared line; do
  myfunction "$line"
done
With bash or other POSIX shells, you could approximate the sysread approach by using dd to do the read() system calls:
#! /bin/sh -
sysread() {
  # add a . to preserve the trailing newlines
  REPLY=$(dd bs=8192 count=1 2> /dev/null; echo .)
  REPLY=${REPLY%?} # strip the .
  [ -n "$REPLY" ]
}
myfunction() { printf 'Got: <%s>\n' "$1"; }
nl='
'
finished=false
while ! "$finished" && sysread; do
  case $REPLY in
    (*"$nl") line=${REPLY%?};; # strip the newline
    (*) line=$REPLY finished=true
  esac
  myfunction "$line"
done
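The trailing . in the sysread function above is there because command substitution strips all trailing newlines, which would make it impossible to tell whether the data ended in one. The sketch below isolates that trick, using cat instead of dd so that it can be run on a pipe; the helper name slurp_exact is invented for the illustration.

```bash
# Hypothetical helper: store all of standard input in REPLY, byte for
# byte, using the same sentinel trick as the sysread function above.
slurp_exact() {
    REPLY=$(cat; echo .)   # the . protects trailing newlines from $(...)
    REPLY=${REPLY%?}       # strip the sentinel .
}

# A plain command substitution loses both trailing newlines...
printf 'foo\n\n' | { naive=$(cat); printf 'naive length: %d\n' "${#naive}"; }
# ...while the sentinel version keeps them.
printf 'foo\n\n' | { slurp_exact; printf 'exact length: %d\n' "${#REPLY}"; }
```

The first line reports a length of 3 and the second a length of 5, so only the second form lets the script check for a final newline.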
remi
Updated on September 18, 2022
Comments
Kusalananda over 7 years
@StéphaneChazelas Thanks. Yes, I will note this in my answer.
-
Stéphane Chazelas over 7 years
@Kusalananda, 8192 is what zsh sysread uses by default (also the maximum supported there). What feature of zsh do you not find useful? Note that zsh code is also in different dynamically loaded modules, so the bloat doesn't affect it as much as other shells like ksh93 or bash.
-
Kusalananda over 7 years
-
fabiomaia over 2 years
doesn't work, input variable returns just the first line