shell: read: differentiate between EOF and newline
Solution 1
With read -n "$n" (not a POSIX feature), and if stdin is a terminal device, read takes the terminal out of icanon mode, as otherwise read would only see full lines as returned by the terminal line discipline's internal line editor. It then reads one byte at a time until $n characters or a newline have been read (you may see unexpected results if invalid characters are entered).
It reads up to $n characters from one line. You'll also need to empty $IFS for it not to strip IFS characters from the input.
Since we leave the icanon mode, ^D is no longer special. So if you press Ctrl+D, the ^D character will be read.
You wouldn't see EOF from the terminal device unless the terminal is somehow disconnected. If stdin is another type of file, you may see EOF (like in : | IFS= read -rn 1; echo "$?" where stdin is an empty pipe, or when redirecting stdin from /dev/null).
read will return 0 if $n characters (bytes not forming part of valid characters being counted as 1 character) or a full line have been read.
So, in the special case of only one character being requested:
if IFS= read -rn 1 var; then
  if [ "${#var}" -eq 0 ]; then
    echo an empty line was read
  else
    printf %s "${#var} character "
    (export LC_ALL=C; printf '%s\n' "made of ${#var} byte(s) was read")
  fi
else
  echo "EOF found"
fi
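Since Ctrl+D is not seen as EOF on a terminal, the three outcomes are easiest to observe with stdin coming from a pipe. The following is a minimal sketch, assuming bash; classify_read is a hypothetical helper name introduced only for this illustration.

```bash
# classify_read is a hypothetical name, not part of the original answer.
# It reports which of the three outcomes of `read -rn 1` occurred.
classify_read() {
  local var
  if IFS= read -rn 1 var; then
    if [ "${#var}" -eq 0 ]; then
      echo "empty line"          # read succeeded but stored nothing
    else
      echo "character: $var"     # one character was stored
    fi
  else
    echo "EOF"                   # read failed: nothing left on stdin
  fi
}

printf ''    | classify_read   # EOF
printf '\n'  | classify_read   # empty line
printf 'x\n' | classify_read   # character: x
```

The exit status of read is what separates the EOF case from the empty-line case; the variable is empty in both.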
Doing it POSIXly is rather complicated.
That would be something like (assuming an ASCII-based (as opposed to EBCDIC for instance) system):
readk() {
  REPLY= ret=1
  if [ -t 0 ]; then
    saved_settings=$(stty -g)
    stty -icanon min 1 time 0 icrnl
  fi
  while true; do
    code=$(dd bs=1 count=1 2> /dev/null | od -An -vto1 | tr -cd 0-7)
    [ -n "$code" ] || break
    case $code in
      000 | 012) ret=0; break;; # can't store NUL in variable anyway
      (*) REPLY=$REPLY$(printf "\\$code");;
    esac
    if expr " $REPLY" : ' .' > /dev/null; then
      ret=0
      break
    fi
  done
  if [ -t 0 ]; then
    stty "$saved_settings"
  fi
  return "$ret"
}
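The byte-decoding pipeline inside readk can be tried on its own to see what $code holds in each case. This is only a sketch: byte_code is a hypothetical wrapper name, and an ASCII-based system is assumed, as above.

```sh
# byte_code is a hypothetical helper wrapping the pipeline used in readk:
# it reads one byte from stdin and prints its 3-digit octal code,
# or prints nothing at all if stdin is already at EOF.
byte_code() {
  dd bs=1 count=1 2> /dev/null | od -An -vto1 | tr -cd 0-7
}

printf 'A'  | byte_code; echo    # 101
printf '\n' | byte_code; echo    # 012
printf ''   | byte_code; echo    # (empty line: no byte was read)
```

An empty result is what makes the loop break with ret still set to 1, while 012 (newline) and 000 (NUL) end the loop with ret=0.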
Note that we return only when a full character has been read. If the input is in the wrong encoding (different from the locale's encoding), for instance if your terminal sends é encoded in iso8859-1 (0xe9) when we expect UTF-8 (0xc3 0xa9), then you may enter as many é as you like, the function will not return. bash's read -n1 would return upon the second 0xe9 (and store both in the variable), which is a slightly better behaviour.
If you also wanted to read a ^C character upon Ctrl+C (instead of letting it kill your script; also for ^Z, ^\...), or ^S/^Q upon Ctrl+S/Q (instead of flow control), you could add a -isig -ixon to the stty line. Note that bash's read -n1 doesn't do it either (it even restores isig if it was off).
That will not restore the tty settings if the script is killed (like if you press Ctrl+C). You could add a trap, but that would potentially override other traps in the script.
You could also use zsh instead of bash, where read -k (which predates ksh93 or bash's read -n/-N) reads one character from the terminal and handles ^D by itself (returns non-zero if that character is entered) and doesn't treat newline specially.
if read -k k; then
  printf '1 character entered: %q\n' $k
fi
Solution 2
In f(), change the %s to %q:
f() { read -rn 1 -p "Enter a character: " char && \
printf "\nYou entered '%q'\n" "$char"; }
f;f
Output, if the user enters a newline, then 'Ctrl-D':
Enter a character:
You entered ''''
Enter a character: ^D
You entered '$'\004''
From man printf:
%q ARGUMENT is printed in a format that can be reused as shell input,
escaping non-printable characters with the proposed POSIX $'' syntax.
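To see what %q does with the characters discussed here, it can be applied to a few fixed strings. This is a small sketch assuming bash's builtin printf (%q is not part of POSIX printf); show is a hypothetical helper name.

```bash
# show is a hypothetical helper: print its argument in shell-reusable form.
show() { printf '%q\n' "$1"; }

show 'x'        # x
show ''         # ''
show $'\004'    # $'\004'  (the ^D character)
show $'\n'      # $'\n'
```

The empty string and ^D now print differently, which is why the two prompts in the output above no longer look the same.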
Solution 3
Actually, if you run read -rn1 in Bash, and hit ^D, it's treated as the literal control character, not an EOF condition. The control character just isn't visible when printed, so it doesn't appear with printf "'%s'".
Piping the output to something like od -c would show it, as would printf "%q", which other answers already mentioned.
With actually nothing as input, the result is different, here empty even with printf "%q":
$ f() { read -rn 1 x ; printf "%q\n" "$x"; }
$ printf "" | f
''
The newline isn't returned by read here for two reasons. First, it's the default line delimiter of read, and hence not returned as output. Second, it's also part of the default IFS, and read removes leading and trailing whitespace if it is part of IFS.
So, we need read -d to change the delimiter from the default, and make IFS empty:
$ g() { IFS= read -rn 1 -d '' x ; printf "%q\n" "$x"; }
$ printf "\n" | g
$'\n'
read -d "" makes the delimiter effectively the NUL byte, which means this still doesn't tell the difference between an input of nothing, and an input of a NUL byte:
$ printf "" | g
''
$ printf "\000" | g
''
Though with nothing as input, read returns false, so we could check $? to detect that.
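Checking the exit status alongside the value might look like the following. It is a sketch assuming bash; read_one is a hypothetical helper name used only for this illustration.

```bash
# read_one is a hypothetical name, not from the original answer.
# It combines read's exit status with %q to separate the cases.
read_one() {
  local x
  if IFS= read -rn 1 -d '' x; then
    printf 'read: %q\n' "$x"   # status 0: a character or the NUL delimiter was read
  else
    echo 'EOF'                 # non-zero status: there was no input at all
  fi
}

printf ''     | read_one   # EOF
printf '\000' | read_one   # read: ''
printf '\n'   | read_one   # read: $'\n'
```

With the status taken into account, empty input and a NUL byte no longer look alike, even though the variable is empty in both cases.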
Tom Hale
Updated on September 18, 2022
Comments
- Tom Hale over 1 year
  Reading a single character, how can I tell the difference between the null <EOF> and \n?
  Eg:
  f() { read -rn 1 -p "Enter a character: " char && printf "\nYou entered '%s'\n" "$char"; }
  With a printable character:
  $ f
  Enter a character: x
  You entered 'x'
  When pressing Enter:
  $ f
  Enter a character:
  You entered ''
  When pressing Ctrl + D:
  $ f
  Enter a character: ^D
  You entered ''
  $
  Why is the output the same in the last two cases? How can I distinguish between them?
  Is there a different way to do this in POSIX shell vs bash?
- Tom Hale almost 7 years
  How do I get the newline case to show You entered '$'\012'' vs the null character it's currently showing?
- n.caillou almost 7 years
  @TomHale The newline terminates the input, it isn't part of it
- Admin almost 7 years
  Not with -n 1. The status is 0.
- meuh almost 7 years
  @TomHale You can capture the newline if you add -d ''
- Tom Hale almost 7 years
  I love your answers, @Stéphane Chazelas. If we leave icanon mode and can capture ^D, then why can't we capture \n?
- Admin almost 7 years
  -n 1 and status 0 indicates a `\n' which is mysteriously removed.
- Tom Hale almost 7 years
  Not with:
  f() { read -rd '' -n1 -p "Enter a character: " char && printf "\nYou entered: %q\n" "$char"; }
  I raised a separate question for this.
- Stéphane Chazelas almost 7 years
  @TomHale, see edit if you can use zsh. For the POSIX approach, you can take care of newline in the 012 case.
- done almost 7 years
  With -icanon a ctrl-s (as one example of un-managed input) will put the code in suspension. That will block the TTY until a ctrl-q is issued. There are several other keys that will not be read but will affect the tty, as an additional example ctrl-C.
- done almost 7 years
  Not a big issue but: It will be wise to change the printf "\\$code" to printf '%s' "\\$code" as the value of $code could be anything if the -t test fails.
- done almost 7 years
  Why do you need REPLY=$REPLY…… if the function is reading one character anyway?
- Stéphane Chazelas almost 7 years
  @Arrow, re: ^C/^S, note that bash's read -n1 doesn't reset isig/ixon either. It seems reasonable to still allow the user to interrupt/quit/suspend the script or suspend output here. I've still added a note to that effect. Thanks.
- Stéphane Chazelas almost 7 years
  @Arrow, $code contains the 3 digit octal code of a single byte read from stdin (whether stdin is a terminal device or not). So it's either printf "\\$code" or printf %b "\0$code"
- Stéphane Chazelas almost 7 years
  @Arrow, re: REPLY=$REPLY..., dd reads one byte at a time, so we need to keep reading until the end of the character as characters can be made of more than one byte (most are in UTF-8) like read -n1 does. The expr command is the one checking that a full character has been read (matches .). If the input is in the wrong encoding (different from the locale's), that may never return. Maybe I should add a note to that effect.