Why are special characters such as "carriage return" represented as "^M"?
Solution 1
I believe that what OP was actually asking about is called Caret Notation.
Caret notation is a notation for unprintable control characters in ASCII encoding. The notation consists of a caret (^) followed by a capital letter; this digraph stands for the ASCII code whose numerical value equals the letter's position in the alphabet. For example, the EOT character with a value of 4 is represented as ^D because D is the 4th letter in the alphabet. The NUL character with a value of 0 is represented as ^@ (@ is the ASCII character before A). The DEL character with the value 127 is usually represented as ^?, because the ASCII '?' is before '@' and -1 is the same as 127 if masked to 7 bits. An alternative formulation of the translation is that the printed character is found by inverting the 7th bit (0x40) of the ASCII code.
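The "invert the 7th bit" formulation can be sketched in a few lines of Python. This is only an illustration: the function name caret_notation is made up for this example and is not part of any standard library.

```python
def caret_notation(code: int) -> str:
    """Return the caret notation for an ASCII control code (0-31 or 127).

    Hypothetical helper, written only to illustrate the rule above.
    """
    if not (0 <= code <= 31 or code == 127):
        raise ValueError(f"{code} is not an ASCII control code")
    # Inverting the 0x40 bit maps 0..31 onto '@'..'_' and 127 onto '?'.
    return "^" + chr(code ^ 0x40)


for code in (0, 4, 13, 127):
    print(code, caret_notation(code))
```

With these inputs the loop prints ^@, ^D, ^M and ^? for NUL, EOT, CR and DEL respectively.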
The full list of ASCII control characters along with caret notation can be found here
Regarding vim and other text editors: You'll typically only see ^M if you open a Windows-formatted (CRLF) text file in an editor that expects Linux line endings (LF). The 0x0A is rendered as a line break, the 0x0D right before it gets printed as ^M. Most of the time, editor default settings include 'automatically recognize line endings'.
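As a rough sketch of why the ^M shows up, the Python below splits some CRLF text on LF only and prints whatever control characters remain in caret notation. It is not how vim is actually implemented; render_line and the sample data are assumptions made for the illustration.

```python
def render_line(raw: bytes) -> str:
    """Render one line of bytes, showing control codes in caret notation.

    Hypothetical helper; bytes outside the control range are shown as-is.
    """
    out = []
    for b in raw:
        if b < 0x20 or b == 0x7F:
            out.append("^" + chr(b ^ 0x40))  # e.g. 0x0D becomes "^M"
        else:
            out.append(chr(b))
    return "".join(out)


# Sample Windows-formatted text: every line ends in CR LF (0x0D 0x0A).
data = b"first line\r\nsecond line\r\n"

# Split on LF only, as an editor expecting Unix line endings would.
lines = [render_line(line) for line in data.split(b"\n") if line]
print(lines)  # ['first line^M', 'second line^M']
```

The LF is consumed as the line break, so only the CR that preceded it is left over to be displayed.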
Solution 2
That is exactly the reason.
ASCII defines characters 0-31 as non-printing control codes. Here's an extract from the ascii(7) manual page from a random Linux system (man ascii), up to and including CR (13):
Oct   Dec   Hex   Char
─────────────────────────────────────────────
000   0     00    NUL '\0'
001   1     01    SOH (start of heading)
002   2     02    STX (start of text)
003   3     03    ETX (end of text)
004   4     04    EOT (end of transmission)
005   5     05    ENQ (enquiry)
006   6     06    ACK (acknowledge)
007   7     07    BEL '\a' (bell)
010   8     08    BS  '\b' (backspace)
011   9     09    HT  '\t' (horizontal tab)
012   10    0A    LF  '\n' (new line)
013   11    0B    VT  '\v' (vertical tab)
014   12    0C    FF  '\f' (form feed)
015   13    0D    CR  '\r' (carriage ret)
Conventionally these characters are generated with Control and the letter relating to the character required. Teletypes and early terminal keyboards had 'BELL' written above the G key for this reason.
The standards document that defined ASCII is ASA X3.4-1963, which was published by the American Standards Association in 1963. I can't find the original document on their website, but this extract from the original document shows the character table, including the control codes above.
Solution 3
The notation goes back to the earliest ASCII Teletypes (ca 1963). There was a CTRL key that toggled the 0x40 bit so that CTRL-M (carriage return) would be 0D instead of 4D, CTRL-G (bell) would be 07 instead of 47, CTRL-L (form feed) would be 0C instead of 4C.
There was no "design" in assigning particular letters to particular functions; it was just chance that, when the dust settled from assigning ASCII codes, the M key was one bit different from carriage return and hence carriage return became CTRL-M.
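The effect of the CTRL key described above can be modeled as flipping the 0x40 bit of the letter's code. The Python below is a small sketch under that assumption; the function name ctrl is invented for the example.

```python
def ctrl(key: str) -> int:
    """Code sent when CTRL is held with a letter key.

    Hypothetical model of the behavior described above: flip the 0x40 bit.
    """
    return ord(key.upper()) ^ 0x40


for key in "MGL":
    print(f"CTRL-{key}: {ord(key):02X} -> {ctrl(key):02X}")
```

This prints 4D -> 0D, 47 -> 07 and 4C -> 0C, matching the carriage return, bell and form feed examples.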
Here is the best shot I can find of an ASR33 keyboard. As you can see the control character names are printed in small letters on the corresponding alpha keys.
Image by Marcin Wichary, User:AlanM1 (Derived (cropped) from File:ASR-33 2.jpg) [CC BY 2.0], via Wikimedia Commons
The M key does not have a notation on it because there is a dedicated "RETURN" key, so CTRL-M is redundant.
Solution 4
The caret (^) is just shorthand for "hold the Control (Ctrl) key down".
In the good old days you could type these codes (see above) in directly: Ctrl + G (^G) would make the terminal go "ding".
When you want to add a CR in Vim you use Ctrl + M, and likewise Tab is Ctrl + I.
Solution 5
The reason is the need for some visual manner of displaying what are by definition non-printable characters.
So someone in the early 1970s, or maybe earlier (I remember seeing it on CP/M, and someone else has already mentioned TOPS), decided that "caret plus letter" would be the symbol for the 26 unprintable ASCII control characters with values 1 thru 26. Value 0 is/was printed as ^@, and value 127 as ^?.
dotancohen
Updated on September 18, 2022
Comments
-
dotancohen over 1 year
Why is ^M used to represent a carriage return in VIM and other contexts?
My guess is that M is the 13th letter of the Latin alphabet and a carriage return is \x0D or decimal 13. Is this the reason? Is this representation documented anywhere?
I notice that Tab is represented by ^I, which is the ninth letter of the Latin alphabet. Conversely, Tab is \x09 or decimal 9, which supports my theory stated above. However, where might this be documented as fact?
-
Admin almost 10 years
Also keep in mind that DOS/Windows use "0x0d 0x0a", also noted as "CR LF", but Unix/Linux use only "0x0a" or "LF". So when you open a Windows document in Linux it detects an extra "CR", and when you open a Linux document in Windows it doesn't detect new lines.
-
Admin almost 10 years
@LatinSuD caret notation (and corresponding use of the Ctrl-key) relates to the C0 control set (historically part of ASCII) directly and not whether and how a given operating system or program uses part of that set in representing new lines, or anything else. Similarly, whether ^H deletes a character or allows overprinting (such as n^H~ as an obsolete way to produce ñ) or any other actual use of the control character is separate from the caret notation.
-
Admin almost 10 years
old one ... I can't remember the original code, but ctrl-G rings a bell!
-
Admin almost 10 years
The ^M you see when in Linux (which uses "0x0a" (LF)) is probably from a file made on Windows (which uses "0x0d 0x0a" (CR LF)). Thus, at the end of each line, you see the extra "0x0d" (CR). The 0x0a is interpreted as a newline and not shown in vi (well, it is: the next line will have a "~" if the previous line didn't end with a newline). So the ^M is not exactly a "carriage return"; it's part of what a carriage return is in Windows. The Answer tells why it's represented that way (using Caret Notation, ^@ = 0x00, ^A = 0x01, etc., ^M = 0x0d, ...).
-
Admin almost 10 years
@OlivierDulac no, the ^M is exactly a carriage return, just like ^J is exactly a line-feed. While different OSs have had different views as to whether line-feed and/or carriage return or something else (like the Newline character used by some IBM characters but not part of ASCII and so not part of the historical heritage of some other OSs) should represent a new line in a text file, and while some programs have then overridden that in different ways, U+000D itself is still a carriage return, whatever later operating systems like Unix or DOS decided to do with it. (Of course, calling it...
-
Admin almost 10 years
@OlivierDulac ... U+000D is proleptic, since that name came with Unicode in the 1990s, but that does quite definitely reference the code as it existed in ASCII in 1963, and through that as it existed in Murray's modified Baudot code in 1901. Murray was solving problems related to moving paper around, with the same tools used in the concept of "text file" many decades later. Hammer a screw into something like a nail, and it's still a screw. Use LF and/or CR to represent the end of a line in a text file, and they're still line-feeds and carriage returns.
-
Admin almost 10 years
@JonHanna: apologies, I mixed up carriage return and newlines in my comment.
-
Admin almost 10 years
Because Control-M was the ASR-33 TTY keyboard combination to get the character. (And yes, Brian, Ctrl-G does ring a bell.)
-
Admin almost 10 years
Has nothing to do with "letter of the alphabet", other than when the ASCII table was laid out the alpha characters were assigned sequentially, starting from 0x41.
-
Admin almost 10 years
I knew you could actually use ctrl+i as tab (I use it on connectbot on my phone in vim). I didn't realize that ^M works the same way, and they work basically everywhere. Cool!
-
-
user almost 10 years
Whilst this may theoretically answer the question, it would be preferable to include the essential parts of the answer here, and provide the link for reference. That way, should the linked page ever change or become invalid for any reason, the answer will still be useful to visitors to Super User.
-
dotancohen almost 10 years
Thank you. Though informative, this answer does not contain the answer to the question.
-
dotancohen almost 10 years
Thank you. Though informative, this answer does not contain the answer to the question.
-
dotancohen almost 10 years
Perfect, thank you. This is exactly what I was looking for.
-
Deliss almost 10 years
I always wondered what that thing was called...
-
keshlam almost 10 years
This convention goes back at least to the 1970's; I first saw it on the TOPS-10 operating system but it may well have existed earlier. For what it's worth, on older ASCII terminals the character now shown as a caret was actually an upward-pointing arrow, so this originated as "uparrow notation".
-
barlop almost 10 years
Most of the control characters are meaningless, but even for some of those with meaning, like Ctrl-I, I'm not sure where you can just do Ctrl-I and get a tab.
-
OrangeDog almost 10 years
This is explicitly built into the ASCII design so that the Ctrl key just toggles bit 7.
-
Jon Hanna almost 10 years
None of the control characters are meaningless. Many of them are unused in many contexts, but every single one has at least one meaning.
-
barlop almost 10 years
@JonHanna Of course I don't mean they were meaningless (past tense), but rather that they have been meaningless for decades. That is, they had their original meanings from eons ago, for tech that no longer runs, and most of them are meaningless today with current and even slightly old tech. If any are being put to modern uses, it's not many. There's a list here en.wikipedia.org/wiki/Control_character of ones in common use: 0, 7, 8, 9, 10, 11, 12, 13, 127. That's 9 of 33, so the others (24 of them) you would see either very rarely or not at all, as they are as dead as the antique machinery, out of use for decades, that they were used on.
-
Jon Hanna almost 10 years
Associated Press still use ANPA-1312, which uses 1–4; 6 & 16 are used to start every TCP/IP connection. Modern printers (among other things) still use 17 & 19. Together with those you mention, we've quite a percentage of them covered without really trying. I'll grant you they aren't in heavy use, but they ain't dead either.
-
wchargin almost 10 years
@barlop You can do ^I for a tab in standard bash: type ls ~/^I^I and you should see all the folders in your home directory.
-
pmms almost 10 years
The answer is hidden in the second paragraph: ^M is shorthand for Control-M. On the terminal you would press the Control key together with the M key to send the ASCII code 0x0D, also known as a carriage return.
-
Samin yeasir almost 10 years
It's not used only with letters. I would not define it as the control character with "the letter's numeric value" but rather as "xor 64". In other words, ^A is 0x41 xor 0x40, or 0x01, and ^? is 0x3F xor 0x40, or 0x7F.
-
rossmcm almost 10 years
It's also not used just with ASCII characters anymore. Windows for example allows you to detect and act on Ctrl-Del (hold Ctrl down and press the Del key). The Del key (or Delete) has no ASCII value, yet we sometimes see it written as ^Del.
-
barlop almost 10 years
@JonHanna In the case of TCP, it uses SYN and ACK but not with those ASCII codes of SYN 0x16 (^V) and ACK 0x06 (^F). TCP doesn't use that ASCII; it uses a single bit for SYN (0x002) and a single bit for ACK (0x010), so any values with those bits set would indicate SYN and/or ACK. As for printers (DC1, DC3) and Associated Press with AP-1312, that is an interesting case I see mentioned here too: en.wikipedia.org/wiki/C0_and_C1_control_codes. I suppose that counts, but I wonder to what extent they are control characters if you can't make them with Ctrl. Maybe back in the day you could?
-
Scott almost 10 years
The term you are looking for is digraph, which means two characters that represent one character. Specifically, digraphs and trigraphs are used to represent nonprintable characters. Historically they have also been used for characters that do not appear on a keyboard, although with modern GUIs and keyboards this is less of an issue so this use is more archaic.
-
Daniel R Hicks almost 10 years
@rossmcm - Actually, ASCII 0x7F is "DEL". Of course, what Windows regards as a valid key combo likely has no relation to reality.
-
Stuart Golodetz almost 10 years
On some level the extent to which we are still bound by design choices made for what now seem like ancient systems is quite surprising - I guess on reflection that (a) it's not that long ago, it's just that the pace of change in the interim has been astonishing, and (b) if enough design decisions are made, some of them (especially the ones that don't cause people enough problems) are bound to stick around long after the reasons for them disappear into memory. Still an odd feeling to look back at the history of some of these things though.
-
Daniel R Hicks almost 10 years
@StuartGolodetz - Actually, I find it strangely reassuring. But then I remember when Teletypes were "advanced technology". (The Teletype ASR-33, by the way, was remarkable for its elegant simplicity. I only wish that "modern" computer systems were as well-designed.)
-
CaptainCodeman almost 10 years
This is fascinating but what I don't understand is.. why of all things did they decide this typewriter needed a bell?
-
Daniel R Hicks almost 10 years
@CaptainCodeman - When you transmitted an important message you'd ring the bell to get the attention of the operator on the other end.
-
Stuart Golodetz almost 10 years
@DanielRHicks - I guess the thought it makes me have is that perhaps the gap between what we consider "modern" and "ancient" technology isn't nearly as large as one might think it is. Indeed, much supposedly modern technology incorporates things with very old roots, although each generation thinks they're doing everything from scratch. Those young'n's :)
-
Daniel R Hicks almost 10 years
It is interesting to note that the Ctrl key survives to this day on PC keyboards.
-
dotancohen almost 10 years
I don't see a dedicated "RETURN" key, but I do see a LineFeed key. Is that what you mean?
-
Daniel R Hicks almost 10 years
@dotancohen - Second row, far right, next to LINE FEED.
-
dotancohen almost 10 years
Thanks, I did not even recognize what was written there on two lines!
-
SevenSidedDie almost 10 years
"In the good old days" is still today, with ^C and ^D being perfectly functional. The only reason that ^G doesn't make the terminal ding anymore is that most terminal emulators have that response turned off.
-
Samin yeasir almost 10 years
Ascii DEL (^?) has nothing to do with the delete key. It's actually the standard code generated by the <--- key (also, confusingly, called backspace) on VT100-like terminals.
-
Daniel R Hicks almost 10 years
The DEL code is significant (and is called DEL for "delete") because if you over-punch a paper tape with DEL (all ones) you erase the character.
-
dotancohen almost 10 years
@DanielRHicks: I understand that you're still wearing T-shirts from the mid 70's!
-
Daniel R Hicks almost 10 years
@dotancohen - Yeah, and my wife is really after me to take it off and wash it.
-
dotancohen almost 10 years
@DanielRHicks: I'll get off your lawn now!
-
Abbafei almost 8 years
-
keshlam almost 8 years
That is correct, @abbafei. I started programming on ASR33 teletypes which had the older characters.
-
The Quark over 2 years
@OrangeDog But it is not in ASCII that the caret notation (or "uparrow notation") was introduced.