Why are special characters such as "carriage return" represented as "^M"?


Solution 1

I believe that what OP was actually asking about is called Caret Notation.

Caret notation is a notation for unprintable control characters in ASCII encoding. The notation consists of a caret (^) followed by a capital letter; this digraph stands for the ASCII code whose numerical value equals the letter's position in the alphabet. For example, the EOT character with a value of 4 is represented as ^D because D is the 4th letter of the alphabet. The NUL character with a value of 0 is represented as ^@ (@ is the ASCII character just before A). The DEL character with the value 127 is usually represented as ^?, because the ASCII '?' comes just before '@' and -1 is the same as 127 if masked to 7 bits. An alternative formulation of the translation is that the printed character is found by inverting the 7th bit (0x40) of the ASCII code.
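The bit-inversion rule can be sketched in a few lines of Python. This is only an illustration, not code from any particular editor or terminal; the function name `caret` is made up for the example.

```python
def caret(code: int) -> str:
    """Return the caret notation for a 7-bit ASCII control code."""
    if not (0 <= code < 0x20 or code == 0x7F):
        raise ValueError("not an ASCII control character")
    # Inverting the 7th bit (value 0x40) maps the control code onto
    # the printable character that is written after the caret.
    return "^" + chr(code ^ 0x40)


print(caret(0x04))  # ^D  (EOT)
print(caret(0x00))  # ^@  (NUL)
print(caret(0x0D))  # ^M  (CR)
print(caret(0x7F))  # ^?  (DEL)
```

The same XOR covers the letters, NUL and DEL without any special cases, which is why the "invert one bit" formulation is often easier to work with than counting letters.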

The full list of ASCII control characters along with caret notation can be found here

Regarding vim and other text editors: You'll typically only see ^M if you open a Windows-formatted (CRLF) text file in an editor that expects Linux line endings (LF). The 0x0A is rendered as a line break, the 0x0D right before it gets printed as ^M. Most of the time, editor default settings include 'automatically recognize line endings'.
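To make the CRLF case concrete, here is a rough Python sketch of a viewer that treats only LF as a line break. It is a simplified model, not how Vim is actually implemented; the name `show_as_lf_editor` is hypothetical, and unlike most editors it also renders tabs in caret form.

```python
def show_as_lf_editor(data: bytes) -> list:
    """Split on LF only; show any remaining control bytes in caret notation."""
    lines = []
    for raw in data.split(b"\n"):
        shown = "".join(
            # Control bytes (0x00-0x1F and 0x7F) become ^X; others print as-is.
            "^" + chr(b ^ 0x40) if b < 0x20 or b == 0x7F else chr(b)
            for b in raw
        )
        lines.append(shown)
    return lines


crlf_text = b"first line\r\nsecond line\r\n"  # Windows-style line endings
for line in show_as_lf_editor(crlf_text):
    print(line)
```

Each 0x0A is consumed as the line break, so the 0x0D that preceded it is left at the end of the line and shows up as ^M.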

Solution 2

That is exactly the reason.

ASCII defines characters 0-31 as non-printing control codes. Here's an extract from the ascii(7) manual page from a random Linux system (man ascii), up to and including CR (13):

   Oct   Dec   Hex   Char                       
   ─────────────────────────────────────────────
   000   0     00    NUL '\0'                    
   001   1     01    SOH (start of heading)     
   002   2     02    STX (start of text)         
   003   3     03    ETX (end of text)           
   004   4     04    EOT (end of transmission)   
   005   5     05    ENQ (enquiry)               
   006   6     06    ACK (acknowledge)           
   007   7     07    BEL '\a' (bell)             
   010   8     08    BS  '\b' (backspace)       
   011   9     09    HT  '\t' (horizontal tab)  
   012   10    0A    LF  '\n' (new line)        
   013   11    0B    VT  '\v' (vertical tab)    
   014   12    0C    FF  '\f' (form feed)       
   015   13    0D    CR  '\r' (carriage ret)    

Conventionally these characters are generated with Control and the letter relating to the character required. Teletypes and early terminal keyboards had 'BELL' written above the G key for this reason.

The standards document that defined ASCII is ASA X3.4-1963, which was published by the American Standards Association in 1963. I can't find the original document on their website, but this extract from the original document shows the character table, including the control codes above.

Solution 3

The notation goes back to the earliest ASCII Teletypes (ca 1963). There was a CTRL key that toggled the 0x40 bit so that CTRL-M (carriage return) would be 0D instead of 4D, CTRL-G (bell) would be 07 instead of 47, CTRL-L (form feed) would be 0C instead of 4C.
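The arithmetic in the paragraph above can be checked with a short Python sketch. It models only the bit flip described here, not the actual Teletype hardware, and the helper name `ctrl` is made up for the example.

```python
def ctrl(key: str) -> int:
    """Code produced when CTRL is combined with a key in the range '@' to '_'."""
    code = ord(key)
    if not 0x40 <= code <= 0x5F:
        raise ValueError("expected a character in the range '@' to '_'")
    return code ^ 0x40  # flip the 0x40 bit, as described above


for key in "MGL":
    print(f"CTRL-{key}: {ord(key):02X} -> {ctrl(key):02X}")
# CTRL-M: 4D -> 0D
# CTRL-G: 47 -> 07
# CTRL-L: 4C -> 0C
```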

There was no "design" in assigning particular letters to particular functions. It was just chance that, when the dust settled from assigning ASCII codes, the M key was one bit different from carriage return, and hence carriage return became CTRL-M.

Here is the best shot I can find of an ASR33 keyboard. As you can see the control character names are printed in small letters on the corresponding alpha keys.

Teletype Model 33 ASR with paper tape punch/reader

Image by Marcin Wichary, User:AlanM1 (Derived (cropped) from File:ASR-33 2.jpg) [CC BY 2.0], via Wikimedia Commons

The M key does not have a notation on it because there is a dedicated "RETURN" key, so CTRL-M is redundant.

Solution 4

The caret (^) is just shorthand for "hold the Control (CTRL) key down".

In the good old days you could type these codes (see above) in directly: Ctrl key + G (^G) would make the terminal go "ding".

When you want to enter a CR in Vim you use Ctrl key + M; likewise, Tab is Ctrl + I.

Solution 5

The need for some visual manner of displaying what are by definition non-printable characters.

So, someone in the early 1970s (or maybe earlier) (I remember seeing it on CP/M, and someone else has already mentioned TOPS) decided that "caret plus letter" would be the symbol for the 26 unprintable ASCII control characters with values 1 thru 26. Value 0 is/was printed as ^@, and value 127 as ^?.


Author: dotancohen

Updated on September 18, 2022

Comments

  • dotancohen
    dotancohen over 1 year

    Why is ^M used to represent a carriage return in VIM and other contexts?

    My guess is that M is the 13th letter of the Latin alphabet and a carriage return is \x0D or decimal 13. Is this the reason? Is this representation documented anywhere?

    I notice that Tab is represented by ^I, which is the ninth letter of the Latin alphabet. Conversely, Tab is \x09 or decimal 9, which supports my theory stated above. However, where might this be documented as fact?

    • Admin
      Admin almost 10 years
      Also keep in mind that dos/windows use "0x0d 0x0a", also noted as "CR LF". But unix/linux use only "0x0a" or "LF". So when you open a windows document in linux it detects extra "CR", and when you open a linux document in windows it doesn't detect new lines.
    • Admin
      Admin almost 10 years
      @LatinSuD caret notation (and corresponding use of the Ctrl-key) relates to the C0 control set (historically part of ASCII) directly and not whether and how a given operating system or program uses part of that set in representing new lines, or anything else. Similarly, whether ^H deletes a character or allows overprinting (such as n^H~ as an obsolete way to produce ñ) or any other actual use of the control character is separate from the caret notation.
    • Admin
      Admin almost 10 years
      old one ... I can't remember the original code, but ctrl-G rings a bell!
    • Admin
      Admin almost 10 years
      The ^M you see in Linux (which uses "0x0a" (LF)) is probably from a file made on Windows (which uses "0x0d 0x0a" (CR LF)). Thus, at the end of each line, you see the extra "0x0d" (CR). The 0x0a is interpreted as a newline and not shown in vi (well, it is: the next line will have a "~" if the previous line didn't end with a newline). So the ^M is not exactly a "carriage return"; it's part of what a carriage return is in Windows. The answer tells why it's represented that way (using caret notation: ^@ = 0x00, ^A = 0x01, etc., ^M = 0x0d, ...).
    • Admin
      Admin almost 10 years
      @OlivierDulac no, the ^M is exactly a carriage return, just like ^J is exactly a line-feed. While different OSs have had different views as to whether line-feed and/or carriage return or something else (like the Newline character used by some IBM characters but not part of ASCII and so not part of the historical heritage of some other OSs) should represent a new line in a text file, and while some programs have then overridden that in different ways, U+000D itself is still a carriage return, whatever later operating systems like Unix or DOS decided to do with it. (Of course, calling it...
    • Admin
      Admin almost 10 years
      @OlivierDulac ... U+000D is proleptic, since that name came with Unicode in the 1990s, but that does quite definitely reference the code as it existed in ASCII in 1963, and through that as it existed in Murray's modified Baudot code in 1901. Murray was solving problems related to moving paper around, with the same tools used in the concept of "text file" many decades later. Hammer a screw into something like a nail, and it's still a screw. Use LF and/or CR to represent the end of a line in a text file, and they're still line-feeds and carriage returns.
    • Admin
      Admin almost 10 years
      @JonHanna: apologies, i mixed in my comment carriage return and newlines.
    • Admin
      Admin almost 10 years
      Because Control-M was the ASR-33 TTY keyboard combination to get the character. (And yes, Brian, Ctrl-G does ring a bell.)
    • Admin
      Admin almost 10 years
      Has nothing to do with "letter of the alphabet", other than when the ASCII table was laid out the alpha characters were assigned sequentially, starting from 0x41.
    • Admin
      Admin almost 10 years
      I knew you could actually use ctrl+i as tab (I use it on connectbot on my phone in vim) I didn't realize that ^M works the same way, and they work basically everywhere. Cool!
  • user
    user almost 10 years
    Whilst this may theoretically answer the question, it would be preferable to include the essential parts of the answer here, and provide the link for reference. That way, should the linked page ever change or become invalid for any reason, the answer will still be useful to visitors to Super User.
  • dotancohen
    dotancohen almost 10 years
    Thank you. Though informative, this answer does not contain the answer to the question.
  • dotancohen
    dotancohen almost 10 years
    Thank you. Though informative, this answer does not contain the answer to the question.
  • dotancohen
    dotancohen almost 10 years
    Perfect, thank you. This is exactly what I was looking for.
  • Deliss
    Deliss almost 10 years
    I always wondered what that thing was called...
  • keshlam
    keshlam almost 10 years
    This convention goes back at least to the 1970's; I first saw it on the TOPS-10 operating system but it may well have existed earlier. For what it's worth, on older ASCII terminals the character now shown as a caret was actually an upward-pointing arrow, so this originated as "uparrow notation".
  • barlop
    barlop almost 10 years
    Most of the control characters are meaningless, but even some of those with meaning like Ctrl-I i'm not sure where you can just do Ctrl-I and get a tab.
  • OrangeDog
    OrangeDog almost 10 years
    This is explicitly built into the ASCII design so that the Ctrl key just toggles bit 7.
  • Jon Hanna
    Jon Hanna almost 10 years
    none of the control characters are meaningless. Many of them are unused in many contexts, but every single one has at least one meaning.
  • barlop
    barlop almost 10 years
    @JonHanna Of course I don't mean they were meaningless (past tense), but that they have been meaningless for decades, i.e. they had their original meanings from eons ago, on tech that no longer runs, and most of them are meaningless today with current and even slightly old tech. If any are being put to modern uses, it's not many. There's a list here en.wikipedia.org/wiki/Control_character of the ones in common use: 0, 7, 8, 9, 10, 11, 12, 13, 127. That's 9 of 33, so the others (24 of them) you would see either very rarely or not at all, as they are as dead as the antique, out-of-use-for-decades machinery they were used on.
  • Jon Hanna
    Jon Hanna almost 10 years
    Associated Press still use ANPA-1312 which uses 1–4; 6 & 16 are used to start every TCP/IP connection. Modern printers (among other things) still use 17 & 19. Together with those you mention, we've quite a percentage of them covered without really trying. I'll grant you they aren't in heavy use, but they ain't dead either.
  • wchargin
    wchargin almost 10 years
    @barlop You can do ^I for a tab in standard bash: type ls ~/^I^I and you should see all the folders in your home directory.
  • pmms
    pmms almost 10 years
    The answer is hidden in the second paragraph: ^M is shorthand for Control-M. On the terminal you would press the Control key together with the M key to send the ASCII code 0x0D, also known as a carriage return.
  • Samin yeasir
    Samin yeasir almost 10 years
    It's not used only with letters. I would not define it as the control character with "the letter's numeric value" but rather as "xor 64". In other words, ^A is 0x41 xor 0x40, or 0x01 and ^? is 0x3F xor 0x40, or 0x7F.
  • rossmcm
    rossmcm almost 10 years
    It's also not used just with ASCII characters anymore. Windows, for example, allows you to detect and act on Ctrl-Del (hold Ctrl down and press the Del key). The Del key (or Delete) has no ASCII value, yet we sometimes see it written as ^Del.
  • barlop
    barlop almost 10 years
    @JonHanna In the case of TCP, it uses SYN and ACK but not with those ascii codes of SYN-0x16(^V) and ACK-0x6(^F). TCP doesn't use that ASCII, it uses a single bit for SYN 0x002 and a single bit for ACK 0x010 And so any values with those bit set would indicate SYN and/or ACK. As for Printers DC1,DC3 and Associated Press and the AP-1312 that is an interesting case I see mentioned here too en.wikipedia.org/wiki/C0_and_C1_control_codes I suppose that counts but I wonder to what extent they are control characters if you can't make them with Ctrl - Maybe back in the day you could?
  • Scott
    Scott almost 10 years
    The term you are looking for is digraph, which means two characters that represent one character. Specifically, digraphs and trigraphs are used to represent nonprintable characters. Historically they have also been used for characters that do not appear on a keyboard, although with modern GUIs and keyboards this is less of an issue so this use is more archaic.
  • Daniel R Hicks
    Daniel R Hicks almost 10 years
    @rossmcm - Actually, ASCII 0x7F is "DEL". Of course, what Windows regards as a valid key combo likely has no relation to reality.
  • Stuart Golodetz
    Stuart Golodetz almost 10 years
    On some level the extent to which we are still bound by design choices made for what now seem like ancient systems is quite surprising - I guess on reflection that (a) it's not that long ago, it's just that the pace of change in the interim has been astonishing, and (b) if enough design decisions are made, some of them (especially the ones that don't cause people enough problems) are bound to stick around long after the reasons for them disappear into memory. Still an odd feeling to look back at the history of some of these things though.
  • Daniel R Hicks
    Daniel R Hicks almost 10 years
    @StuartGolodetz - Actually, I find it strangely reassuring. But then I remember when Teletypes were "advanced technology". (The Teletype ASR-33, by the way, was remarkable for its elegant simplicity. I only wish that "modern" computer systems were as well-designed.)
  • CaptainCodeman
    CaptainCodeman almost 10 years
    This is fascinating but what I don't understand is.. why of all things did they decide this typewriter needed a bell?
  • Daniel R Hicks
    Daniel R Hicks almost 10 years
    @CaptainCodeman - When you transmitted an important message you'd ring the bell to get the attention of the operator on the other end.
  • Stuart Golodetz
    Stuart Golodetz almost 10 years
    @DanielRHicks - I guess the thought it makes me have is that perhaps the gap between what we consider "modern" and "ancient" technology isn't nearly as large as one might think it is. Indeed, much supposedly modern technology incorporates things with very old roots, although each generation thinks they're doing everything from scratch. Those young'n's :)
  • Daniel R Hicks
    Daniel R Hicks almost 10 years
    It is interesting to note that the Ctrl key survives to this day on PC keyboards.
  • dotancohen
    dotancohen almost 10 years
    I don't see a dedicated "RETURN" key, but I do see a LineFeed key. Is that what you mean?
  • Daniel R Hicks
    Daniel R Hicks almost 10 years
    @dotancohen - Second row, far right, next to LINE FEED.
  • dotancohen
    dotancohen almost 10 years
    Thanks, I did not even recognize what was written there on two lines!
  • SevenSidedDie
    SevenSidedDie almost 10 years
    "In the good old days" is still today, with ^C and ^D being perfectly functional. The only reason that ^G doesn't make the terminal ding anymore is that most terminal emulators have that response turned off.
  • Samin yeasir
    Samin yeasir almost 10 years
    Ascii DEL (^?) has nothing to do with the delete key. It's actually the standard code generated by the <--- key (also, confusingly, called backspace) on VT100-like terminals.
  • Daniel R Hicks
    Daniel R Hicks almost 10 years
    The DEL code is significant (and is called DEL for "delete") because if you over-punch a paper tape with DEL (all ones) you erase the character.
  • dotancohen
    dotancohen almost 10 years
    @DanielRHicks: I understand that you're still wearing T-shirts from the mid 70's!
  • Daniel R Hicks
    Daniel R Hicks almost 10 years
    @dotancohen - Yeah, and my wife is really after me to take it off and wash it.
  • dotancohen
    dotancohen almost 10 years
    @DanielRHicks: I'll get off your lawn now!
  • Abbafei
    Abbafei almost 8 years
    @keshlam It turns out that the uparrow was actually part of ASCII itself :-) The caret replaced the uparrow (and the underscore replaced the leftarrow) later on. Found this out here via Wikipedia.
  • keshlam
    keshlam almost 8 years
    That is correct, @abbafei. I started programming on ASR33 teletypes which had the older characters.
  • The Quark
    The Quark over 2 years
    @OrangeDog But it is not in ASCII that the caret notation (or "uparrow notation") was introduced.