Where is the hex code of the "EOF" character?

Solution 1

There is no such thing as an EOF character. The operating system knows exactly how many bytes a file contains (this is stored alongside other metadata like permissions, creation date, and the name), and hence can tell programs that try to read the eleventh byte of a ten-byte file: you've reached the end of the file, there are no more bytes to read.
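As a rough illustration of the point above, here is a minimal Python sketch. The file name and the helper read_all_bytes_one_at_a_time are made up for this example. It writes a ten-byte file, asks the file system for its size, and reads until read() returns an empty result, which is how Python reports the end-of-file condition.

```python
import os
import tempfile

def read_all_bytes_one_at_a_time(path):
    """Read a file byte by byte until read() reports end of file."""
    data = bytearray()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(1)      # returns b"" once no bytes are left
            if chunk == b"":
                break              # end of file is a condition, not a stored byte
            data += chunk
    return bytes(data)

# Create a ten-byte file in a temporary directory (the path is illustrative).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "ten_bytes.bin")
with open(path, "wb") as f:
    f.write(b"0123456789")

size_from_metadata = os.stat(path).st_size   # length kept by the file system
content = read_all_bytes_one_at_a_time(path)
print(size_from_metadata, content)
```

The loop stops after exactly ten bytes even though no marker byte was ever written to the file.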

In fact, the "EOF" value returned for example by C functions like getchar is explicitly an int value outside the range of a byte, so it cannot possibly be stored in a file!

Sometimes, certain file formats insist on adding NUL terminators (probably because that's how strings are usually stored in C), though usually these delimit multiple records in a single file, not the file as a whole. And such decoration usually disqualifies a file from being considered a "text file".

ASCII codes like ETX and NUL date back to the days of teletypewriters and friends. NUL is used in C for in-memory strings, but this has no bearing on file systems.
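A small sketch of the NUL-delimited record idea, assuming a made-up file layout in which every record ends with a 0x00 byte. The helper name read_nul_delimited_records and the sample data are hypothetical. The NUL bytes separate records, but the end of the file is still determined by its length.

```python
import os
import tempfile

def read_nul_delimited_records(path):
    """Return the raw bytes and the records of a NUL-terminated record file."""
    with open(path, "rb") as f:
        raw = f.read()               # reads every byte, NULs included
    records = raw.split(b"\x00")
    if records and records[-1] == b"":
        records.pop()                # drop the empty piece after the final NUL
    return raw, records

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "records.bin")
with open(path, "wb") as f:
    f.write(b"alpha\x00beta\x00gamma\x00")

raw, records = read_nul_delimited_records(path)
print(len(raw), records)
```

Reading does not stop at the first NUL; all three records come back, and the byte count matches the size the file system reports.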

Solution 2

There was - a long long time ago - an End Of File marker but it hasn't been used in files for many years.

You can demonstrate a distant echo of it on Windows using:

C:\>copy con junk.txt
Hello
Hello again
- Press <Ctrl> and <z>
C:\>dump junk.txt
junk.txt:
00000000  4865 6c6c 6f0d 0a48 656c 6c6f 2061 6761 Hello..Hello aga
00000010  696e 0d0a                               in..
C:\>

Note the use of Ctrl-Z as an end-of-input marker.

However, notice also that the Ctrl-Z does not appear in the file any more - it used to appear as a 0x1a but only on some operating systems and even then not consistently.

Use of ETX (0x03) stopped even before those dim and distant times.

Solution 3

There is no such thing as EOF. EOF is just a value returned by file reading functions to tell you the file pointer reached the end of the file.

Solution 4

The EOT byte (0x04) is used to this day by Unix terminals (ttys) to indicate end of input. You type it with Ctrl + D (i.e. ^D) to end input to shells or any other program reading from stdin.

However, as others have pointed out, this is distinct from EOF, which is a condition rather than a piece of data per se.
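To show that such control codes are ordinary data once they sit inside a file, here is a hedged Python sketch. The helper file_contains_byte and the file name are invented for the example. It stores ETX, EOT, Ctrl-Z and NUL in front of an "A" and then searches the file in binary mode.

```python
import os
import tempfile

def file_contains_byte(path, needle):
    """Return True if the single byte `needle` occurs anywhere in the file."""
    with open(path, "rb") as f:      # binary mode: no byte gets special treatment
        return needle in f.read()

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "tricky.txt")
# ETX (0x03), EOT (0x04), Ctrl-Z (0x1a) and NUL (0x00) placed before the "A".
payload = b"no match yet\x03\x04\x1a\x00A\n"
with open(path, "wb") as f:
    f.write(payload)

found = file_contains_byte(path, b"A")
bytes_on_disk = os.stat(path).st_size
print(found, bytes_on_disk, len(payload))
```

The search still finds the "A", because none of the preceding bytes ends the read; only running out of bytes does.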

Solution 5

Different operating systems once used different EOF characters, but I have not seen one in a long time. (Files were typically stored in blocks of 128 bytes.) They were a pain to code around, much like BOMs are nowadays.

Instead, there is still an int read() that normally returns a byte value but returns -1 at EOF.

The NUL character is a string terminator in C. In Java you can have a NUL character in the middle of a string. To cooperate with C, Java's Modified UTF-8 (which is not standard UTF-8) uses a multi-byte encoding for NUL (0xC0, 0x80), just as it does for Unicode characters > 127. Standard UTF-8 encodes NUL as the single byte 0x00.
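A short Python sketch of the NUL encoding difference: standard UTF-8 writes U+0000 as the single byte 0x00, while the Modified UTF-8 variant used by Java writes it as 0xC0 0x80. The helper encode_nul_as_two_bytes is hypothetical and mimics only this one rule, not the whole format.

```python
def encode_nul_as_two_bytes(text):
    """Encode text as UTF-8, but write U+0000 as the overlong pair C0 80.

    This covers only the NUL rule of Modified UTF-8; it does not handle
    the different treatment of characters outside the Basic Multilingual Plane.
    """
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")

sample = "a\x00b"
standard = sample.encode("utf-8")           # NUL stays a single 0x00 byte
modified = encode_nul_as_two_bytes(sample)  # NUL becomes 0xC0 0x80

# A strict UTF-8 decoder rejects the overlong two-byte form.
try:
    modified.decode("utf-8")
    strict_decoder_accepts = True
except UnicodeDecodeError:
    strict_decoder_accepts = False

print(standard, modified, strict_decoder_accepts)
```

The modified form contains no 0x00 byte, so it survives C-style NUL-terminated handling, at the cost of no longer being valid standard UTF-8.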

(Some of this is probably known already.)

Author: Admin

Updated on July 09, 2022

Comments

  • Admin
    Admin almost 2 years

    As far as I know, at the end of all files, especially text files, there is a hex code for the EOF or NULL character. When we write a program that reads the contents of a text file, we keep calling the read function until we receive that EOF hex code.

    My question: I downloaded some tools to see a hex view of a text file, but I can't see any hex code for EOF (End Of File/NULL) or EOT (End Of Transmission).


    ASCII/Hex code tables: [image not available]

    This is the output of the hex viewer tool: [image not available]


    Note: My input file is a text file whose content is "Where is hex code of "EOF"?"

    Appreciate your time and consideration.

    • user657267
      user657267 almost 10 years
      Your assumption in the first sentence is wrong, in the vast majority of cases there is no such character physically present in the file. EOF is a symbolic value provided by the library to notify you, the programmer, that the file end has been reached. The operating system doesn't need to know where the file ends (or rather it doesn't store this information in the file itself).
    • Admin
      Admin almost 10 years
      @user657267 I wrote a program that searches a text file for the character "A". If there is no "A" in the text, it moves the file to a special directory. I want to know whether there is any way to cheat my program, for example by adding a NULL/EOF/EOT hex code in the middle of my input text. Thank you.
    • user657267
      user657267 almost 10 years
      Unlikely. In cmd.exe ^Z is treated as the end of input, so if you do something like type whatever.txt it will stop when it hits ^Z if the file happens to contain one, but this only applies to the Windows command line. I/O libraries for programming should happily parse it as just another character.
    • mckenzm
      mckenzm about 5 years
      ^Z was common in MS-DOS text files, and still is for many transfer protocols. I expect most SO users cannot remember MS-Kermit, xmodem, ymodem etc. It is still produced by ind$file and is a chore to remove. It throws nasty messages in gedit, so yes it does exist.
    • mckenzm
      mckenzm about 5 years
      @user657267 In some cases the OS may not be reading from a file system, so it would otherwise need to know the file size in advance to know where the end occurs. This applies to stream or raw input.
  • Admin
    Admin almost 10 years
    UTF-8 does not generate multiple bytes for NUL. ASCII code 0 is not special, UTF-8 is fully ASCII compatible. More relevant for C is the fact that no UTF-8 multi-byte sequence contains a 0 byte (or any byte < 128 for that matter) so NUL termination can store all Unicode code points except U+0000.
  • Joop Eggen
    Joop Eggen almost 10 years
    @delnan: The so-called Modified UTF-8 uses multi-byte encoding for NUL too, giving 0xC0, 0x80. In this way a NUL char in a C UTF-8 string may be handled.
  • Admin
    Admin almost 10 years
    But modified UTF-8 is not UTF-8. It's also quite obscure.
  • Joop Eggen
    Joop Eggen almost 10 years
    en.wikipedia.org/wiki/UTF-8#Modified_UTF-8 mentions object serialisation. Also DataOutputStream uses this in writeUTF (docs.oracle.com/javase/7/docs/api/java/io/…). You are right: official UTF-8 requires the shortest multi-byte sequence: 0x00.
  • David Xu
    David Xu almost 10 years
    As long as your program is running on someone else's machine, they can always "cheat" it.
  • Admin
    Admin almost 10 years
    @User1-St Depends on how you read the file and do the search (as I said, many C functions consider NUL to signify the end of a string in memory) but there are no insurmountable difficulties.
  • Admin
    Admin almost 10 years
    How? Do you mean they can give my program a text file that has "A" in its content, and my program would not notice?
  • Admin
    Admin almost 10 years
    How can I cheat my program? Let's assume my program considers NUL to signify the end of the file. In that case, if I add a 0x00 in the middle of the file using a hex editor, will the program be cheated?
  • Admin
    Admin almost 10 years
    @User1-St Yes, almost by definition. That's why you should write your program to not do something that silly ;-)
  • Jongware
    Jongware almost 10 years
    @User1-St: okay, this is the fourth answer I read and the fourth time you added that question. Don't do that, it's annoying and against the policy of SO. "Follow-up" questions are not meant to be asked in comments; they should be edited into your post (if relevant to the original question - this is not) or asked separately. But mostly, it is plain annoying.
  • Admin
    Admin almost 10 years
    :D So let's write a program that doesn't do something that silly :)) Thank you.
  • David Xu
    David Xu almost 10 years
    If your program is running on someone else's machine and they REALLY want to cheat it, they can, even with a debugger like OllyDbg or by hooking API functions, etc. There are lots of ways to cheat programs.
  • Admin
    Admin almost 10 years
    I want to know whether there is any way to cheat the program by changing only the text file. Assume that they can't install or edit anything on the host where my program is installed.
  • David Xu
    David Xu almost 10 years
    If you wrote your program correctly, then no, they can't "cheat" it
  • Admin
    Admin almost 10 years
    Sorry, is this right or not? "The program keeps reading the text file until it receives a special hex code", where the special hex code depends on the programming language that I use.
  • Admin
    Admin almost 10 years
    No! When the read function returns FEOF or 1. How does the program know that a given point is the end of the file?
  • Maarten Bodewes
    Maarten Bodewes almost 10 years
    If your runtime makes a distinction between text and binary mode, and you are expecting control characters (< 20h), make sure you open in binary mode, just to be sure. You can convert to text afterwards.
  • Admin
    Admin almost 10 years
    @delnan Where does the operating system save a file's metadata? Can I find it on the hard disk?
  • Admin
    Admin almost 10 years
    @User1-St The metadata is stored somewhere on the hard disk (where and how depends a lot on the filesystem) but it's not a file itself! The metadata can usually be accessed through other APIs (for example stat on Unix-y systems).
  • Admin
    Admin almost 10 years
    @owlstead, would you please explain more clearly? I don't understand your comment! Thank you.
  • Admin
    Admin almost 10 years
    @delnan Is it possible to change it, or is it protected? Do you know how to access it in Windows? What APIs? Thank you again, very much!! :)
  • Admin
    Admin almost 10 years
    @User1-St I fear explaining it all goes beyond the scope of these comments. Sit down, read around a bit (stat, the organization of a simple file system like FAT), think hard and try to come up with one or a couple of questions that you can ask separately on Stack Overflow.
  • Maarten Bodewes
    Maarten Bodewes almost 10 years
    @delnan If the text mode is responding to control characters, you may never see them back, it could well be that it stops reading after a 00h character. You either should know how the runtime behaves or you should open in binary mode.
  • Admin
    Admin almost 10 years
    @delnan , OK dear friend. Thank you :)
  • Admin
    Admin almost 10 years
    @owlstead Open what file in binary mode? There is no 00h at the end of a text file.