Do file extensions have any purpose in Linux?


Solution 1

Linux determines the type of a file via a code in the file header. It doesn't depend on file extensions to know which software to use for opening the file.

That's what I remember from my education. Please correct me in case I'm wrong!

  • correctly remembered.

Are these extensions meant only for humans?

  • Yes, with a but.

When you interact with other operating systems that do depend on extensions being what they are, it is smarter to use them.

In Windows, opening software is attached to the extensions.

Opening a text file named "file" is harder in Windows than opening the same file named "file.txt" (you will need to switch the file open dialog from *.txt to *.* every time). The same goes for TAB and semi-colon separated text files. The same goes for importing and exporting e-mails (.mbox extension).

This matters in particular when you write software: opening a file named "software1" that is an HTML file and "software2" that is a JavaScript file is more difficult than opening "software.html" and "software.js".


If there is a system in place in Linux where file extensions are important, I would call that a bug. When software depends on file extensions, that is exploitable. We use an interpreter directive to identify what a file is: "the first two bytes in a file can be the characters "#!", which constitute a magic number (hexadecimal 23 and 21, the ASCII values of "#" and "!") often referred to as shebang".

The most famous problem with file extensions was LOVE-LETTER-FOR-YOU.TXT.vbs on Windows. This is a Visual Basic script that the file explorer showed as a text file.

In Ubuntu, when you start a file from Nautilus you get a warning about what it is going to do. A script that wants to start some other software when the file is supposed to open in gEdit is obviously a problem, and we get a warning about it.

On the command line, when you execute something you can see what the extension is. If it ends in .vbs I would start to become suspicious (not that .vbs is executable on Linux, at least not without some more effort ;) ).

Solution 2

There is no 100% black or white answer here.

Usually Linux does not rely on file names (or file extensions, i.e. the part of the file name after the last period) and instead determines the file type by examining the first few bytes of its content and comparing them to a list of known magic numbers.

For example all Bitmap image files (usually with name extension .bmp) must start with the letters BM in their first two bytes. Scripts in most scripting languages like Bash, Python, Perl, AWK, etc. (basically everything that treats lines starting with # as comment) may contain a shebang like #!/bin/bash as first line. This special comment tells the system with which application to open the file.
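To make the magic-number idea concrete, here is a minimal Python sketch. The signature table and the function name `sniff_type` are made up for illustration; real tools such as `file` consult a far larger database and look beyond the first few bytes.

```python
# Hypothetical, deliberately tiny signature table (an assumption for this sketch).
MAGIC_SIGNATURES = [
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"BM", "BMP image"),
    (b"#!", "script with shebang"),
]


def sniff_type(data: bytes) -> str:
    """Guess a type from the leading bytes only; the file name is never consulted."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    return "unknown"


if __name__ == "__main__":
    print(sniff_type(b"#!/bin/bash\necho hi\n"))
    print(sniff_type(b"BM" + bytes(52)))
    # A plain text file that happens to start with "BM" is misidentified.
    print(sniff_type(b"BMW sells cars"))
```

The last call shows the weak spot of short signatures: any content that happens to begin with the right bytes matches.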

So normally the operating system relies on the file content and not its name to determine the file type, but stating that file extensions are never needed on Linux is only half of the truth.


Applications may of course implement their file checks however they want, which includes verifying the file name and extension. An example is the Eye of Gnome (eog, standard picture viewer) which determines the image format by the file extension and throws an error if it does not match the content. Whether this is a bug or a feature can be discussed...

However, even some parts of the operating system rely on file name extensions, e.g. when parsing your software sources files in /etc/apt/sources.list.d/: only files with the *.list extension (or, in newer APT versions, *.sources) get parsed; all others are ignored. The extension is perhaps not mainly used to determine the file type here but rather to enable or disable parsing of some files, but it is still a file extension that affects how the system treats a file.
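The "only matching names get parsed" pattern can be sketched in a few lines of Python. This is not APT's actual code; the function names and the file names in the temporary directory are invented for the example.

```python
import tempfile
from pathlib import Path


def active_source_files(directory: Path) -> list[str]:
    """Return the names a suffix-based loader would pick up (here: *.list only)."""
    return sorted(
        p.name for p in directory.iterdir() if p.is_file() and p.suffix == ".list"
    )


def demo() -> list[str]:
    # Build a throwaway directory with hypothetical file names.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("official.list", "extra.list", "extra.list.disabled", "notes"):
            (root / name).write_text("# placeholder\n")
        return active_source_files(root)


if __name__ == "__main__":
    print(demo())
```

Renaming a file so that it no longer ends in .list is enough to switch it off; its contents are never inspected.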

And of course the human user profits most from file extensions as that makes the type of a file obvious and also allows multiple files with the same base name and different extensions like site.html, site.php, site.js, site.css etc. The disadvantage is of course that file extension and the actual file type/content do not necessarily have to match.

Additionally it's needed for cross-platform interoperability, as e.g. Windows will not know what to do with a readme file, but only a readme.txt.

Solution 3

I'd like to take a different approach to this from other answers, and challenge the notion that "Linux" or "Windows" have anything to do with this (bear with me).

The concept of a file extension can be simply expressed as "a convention for identifying the type of a file based on part of its name". The other common conventions for identifying the type of a file are comparing its contents against a database of known signatures (the "magic number" approach), and storing it as an extra attribute on the file system (the approach used in the original MacOS).

Since every file on a Windows or a Linux system has both a name and contents, processes which want to know the file type can use either the "extension" or the "magic number" approaches as they see fit. The metadata approach is not generally available, as there is no standard place for this attribute on most file systems.

On Windows, there is a strong tradition of using the file extension as the primary means of identifying a file; most visibly, the graphical file browser (File Manager on Windows 3.1 and Explorer on modern Windows) uses it when you double-click on a file to determine which application to launch. On Linux (and, more generally, Unix-based systems), there is more tradition for inspecting the contents; most notably, the kernel looks at the beginning of a file executed directly to determine how to run it; script files can indicate an interpreter to use by starting with #! followed by the path to the interpreter.
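As a rough illustration of the #! convention, the following Python sketch extracts the interpreter from a script's first line. The function name is hypothetical, and the sketch ignores details such as the kernel's length limit on that line. It assumes Linux's behaviour of passing everything after the interpreter path as one single argument.

```python
from typing import Optional


def shebang_interpreter(first_line: bytes) -> Optional[list[str]]:
    """Return [interpreter] or [interpreter, argument], or None without a shebang."""
    if not first_line.startswith(b"#!"):
        return None
    rest = first_line[2:].strip().decode("utf-8", errors="replace")
    if not rest:
        return None
    # Split once only: the remainder of the line stays together as one argument.
    return rest.split(None, 1)


if __name__ == "__main__":
    print(shebang_interpreter(b"#!/bin/bash\n"))
    print(shebang_interpreter(b"#!/usr/bin/env python3\n"))
    print(shebang_interpreter(b"echo no shebang here\n"))
```

Note that nothing here looks at the file name: a script called readme.txt is handled exactly like one called run.sh.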

These traditions influence UI design of programs written for each system, but there are plenty of exceptions, because each approach has pros and cons in different situations. Reasons to use file extensions rather than examining contents include:

  • examining file contents is fairly costly compared to examining file names; so for instance "find all files named *.conf" will be a lot quicker than "find all files whose first line matches this signature"
  • file contents can be ambiguous; many file formats are actually just text files treated in a special way, many others are specially-structured zip files, and defining accurate signatures for these can be tricky
  • a file can genuinely be valid as more than one type; an HTML file may also be valid XML, a zip file and a GIF concatenated together remain valid for both formats
  • magic number matching might lead to false positives; a file format that has no header might happen to begin with the bytes "GIF89a" and be misidentified as a GIF image
  • renaming a file can be a convenient way to mark it as "disabled"; e.g. changing "foo.conf" to "foo.conf~" to indicate a backup is easier than editing the file to comment out all of its directives, and more convenient than moving it out of an autoloaded directory; similarly, renaming a .php file to .txt will tell Apache to serve its source as plain text, rather than passing it to the PHP engine
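The point about one file matching more than one format can be reproduced with Python's standard zipfile module. In this sketch the "GIF" part is only a stand-in header followed by padding, not a complete image, and the helper names are invented. It relies on ZIP readers locating the archive directory from the end of the file, which is why prepended bytes are tolerated.

```python
import io
import zipfile


def build_polyglot() -> bytes:
    """Concatenate a GIF-looking prefix and a small ZIP archive."""
    gif_like_prefix = b"GIF89a" + bytes(16)  # stand-in bytes, not a real image
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("hello.txt", "hi")
    return gif_like_prefix + archive.getvalue()


def describe(data: bytes) -> dict:
    """Apply two independent content checks to the same bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        members = zf.namelist()
    return {"starts_like_gif": data.startswith(b"GIF89a"), "zip_members": members}


if __name__ == "__main__":
    print(describe(build_polyglot()))
```

Both checks succeed on the same bytes, so the content alone cannot say which interpretation was intended.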

Examples of Linux programs which use file names by default (but may have other modes):

  • gzip and gunzip have special handling of any file ending ".gz"
  • gcc will handle ".c" files as C, and ".cc" or ".C" as C++
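A name-based dispatch of this kind boils down to a lookup table. The Python sketch below is only an illustration with a handful of suffixes; it is not how gcc is implemented, and the table and function name are assumptions. gcc's real list of suffixes is much longer and is documented in its manual.

```python
from pathlib import PurePosixPath

# Illustrative subset only; the lookup is case-sensitive, so ".c" and ".C" differ.
SUFFIX_TO_TREATMENT = {
    ".c": "C source",
    ".cc": "C++ source",
    ".C": "C++ source",
    ".i": "preprocessed C",
    ".s": "assembly",
    ".o": "object file for the linker",
}


def classify_by_suffix(filename: str) -> str:
    """Decide how to treat a file from its name alone, without opening it."""
    suffix = PurePosixPath(filename).suffix
    return SUFFIX_TO_TREATMENT.get(suffix, "not in this table")


if __name__ == "__main__":
    for name in ("main.c", "main.C", "main.o", "main"):
        print(name, "->", classify_by_suffix(name))
```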

Solution 4

As mentioned by others, in Linux an interpreter directive method is used (storing some metadata in a file as a header or magic number so the correct interpreter can be told to read it) rather than the filename extension association method used by Windows.

This means you can create a file with almost any name you like... with a few exceptions

However

I would like to add a word of caution.

If you have some files on your system from a system that uses filename association, the files may not have those magic numbers or headers. Filename extensions are used to identify these files by applications that are able to read them, and you may experience some unexpected effects if you rename such files. For example:

If you rename a file My Novel.doc to My-Novel, LibreOffice will still be able to open it, but it will open as 'Untitled' and you will have to name it again in order to save it (LibreOffice adds an extension by default, so you would then have two files, My-Novel and My-Novel.odt, which could be annoying).

More seriously, if you rename a file My Spreadsheet.xlsx to My-Spreadsheet and then try to open it with xdg-open My-Spreadsheet, it opens in the archive manager (because it's actually a compressed file).

And if you rename a file My Spreadsheet.xls to My-Spreadsheet, when you xdg-open My-Spreadsheet you get an error saying

error opening location: No application is registered as handling this file

(Although in both these cases it works OK if you do soffice My-Spreadsheet)

If you then rename the extensionless file to My-Spreadsheet.ods with mv and try to open it, it does not open correctly (the repair fails).

And you will have to put the original extension back on to open the file correctly (you can then convert the format if you wish)

TL;DR:

If you have non-native files with name extensions, don't remove the extensions assuming everything will be OK!

Solution 5

Actually, some technologies do rely on file extensions, so if you use those technologies in Ubuntu, you'll have to rely on extensions too. A few examples:

  • gcc uses extensions to distinguish between C and C++ files. Without the extension it's pretty much impossible to differentiate them (imagine a C++ file with no classes).
  • many files (docx, jar, apk) are just particularly structured ZIP archives. While you can usually infer the type from the content, it may not always be possible (e.g. Java Manifest is optional in jar files).

Not using file extensions in such cases will only be possible with hacky workarounds and is likely to be very error-prone.
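To see why ZIP-based formats are hard to tell apart from their first bytes, here is a small Python sketch that builds two in-memory archives and inspects them. The helper names and the heuristic are assumptions for illustration. The two member names it looks for are the usual markers of an OOXML package and of a jar, and the jar one is optional, as noted above.

```python
import io
import zipfile


def make_archive(member_names: list[str]) -> bytes:
    """Build a ZIP archive in memory containing empty placeholder members."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name in member_names:
            zf.writestr(name, "")
    return buffer.getvalue()


def guess_zip_flavour(data: bytes) -> str:
    """Heuristic guess; the magic number alone only says 'ZIP'."""
    if not data.startswith(b"PK\x03\x04"):
        return "not a (non-empty) zip archive"
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
    if "[Content_Types].xml" in names:
        return "looks like an OOXML package"
    if "META-INF/MANIFEST.MF" in names:
        return "looks like a jar"
    return "plain zip, or a jar without a manifest"


if __name__ == "__main__":
    docx_like = make_archive(["[Content_Types].xml", "word/document.xml"])
    jar_like = make_archive(["META-INF/MANIFEST.MF", "Main.class"])
    bare_jar = make_archive(["Main.class"])
    for data in (docx_like, jar_like, bare_jar):
        print(data[:4], guess_zip_flavour(data))
```

All three archives start with the same four bytes; only a look inside separates them, and the manifest-less jar cannot be separated from an ordinary zip at all.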

Author by

mizech

Updated on September 18, 2022

Comments

  • mizech
    mizech over 1 year

    Linux determines a file's type via code in the file's header. This process doesn't depend on file extensions to know which software to use for opening the file.

    (That's what I remember from my education. Please correct me in case I'm wrong!)

    Working a bit with Ubuntu systems recently: I see a lot of files on the systems which have extensions like .sh, .txt, .o, .c.

    Now I'm wondering: Is the purpose of these extensions to merely help people understand what sort of file they happen to be looking at? Or do they have some purpose for the operating system also?

    • Admin
      Admin almost 8 years
      If you don't get a good response here, remember there is also unix.stackexchange.com
    • Admin
      Admin almost 8 years
      Related, almost duplicate: askubuntu.com/questions/390015/…
    • Admin
      Admin almost 8 years
      In Windows they do, in Linux/Unix they mostly don't. The main exceptions are the compression programs - gzip, bzip2, xz - and so on. These programs use suffixes to separate the compressed version of a file from the uncompressed one they replace. Compression programs will often complain about an incorrect suffix, even though the file actually is a compressed file of the type they should handle.
    • Admin
      Admin almost 8 years
      I think part of the problem with this question is that "the operating-system" is not a well-defined concept. What is part of the operating system, and what is an application on top of it? Not many parts of the OS (whichever OS we're talking about) care what type a file is - they just do what they're told. So distinctions about how they know are irrelevant; they do neither. Applications, on the other hand, may well do one or both things.
    • Admin
      Admin almost 8 years
      Answers have covered this well, but I'd add that some applications use file extensions
    • Admin
      Admin almost 8 years
      E.g. text editors for syntax highlighting
    • Admin
      Admin almost 8 years
      The oldest use of file extensions I'm aware of was by the C compiler; very old C compilers used to work by foo.c (source code) -> foo.s (assembler) -> foo.o (separately compiled output) -> foo (linked output). In this case, the file extension was necessary for different formats to have different names, which caters to the kernel-level requirement that a given file name be associated to only one stream of bytes.
    • Admin
      Admin almost 8 years
      @jcast, it was/is foo.c compiles to foo.a, then assembles foo.a to foo. But, technically, all things are possible, depending on the 'makefile'.
  • techraf
    techraf almost 8 years
    I completely don't get what you wanted to say in your last sentence. First, it is a problem of hiding the extension rather than having it; second, the exploit would work the same in Linux - you name a binary file readme.txt and make it executable. If the user executes it, it does not open the editor, but runs the code. In this respect making extensions matter (but not hiding them) is more secure and easier to explain to non-savvy users. There are other differences (most notably not executing files from the current directory), but they have nothing to do with extensions.
  • Byte Commander
    Byte Commander almost 8 years
    A new-style MS Office document (docx, xlsx, pptx etc) without file extension opens in the archive manager because those file types are actually just ordinary ZIP compressed files which contain all the XML documents and media files necessary to define the document content. The file format of a ZIP compressed directory is pretty common nowadays btw.
  • Ray
    Ray almost 8 years
    Already many great answers, but just one more specific to libreoffice that I've noticed. You create a file of comma separated values (CSV) and save it as "test.csv", a window will open asking what type of separator are you using (i.e., libreoffice Calc). If you rename this file to "test.cs", for example, then libreoffice's Writer opens it. So, besides the ZIP example above, it does seem like libreoffice does make use of the file extension.
  • terdon
    terdon almost 8 years
    This isn't quite true. There are programs that expect a specific extension. The most commonly used example is probably gunzip which won't decompress a file if it isn't called foo.gz.
  • Sergiy Kolodyazhnyy
    Sergiy Kolodyazhnyy almost 8 years
    That's an implementation of specific software. For the most part, utilities on unix-like systems don't expect an extension.
  • terdon
    terdon almost 8 years
    For the most part they don't, no. Your first sentence, however, claims that they are never used and only matter to humans. That isn't entirely true. gunzip is one example, eog is another. Also, many tools won't autocomplete names without the right extension. All I'm saying is that it's a bit more complicated than "extensions are always irrelevant".
  • Byte Commander
    Byte Commander almost 8 years
    Windows is also not bound to the x.3 naming scheme any more; you have got longer extensions there as well, like .docx, .torrent, .part, etc. It's just that many file formats and extensions were already defined back in the time when 8.3 naming was still a thing, and later formats mostly simply adopted the convention of using up to 3 letters.
  • Bakuriu
    Bakuriu almost 8 years
    @techraf Actually the file manager will probably try to open the readme.txt file with a text editor. I just tried with Dolphin in KDE: creating a shell script, adding executable permission, saving it as .txt and clicking on it makes it open in Kate. If I rename it to .sh then clicking on it runs it.
  • techraf
    techraf almost 8 years
    File manager probably will, I wasn't referring to file managers (also indicated by "current directory" remark). In fact I intended to ask the author about the meaning of the last sentence (i admit I did it indirectly). I am not sure why you included my nick. Your comment reads rather like questioning the validity of some claims in the answer itself (looks like it's calling the file manager a bug).
  • bolov
    bolov almost 8 years
    linux: since make is built around rules that depend on the file extension, wouldn't this make (no pun intended) the extensions meant for more than just humans?
  • Rinzwind
    Rinzwind almost 8 years
    I would call depending on extensions a bug. What should be done (and yes I know it will cost more processor time) is that the command "file" should be called and examined to check for the magic number. @techraf "If user executed it, it does not open the editor, but runs the code" is a USER problem not a system security problem. When we see a README.TXT we use "more", "gedit", "vim" or "nano" to view it. We do NOT execute a readme. That's a windows mentality we do not need in Linux.
  • Rinzwind
    Rinzwind almost 8 years
    @techraf and you assume too much. "looks like it's calling the file manager a bug" No I do not. What Nautilus does is correct. If you open a README.TXT it scans the magic number and offers a suitable program to open it with as a warning. If it does not offer to open it in gEdit and you still do, YOU are the problem, not Nautilus. It did its job: warn you. Again: file extensions should not be anything else than a way for a user to visibly see what the file should be.
  • Peter Green
    Peter Green almost 8 years
    The linux filesystem doesn't do anything regarding file types. That is all down to the programs running on top of it.
  • Barb Hammond
    Barb Hammond almost 8 years
    @PeterGreen Yes, but the fact that the programs do assign it significance means it's not "just for humans" the way, e.g., classic MacOS had it [there were four-byte "file type" and "creator app" fields that weren't part of the file name, so the OS and applications had all the information they needed without looking at file extensions]
  • IMSoP
    IMSoP almost 8 years
    I don't see how ".conf", ".c", etc, are "a different meaning" from "the 8.3 sense". The concept of a file extension can be simply expressed as "a convention for identifying the type of a file based on part of its name". Not even DOS / Win3.1 required the correct extension (you could call a Word document "STUPIDN.AME" and open it with Ctrl-O in WinWord). It's just that some systems (e.g. double-click on Windows, gzip, your Makefile, etc) may be written to use this convention to make assumptions about the correct action to take on each file.
  • IMSoP
    IMSoP almost 8 years
    As others have pointed out, certain file types are very hard to define by their contents, such as the many formats based on zip archives (JAR, ODF, OOXML, etc). It's also possible for a file to contain data valid in two contexts (e.g. you can concatenate a zip archive and a GIF, and the file is valid in both formats). As such, allowing the user to provide extra information in the form of a naming convention can improve the UX. It is no more secure to call file on ILOVEYOU and decide it should be run through a vulnerable interpreter anyway - the attacker determines both name and content.
  • IMSoP
    IMSoP almost 8 years
    Also, "In Linux when you start a file from Nautilus..." should really read "In Nautilus, when you start a file...", or maybe "...double-click a file...". It has absolutely nothing to do with Linux as a kernel or an overall operating system, but is just a UI decision made by that particular application. A Windows port of Nautilus could make exactly the same decision.
  • IMSoP
    IMSoP almost 8 years
    @PeterGreen The Windows filesystem doesn't do anything regarding file types either. The graphical shell (Windows Explorer) uses file extension to choose an action for double-click, but technically that's just a program running on top of the OS, just as Nautilus is. It would be perfectly possible to write a Linux file manager with that behaviour, or a Windows one which examined the file contents.
  • Rinzwind
    Rinzwind almost 8 years
    1 small issue: OP asked about the operating system. 'gunzip' and 'eog' are not the operating system but decided to create their own restrictions (in case of gunzip) or method (eog). "mime types" though.
  • Rinzwind
    Rinzwind almost 8 years
    @IMSoP correct. Changed it to Ubuntu :) (other systems might have changed Nautilus to fit their needs)
  • IMSoP
    IMSoP almost 8 years
    @Rinzwind What is and isn't "the operating system" is a matter of opinion / debate (and sometimes law suit!). There is nothing in the Linux kernel that ever needs to know if a file is a bitmap, whether by filename or contents.
  • IMSoP
    IMSoP almost 8 years
    You slightly contradict yourself here: if the standard image viewer requires a filename ending .bmp, what part of the OS are you saying relies on the file content starting "BM"? AFAIK, the only "magic numbers" the kernel cares about are executable types, including the special case of #!. Everything else is up to some application's decision.
  • Barb Hammond
    Barb Hammond almost 8 years
    A magic number is not a fixed size field.
  • Byte Commander
    Byte Commander almost 8 years
    @IMSoP I don't know the exact implementation of eog and I don't know why they care about the file name at all. This is a bug in my opinion. And of course if the file is named "bmp" but its content format does not match, there will be an error as well. Each application decides how to verify files, but in general Linux applications should not rely on the name. Btw, you can use the file command to examine file types by their content.
  • coteyr
    coteyr almost 8 years
    @ByteCommander That's true, but the extension still determines the app used. I am not sure how to edit the answer to reflect that.
  • coteyr
    coteyr almost 8 years
    @IMSoP you are correct that if you open a program you can open any file in that program. However the OS will see STUPID.AME as an ACT! email system library. For example you can't launch a program unless it ends in .exe, .com, .bat or maybe .dll, and if you rename foo.exe to foo.txt and then ask the system what kind of file foo.txt is, it will tell you a text file.
  • Thomas
    Thomas almost 8 years
    @IMSoP not just "very hard", but sometimes impossible. And if file doesn't know how to recognize one of the more exotic file formats, you better hope it has an extension attached to it you can search for, otherwise good luck using that file.
  • IMSoP
    IMSoP almost 8 years
    @coteyr Again, it all depends what we mean by "the OS". The File Manager will certainly look up a registry key for "AME", and will tell me that "foo.txt" is a text file. But running dir at a command prompt will tell me no such thing; it simply won't care. Executing files is certainly an exception, on both OSes; if the question was limited to those, the answer would be that DOS/Windows only care about the name, and Unix/Linux only care about the execute permission and the first bytes of the file. Outside of that, there is always some application choosing a convention to follow.
  • IMSoP
    IMSoP almost 8 years
    @Random832 In the context of the Unix kernel, it was historically the first two bytes of the file (so 16 bits, not 32) which identified the executable format. Note that this includes the two bytes #!, the handler for which happens to start by reading in the following bytes up to a new line before deciding how to proceed. I came upon this article which goes into some detail about the current Linux implementation, which doesn't have an explicit limit of 2 byte magic numbers (it always pre-loads 128 bytes of the file).
  • IMSoP
    IMSoP almost 8 years
    The sentence I am challenging is this: "Linux ... determines the file type by examining the first few bytes". What definition of "Linux" are you using in that sentence? The existence of the file utility doesn't really prove anything; it's a useful tool, that could exist on any OS. What fundamental part of the OS makes running file any more "correct" than globbing the file name?
  • techraf
    techraf almost 8 years
    @Rinzwind I asked (implicitly) about the meaning of one particular sentence which you included in your answer. Explicitly: why do you attribute the problem with ILOVEYOU worm to the existence of file extensions. ILOVEYOU tricked user into executing a file. In your reply you explicitly state that if user is tricked it is a "user problem not a system security problem". Given that, I understand your intention even less. If someone takes time to read your answer, thinks about it and asks (maybe not clearly enough) a specific question about one sentence, kindly please reciprocate.
  • isanae
    isanae almost 8 years
    Note that files without an extension can be associated with a program.
  • Monty Harder
    Monty Harder almost 8 years
    @terdon I can get gunzip to decompress any file I damn please, simply by not letting it know the name of the file: with gunzip <foo it happily decompresses the file.
  • Monty Harder
    Monty Harder almost 8 years
    Windows also has a strong tradition of hiding the extension if it's "well known" and even DOS allowed a command to omit .COM, .BAT, and .EXE, automatically searching for those to determine what actual program to execute. There is no such tradition in *nix.
  • Sergiy Kolodyazhnyy
    Sergiy Kolodyazhnyy almost 8 years
    @IMSoP Clearly operating system isn't a very debatable topic - it is composed of a kernel and several basic services. gunzip and eog are software that in no way influences how the system runs, and the system clearly can live without them just fine. They are also not responsible for loading other programs, like the Linux kernel or program loader are.
  • Alan Shutko
    Alan Shutko almost 8 years
    This is a monumentally wrong answer. Some parts of Linux use magic numbers to determine file types. Executing files at the command line. But other huge parts of the system use file extensions to know what to look at, whether those be the dynamic linker (which wants .so files), modprobe, build systems, plugins, libraries for python, ruby, etc. Many files don't have magic numbers, file is heuristic-based, not definite.
  • IMSoP
    IMSoP almost 8 years
    @Serg Sure, you can define OS narrowly, and get a trivial answer to the question. It's not a particularly helpful answer, though, because the vast majority of what a user does with a computer involves software you've excluded. Note that the question contrasted "only for humans" against "the operating-system"; I don't think they meant "the kernel".
  • Jonathan Cast
    Jonathan Cast almost 8 years
    "If there is a system in place in Linux where file extensions are important, I would call that a bug" --- hmm, so the historical practice of cc is a bug? I believe the C compiler has always depended on the file extension to distinguish between .c arguments that need to be handed to cc1 and .o arguments that can be handed to ld directly.
  • hvd
    hvd almost 8 years
    @BenVoigt When compiling a file with a .cc extension with gcc, it really will be compiled as C++, and this is documented in man gcc: "For any given input file, the file name suffix determines what kind of compilation is done:" followed by a list of extensions and how they are treated.
  • user
    user almost 8 years
    @coteyr You forgot *.scr (screen saver binary) in Windows 3.1 and up. That said, the file extension even in DOS/Windows systems even for executables is still just a convenience. The specifics depend very much on where you draw the line of "operating system", but you can always load a binary into memory and jump into it yourself, doing the work that one normally asks of the OS to do. In MS-DOS, if you look through command.com, I'm pretty sure there's a list like EXE COM that you can edit such that it looks for other extensions if none is specified (not saying it'd be a good idea, mind you).
  • Ben Voigt
    Ben Voigt almost 8 years
    @hvd Then maybe it's the default set of libraries that goes horribly wrong if you don't use the right frontend. Anyway make is the prime example because everything it does is based on file extension.
  • leonbloy
    leonbloy almost 8 years
    "Linux determines the type of a file via a code in the file header" "correct" WTF? What "code in the file header" ? There is no such code, and there is no such a generic "file header" in Linux.
  • ave
    ave almost 8 years
    Checking headers in every file for example when searching for a file would take a lot, especially on slow drives.
  • Edward Torvalds
    Edward Torvalds almost 8 years
    Linux distributions too need extensions to determine file types and they do use them. Try this: cp /bin/ls ~/readme.txt, then launch Nautilus and open this new file; it will be opened in a text editor. Try to compile C program code without the .c extension and you will get an error. You still think gcc has a bug?
  • Edward Torvalds
    Edward Torvalds almost 8 years
    it is easy to guess file type from extension than to guess it from first few bytes, so I think it is pretty obvious file extensions are for OS too.
  • DocSalvager
    DocSalvager almost 8 years
    This is a much better answer but has one factual error... a script cannot be made executable by placing #! at the beginning. Any file with its executable bit(s) set can be executed in one of several ways. #!/bin/bash and similar signatures just specify which interpreter to use. If no such signature is supplied, the default shell interpreter is assumed. A file containing nothing but the two words 'Hello World', but with its execution bit set, will attempt to find a 'Hello' command when run.
  • IMSoP
    IMSoP almost 8 years
    @DocSalvager Good catch, that was clumsy wording as much as anything. I've reworded it a bit to make clear that the shebang doesn't make the script executable, it just changes how it is executed.
  • Old Badman Grey
    Old Badman Grey over 7 years
    "If there is a system in place in Linux where file extensions are important, I would call that a bug": ipython scripts require .ipy even with #!/usr/bin/ipython
  • Eliah Kagan
    Eliah Kagan over 7 years
    @BenVoigt make is a good example too, but gcc relies just as heavily on filenames. Here's an example clearer than .c vs .cc: For C, gcc uses suffixes to tell if its first step is to preprocess (.c), compile (.i), assemble (.s), or link (.o). Here, I use -E, -S and -c to tell gcc where to stop, but it uses filenames to know where to start. gcc something.cc won't link to the right libraries for C++ but it will treat the file as C++, which is why many users are confused by the error messages they get when making that mistake.
  • Eliah Kagan
    Eliah Kagan over 7 years
    @Serg "Clearly operating system isn't a very debatable topic" Uh, citation needed. Canonical calls Ubuntu an "operating system." You say an operating system is "composed out of kernel and several basic services," but you do not state which "basic services" you think those are. I'd be particularly interested to know which parts of systemd you consider to be operating system components and which you do not. This answer implicitly uses a notion of "operating system" that is vague and not what most people mean by that term, especially in *nix circles.
  • doug
    doug about 6 years
    So according to this post the context menu & mimetype is a bug?
  • Rinzwind
    Rinzwind about 6 years
    @doug Yes, but severity and impact play a role here. If you scan the 1st bytes of a file you still need a reference map to match what the file is. Mimetype does that. Checking extensions against mimetype instead of the actual file is the easy and quick way to identify a file. Not the correct way but it saves time doing it like that. Identifying a file as a JPG where it is an executable is a possible entry point into your system so a security bug.
  • IMSoP
    IMSoP over 3 years
    @Rinzwind There is no single "correct way" to tell what format a file is in. If you look at the first few bytes of a file and determine it's a JPEG that's not "more true" than looking at the file extension and determining the same thing. Ultimately, file formats are a question of intent not data: how did the author or user intend this sequence of bytes to be interpreted. "Magic numbers" and file extensions are both heuristics for guessing that intent.