How to extract the MIME type from a byte[]

Solution 1

Try the Java Mime Magic Library (jMimeMagic):

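// Requires jMimeMagic on the classpath (net.sf.jmimemagic.Magic, net.sf.jmimemagic.MagicMatch).
byte[] data = ...
// getMagicMatch throws checked exceptions, e.g. MagicMatchNotFoundException when nothing matches.
MagicMatch match = Magic.getMagicMatch(data);
String mimeType = match.getMimeType();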

Solution 2

I'm sure the library posted by @sfussenegger is the best solution, but I do it by hand with the following snippet, which I hope helps you. Each enum constant pairs a label with one or more magic-number prefixes:

DESCONOCIDO("desconocido", new byte[][] {}),
PDF("PDF", new byte[][] { { 0x25, 0x50, 0x44, 0x46 } }),
JPG("JPG", new byte[][] { { (byte) 0xff, (byte) 0xd8, (byte) 0xff, (byte) 0xe0 } }),
RAR("RAR", new byte[][] { { 0x52, 0x61, 0x72, 0x21 } }),
GIF("GIF", new byte[][] { { 0x47, 0x49, 0x46, 0x38 } }),
PNG("PNG", new byte[][] { { (byte) 0x89, 0x50, 0x4e, 0x47 } }),
ZIP("ZIP", new byte[][] { { 0x50, 0x4b } }),
TIFF("TIFF", new byte[][] { { 0x49, 0x49 }, { 0x4D, 0x4D } }),
BMP("BMP", new byte[][] { { 0x42, 0x4d } });

Regards.

PS: The best part is that it has no dependencies. PS2: No warranty about its correctness! PS3: "desconocido" is Spanish for "unknown".
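
The snippet in Solution 2 shows only the enum constants, so here is one possible way to complete it into something that compiles. This is a sketch, not the answerer's actual code: the enum name FileSignature, the fields, and the detect() and matches() methods are all assumptions added for illustration. Only the signature table comes from the answer.

```java
import java.util.Arrays;

// Hypothetical completion of the enum fragment from Solution 2.
// Only the constants and their byte patterns come from the answer;
// the type name, fields, and methods are illustrative assumptions.
enum FileSignature {
    DESCONOCIDO("desconocido", new byte[][] {}),
    PDF("PDF", new byte[][] { { 0x25, 0x50, 0x44, 0x46 } }),
    JPG("JPG", new byte[][] { { (byte) 0xff, (byte) 0xd8, (byte) 0xff, (byte) 0xe0 } }),
    RAR("RAR", new byte[][] { { 0x52, 0x61, 0x72, 0x21 } }),
    GIF("GIF", new byte[][] { { 0x47, 0x49, 0x46, 0x38 } }),
    PNG("PNG", new byte[][] { { (byte) 0x89, 0x50, 0x4e, 0x47 } }),
    ZIP("ZIP", new byte[][] { { 0x50, 0x4b } }),
    TIFF("TIFF", new byte[][] { { 0x49, 0x49 }, { 0x4D, 0x4D } }),
    BMP("BMP", new byte[][] { { 0x42, 0x4d } });

    private final String label;
    private final byte[][] magics;

    FileSignature(String label, byte[][] magics) {
        this.label = label;
        this.magics = magics;
    }

    public String label() {
        return label;
    }

    // True if the data starts with any of this type's magic-number prefixes.
    private boolean matches(byte[] data) {
        for (byte[] magic : magics) {
            if (data.length >= magic.length
                    && Arrays.equals(magic, Arrays.copyOf(data, magic.length))) {
                return true;
            }
        }
        return false;
    }

    // Returns the first type whose prefix matches, or DESCONOCIDO ("unknown").
    public static FileSignature detect(byte[] data) {
        if (data == null) {
            return DESCONOCIDO;
        }
        for (FileSignature candidate : values()) {
            if (candidate.matches(data)) {
                return candidate;
            }
        }
        return DESCONOCIDO;
    }
}
```

With this sketch, an upload handler could call FileSignature.detect(data) and compare the result with the type implied by the file name. Note that the JPG entry as posted only matches files that begin with FF D8 FF E0; JPEG files that start with a different fourth byte (for example FF D8 FF E1, used by Exif) would come back as DESCONOCIDO, so shortening that prefix to the first three bytes may be worth considering.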

Author: mickthompson

Updated on July 09, 2022

Comments

  • mickthompson almost 2 years

    I have a web page that can be used to upload files.
    Now I need to check whether the file type is correct (zip, jpg, pdf, ...).

    I can use the MIME type that comes with the request, but I don't trust the user; let's say I want to be sure that nobody can upload a .gif file that was renamed to .jpg.
    I think that in this case I should inspect the magic number.
    This is a Java library I've found that seems to do what I need: extract the MIME type from the magic number.
    Is this a correct solution, or what do you suggest?

    UPDATE: I've found the mime-util project and it seems very good and up-to-date! (Maybe better than the Java Mime Magic Library?)
    Here is a list of utility projects that can help you extract MIME types.

  • mickthompson over 14 years
    I tried the activation framework's getContentType() on some .pdf and .xls files, but unfortunately the method always returns 'application/octet-stream'. Only for .txt does it return something like 'text/plain'.
  • mickthompson over 14 years
    Actually, getContentType() only maps the file based on its extension and a map of MIME types that you provide... this is not what I'm looking for.
  • James B over 14 years
    I agree, that's not what you're looking for!
  • Oscar Pérez over 11 years
    It does not detect docx files correctly. It keeps giving application/zip as the MIME type...
  • sfussenegger about 11 years
    @OscarPérez A docx is indeed a zip archive containing a bunch of XML files, so it's technically correct. You could inspect the archive yourself to see if it is a docx or similar. This would probably be out of scope for this small library.
  • catch23 about 11 years
    @sfussenegger What can you say about this SO question: "check file of MIME-type with JMimeMagic"?
  • blong over 10 years
    Linking to an IP address is weird.
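
On the docx point raised in the comments: since a docx is a ZIP archive, one way to "inspect the archive yourself" is to look for the well-known part names that Office Open XML packages contain. The following is only a rough sketch using the JDK's java.util.zip classes. The class name OoxmlSniffer and the method refineZipType are hypothetical, and the sketch assumes the caller has already confirmed that the data starts with the ZIP signature.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: refines "application/zip" into an Office Open XML
// MIME type by looking for the main document part inside the archive.
final class OoxmlSniffer {

    private OoxmlSniffer() {
    }

    // Assumes 'data' was already identified as a ZIP archive by its magic number.
    public static String refineZipType(byte[] data) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                switch (entry.getName()) {
                    case "word/document.xml":
                        return "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
                    case "xl/workbook.xml":
                        return "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
                    case "ppt/presentation.xml":
                        return "application/vnd.openxmlformats-officedocument.presentationml.presentation";
                    default:
                        // Not a marker entry; keep scanning.
                        break;
                }
            }
        }
        // No Office marker found: treat it as a plain ZIP archive.
        return "application/zip";
    }
}
```

This only checks entry names, so a deliberately crafted archive could still fool it; for untrusted uploads it may be safer to treat the result as a hint rather than as proof.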