How to identify a file's type even when the file extension has been changed?
Solution 1
File types can be identified by structure, magic numbers, metadata, strings and regular expressions, heuristics, and statistical analysis. Whatever the technique, the tool will only be as good as the database of rules behind it.
Try DROID (Digital Record Object IDentification tool) for identifying file types; it is written in Java and New BSD-licensed. It is a free project of the National Archives UK, unrelated to Android. The source is available on GitHub and SourceForge, and the DROID documentation is good.
See also Darwinsys file and libmagic.
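As a rough illustration of the magic-number approach mentioned above, here is a minimal sketch in plain Java. The class `MagicNumberSniffer` and its methods are hypothetical names made up for this example. They are not part of DROID, file, or libmagic, and the handful of signatures checked is nowhere near a real rule database.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; real tools use far larger rule sets.
class MagicNumberSniffer {

    // True if 'data' contains the bytes of 'magic' starting at 'offset'.
    static boolean hasBytesAt(byte[] data, int offset, byte[] magic) {
        if (data.length < offset + magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[offset + i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Guesses a MIME type from the first bytes of a file, ignoring its name.
    static String detect(byte[] head) {
        if (hasBytesAt(head, 0, new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A})) {
            return "image/png";
        }
        if (hasBytesAt(head, 0, new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF})) {
            return "image/jpeg";
        }
        if (hasBytesAt(head, 0, ascii("%PDF-"))) {
            return "application/pdf";
        }
        // ZIP signature; also matches ZIP-based formats such as JAR or DOCX.
        if (hasBytesAt(head, 0, new byte[] {'P', 'K', 3, 4})) {
            return "application/zip";
        }
        // ISO base media files carry "ftyp" at offset 4. This is a coarse
        // guess: MOV, 3GP and similar formats share the same marker.
        if (hasBytesAt(head, 4, ascii("ftyp"))) {
            return "video/mp4";
        }
        return "application/octet-stream"; // unknown
    }

    // Reads up to 16 leading bytes of the file and classifies them.
    static String detect(Path file) throws IOException {
        byte[] buf = new byte[16];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) {
                    break; // end of file reached before 16 bytes
                }
                n += r;
            }
        }
        return detect(Arrays.copyOf(buf, n));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MagicNumberSniffer myVideo.txt
        System.out.println(detect(Paths.get(args[0])));
    }
}
```

Because only the leading bytes are inspected, a renamed file such as myVideo.txt would still be reported by its content rather than its extension, provided its signature is one of the few listed here.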
Solution 2
One of the best libraries for this is Apache Tika. It does not only read the file's header; it can also perform content analysis to detect the file type. Using Tika is simple. Here is an example of detecting a file's type:
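import java.io.IOException;
import java.net.URL;
import org.apache.tika.Tika; // Including Tika

public class TestTika {
    // detect(URL) reads from the URL, so it can throw IOException
    public static void main(String[] args) throws IOException {
        Tika tika = new Tika();
        // Returns the detected media type as a string
        String fileType = tika.detect(new URL("http://example.com/someFile.jpg"));
        System.out.println(fileType);
    }
}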
Maximin
Updated on June 05, 2022
Files are categorized by file extension. So my question is: how can I identify the file type even if the file extension has been changed?
For example, I have a video file named myVideo.mp4, and I have renamed it to myVideo.txt. If I double-click it, the default text editor opens the file and does not show the actual content. But if I play myVideo.txt in a video player, the video plays without any problem.
I was thinking of developing an application that determines the type of a file without checking the file extension and suggests software for opening it. I would like to develop the application in Java.