Removing BOM characters using Java
Solution 1
Java's UTF-8 decoder does not strip a BOM. It passes the BOM through like any other character, so it ends up as U+FEFF at the start of the decoded string.
Found this:
http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html
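// A decoded BOM is the single character U+FEFF.
public static final String UTF8_BOM = "\uFEFF";

// Returns s without its leading BOM character, if it has one.
private static String removeUTF8BOM(String s) {
    if (s.startsWith(UTF8_BOM)) {
        s = s.substring(1);
    }
    return s;
}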
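To see why the helper above is needed, here is a minimal sketch that decodes a few UTF-8 bytes beginning with a BOM and then strips the resulting character. The class name `BomDemo` and its method names are made up for illustration; only `String` and `StandardCharsets` come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // U+FEFF is the character a BOM decodes to.
    static final String BOM = "\uFEFF";

    // Decodes bytes as UTF-8; a leading BOM survives as U+FEFF.
    public static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Drops one leading U+FEFF if present, otherwise returns s unchanged.
    public static String stripBom(String s) {
        return s.startsWith(BOM) ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        // Simulated file content: UTF-8 BOM followed by "hi".
        byte[] fileBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        String raw = decodeUtf8(fileBytes);
        System.out.println(raw.length());           // 3: BOM char + 'h' + 'i'
        System.out.println(stripBom(raw).length()); // 2
    }
}
```

When reading a file line by line, only the first line can carry the BOM, so the strip only needs to be applied once.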
Maybe I would use Apache Commons IO instead:
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html
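If adding a dependency is not an option, the same idea (dropping the BOM at the stream level, before decoding) can be sketched with the JDK alone. This is not how Commons IO is implemented, just a rough stand-in under stated assumptions: the class `BomSkipper` and the method `skipUtf8Bom` are hypothetical names, only a UTF-8 BOM is handled, and `readNBytes`/`readAllBytes` require Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BomSkipper {
    private static final byte[] UTF8_BOM_BYTES = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns a stream positioned just after a leading UTF-8 BOM, if there is one.
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM_BYTES.length);
        byte[] head = new byte[UTF8_BOM_BYTES.length];
        // Read up to three bytes; fewer are returned only at end of stream.
        int read = pushback.readNBytes(head, 0, head.length);
        boolean hasBom = read == head.length && Arrays.equals(head, UTF8_BOM_BYTES);
        if (!hasBom && read > 0) {
            // Not a BOM: put the bytes back so the caller sees them.
            pushback.unread(head, 0, read);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'o', 'k'};
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(data))) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // ok
        }
    }
}
```

Working on bytes rather than on the decoded string means the rest of the program never sees the BOM, whichever reader is layered on top.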
Solution 2
For UTF-8 the BOM is: 0xEF, 0xBB, 0xBF
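These three bytes are simply the UTF-8 encoding of the single character U+FEFF, which is why a one-character check works on a decoded string while a byte-level check needs three bytes. A small sketch to confirm this, with `BomBytes` and `bomHex` as illustrative names only:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomBytes {
    // Encodes U+FEFF in the given charset and formats the bytes as hex.
    public static String bomHex(Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : "\uFEFF".getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bomHex(StandardCharsets.UTF_8));    // EF BB BF
        System.out.println(bomHex(StandardCharsets.UTF_16BE)); // FE FF
        System.out.println(bomHex(StandardCharsets.UTF_16LE)); // FF FE
    }
}
```

The explicit `UTF_16BE` and `UTF_16LE` charsets are used here because the plain `UTF_16` encoder writes its own BOM in front of the output.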
Author by
James Raitsev
Updated on September 18, 2020
Comments
-
James Raitsev over 3 years
What needs to happen to a string in Java to be the equivalent of vi's :set nobomb?
Assume that the BOM comes from the file I am reading.
-
fge about 10 years
Strings in Java do not have a BOM... unless you read from a source which has one.
-
James Raitsev about 10 years
This is precisely what happens. I am reading a file that happens to have this mark.
-
fge about 10 years
Do you at least know what encoding is used (UTF-8, UTF-16 LE/BE)?
-
Durandal about 10 years
If you have the option, just open the file with Notepad++ or Sublime Text and resave it without a BOM. Otherwise you would need to know the encoding to do it programmatically.
-
-
Walter Tross over 6 years
UTF8_BOM is a wrong name. There is nothing in the BOM that links it to UTF-8. On the contrary, UTF-8 does NOT need the BOM, while UTF-16 MAY (and Microsoft has the bad habit of writing UTF-16 files with a BOM, which often get converted to UTF-8 with BOM by bad tools).
-
Krzysztof Tomaszewski over 5 years
UTF-8 BOM consists of 3 bytes, not 2.