Java - removing \u0000 from a String

Solution 1

string = string.replace("\u0000", ""); // removes NUL chars
string = string.replace("\\u0000", ""); // removes backslash+u0000
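
To make the difference between the two calls concrete, here is a minimal sketch. The class and method names are invented for illustration and are not part of the original answer. The first method targets a real NUL character, the second targets the six-character text backslash, u, 0, 0, 0, 0.

```java
final class NulStripper {
    // Removes real NUL characters (code point U+0000) from the input.
    static String stripNulChars(String s) {
        return s.replace("\u0000", "");
    }

    // Removes the six-character text: backslash, 'u', '0', '0', '0', '0'.
    static String stripEscapedNul(String s) {
        return s.replace("\\u0000", "");
    }

    public static void main(String[] args) {
        String withRealNul = "a" + (char) 0 + "b"; // 3 chars: a, NUL, b
        String withEscapedText = "a\\u0000b";      // 8 chars: a, backslash, u, 0, 0, 0, 0, b

        System.out.println(stripNulChars(withRealNul).length());     // prints 2
        System.out.println(stripEscapedNul(withEscapedText));        // prints ab
        System.out.println(stripNulChars(withEscapedText).length()); // prints 8 (nothing removed)
    }
}
```

If the data comes from raw JSON text, as in the question, the second form is usually the one that applies, because the NUL is still encoded as text at that point.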

Unicode escapes of the form \uXXXX are translated at the Java source level, before the compiler tokenizes the code. For instance, the keyword "class" can be written as:

public \u0063lass C {
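
As a rough illustration of that source-level translation, the sketch below uses \u0041 (the escape for 'A') rather than NUL, only because a NUL character is hard to see in output. The class and method names are hypothetical.

```java
final class SourceEscapeDemo {
    // The compiler turns this escape into the single character 'A' before lexing.
    static String translatedEscape() {
        return "\u0041";
    }

    // A doubled backslash is an ordinary string escape, so the text stays six characters long.
    static String literalBackslashText() {
        return "\\u0041";
    }

    public static void main(String[] args) {
        System.out.println(translatedEscape());              // prints A
        System.out.println(translatedEscape().length());     // prints 1
        System.out.println(literalBackslashText().length()); // prints 6
    }
}
```

Text read at runtime from a file or a network response never goes through this translation, which is why the same characters behave differently there than in a string literal.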

Also, you do not need a regex.

Solution 2

The first argument to replaceAll is a regular expression, and the Java regex engine understands \uNNNN escapes, so

json.replaceAll("\\u0000", "")

will search for the regular expression \u0000, which matches instances of the Unicode NUL character (U+0000), not instances of the literal text \u0000. If you want to match the literal text \u0000, you need the regular expression \\u0000, which in turn means the Java string literal "\\\\u0000":

json.replaceAll("\\\\u0000", "")

Or, more simply, use replace (whose first argument is a literal string rather than a regex) instead of replaceAll:

json.replace("\\u0000", "")
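
The three variants above can be compared side by side in a small sketch. The class and method names are made up for this example. Pattern.quote is a standard java.util.regex method, shown here as one more way to avoid hand-escaping the backslash; it is not mentioned in the original answer.

```java
import java.util.regex.Pattern;

final class RegexNulDemo {
    // The regex engine reads this pattern as a Unicode escape, so it matches the NUL character itself.
    static String removeNulChar(String s) {
        return s.replaceAll("\\u0000", "");
    }

    // Four backslashes in source give a regex that matches one literal backslash followed by u0000.
    static String removeEscapedText(String s) {
        return s.replaceAll("\\\\u0000", "");
    }

    // Same effect as removeEscapedText, with Pattern.quote doing the regex escaping.
    static String removeEscapedTextQuoted(String s) {
        return s.replaceAll(Pattern.quote("\\u0000"), "");
    }

    public static void main(String[] args) {
        String escapedText = "a\\u0000b"; // contains backslash + u0000 as text
        System.out.println(removeNulChar(escapedText).length());  // prints 8 (unchanged)
        System.out.println(removeEscapedText(escapedText));       // prints ab
        System.out.println(removeEscapedTextQuoted(escapedText)); // prints ab
    }
}
```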

Author: FeanDoe

Updated on July 20, 2021

Comments

  • FeanDoe
    FeanDoe almost 3 years

    I'm using the Twitter API and the following string is bugging me: Proyecto de ingeniera comercial, actual Profesora de matemáticas \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000Enseña Chile
    I want to store it in PostgreSQL, but \u0000 is not accepted, so I want to remove it.
    I tried string = string.replaceAll("\\u0000", ""); but it doesn't work. I just get the following:

    String json = TwitterObjectFactory.getRawJSON(user);
    System.out.println(json);
    json = json.replaceAll("\\u0000", "");
    System.out.println(json);
    

    The output (only the part that matters):

    Proyecto de ingeniera comercial, actual Profesora de matemáticas \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000Enseña Chile
    Proyecto de ingeniera comercial, actual Profesora de matemáticas \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000Enseña Chile
    

    If I put that part in a String literal in Java, the replacement works, but if I put it in a text file or read it directly from Twitter, it doesn't work.
    So my question is: how do I remove \u0000 from a String?
    By the way, the full string is this:

    {"utc_offset":null,"friends_count":83,"profile_image_url_https":"https://pbs.twimg.com/profile_images/2636139584/3a8455cd94045fa6980402add14796a9_normal.jpeg","listed_count":1,"profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","default_profile_image":false,"favourites_count":0,"description":"Proyecto de ingeniera comercial, actual Profesora de matemáticas \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000Enseña Chile","created_at":"Sat May 28 14:24:06 +0000 2011","is_translator":false,"profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","protected":false,"screen_name":"Fsquadritto","id_str":"306825274","profile_link_color":"0084B4","is_translation_enabled":false,"id":306825274,"geo_enabled":false,"profile_background_color":"C0DEED","lang":"es","profile_sidebar_border_color":"C0DEED","profile_location":null,"profile_text_color":"333333","verified":false,"profile_image_url":"http://pbs.twimg.com/profile_images/2636139584/3a8455cd94045fa6980402add14796a9_normal.jpeg","time_zone":null,"url":null,"contributors_enabled":false,"profile_background_tile":false,"entities":{"description":{"urls":[]}},"statuses_count":2,"follow_request_sent":false,"followers_count":36,"profile_use_background_image":true,"default_profile":true,"following":false,"name":"Fiorella Squadritto","location":"","profile_sidebar_fill_color":"DDEEF6","notifications":false,"status":{"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"possibly_sensitive":false,"coordinates":null,"created_at":"Fri Oct 12 17:40:35 +0000 2012","truncated":false,"in_reply_to_user_id_str":null,"source":"<a href=\"http://instagram.com\" 
rel=\"nofollow\">Instagram<\/a>","retweet_count":1,"retweeted":false,"geo":null,"in_reply_to_screen_name":null,"entities":{"urls":[{"display_url":"instagr.am/p/QsOQxTNfvQ/","indices":[49,69],"expanded_url":"http://instagr.am/p/QsOQxTNfvQ/","url":"http://t.co/GKziME7N"}],"hashtags":[{"indices":[24,34],"text":"eduinnova"}],"user_mentions":[{"indices":[35,47],"screen_name":"ensenachile","id_str":"57099132","name":"Enseña Chile","id":57099132}],"symbols":[]},"id_str":"256811615171792896","in_reply_to_user_id":null,"favorite_count":1,"id":256811615171792896,"text":"Amando las matemáticas! #eduinnova @ensenachile  http://t.co/GKziME7N","place":null,"contributors":null,"lang":"es","favorited":false}}
    
  • FeanDoe
    FeanDoe about 9 years
    Thanks, string.replace("\\u0000", ""); (with double backslash) works (:
  • Joop Eggen
    Joop Eggen about 9 years
    And not with a single backslash? So the text really contained a backslash followed by u0000.
  • Mikko Rantalainen
    Mikko Rantalainen almost 6 years
    It seems that the original question is about removing the null byte from a raw JSON string in which the null byte is encoded as an escape. I guess the correct way to deal with the problem would have been to encode the JSON string correctly before giving it as input to PostgreSQL. The following works just fine in PostgreSQL with the json field type: insert into test values (1, '{ "string_with_null": "a\u0000b" }');.