How do I remove all non-ASCII characters with regex and Notepad++?


Solution 1

This expression will search for non-ASCII values:

[^\x00-\x7F]+

In the Find dialog, select 'Regular expression' under Search Mode, then click Find Next.

Source: Regex any ASCII character
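
Outside Notepad++, the same expression can be tried in a script. This is only a minimal sketch in Python, assuming the text is already loaded as a string; the function name `strip_non_ascii` and the sample text are made up for illustration.

```python
import re

# Same pattern as above: one or more characters outside the ASCII range 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def strip_non_ascii(text: str) -> str:
    """Return text with every run of non-ASCII characters removed."""
    return NON_ASCII.sub("", text)


sample = "caf\u00e9 \u2013 na\u00efve"  # "café – naïve"
print(repr(strip_non_ascii(sample)))    # 'caf  nave'
```

Note that the characters are dropped, not transliterated, so "é" disappears rather than becoming "e".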

Solution 2

In Notepad++, if you go to menu Search > Find characters in range > Non-ASCII Characters (128-255), you can then step through the document to each non-ASCII character.

Be sure to tick "Wrap around" if you want the search to loop through the whole document and reach every non-ASCII character.

screenshot "Find in Range"

Solution 3

In addition to the answer by ProGM: if you see characters shown in boxes, like NUL or ACK, those are ASCII control characters (0 to 31). To get rid of them, find them with the following expression and remove them:

[\x00-\x1F]+

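To remove all non-ASCII characters AND ASCII control characters, remove everything matching this regex:

[^\x1F-\x7F]+

Note that this range still keeps \x1F (unit separator) and \x7F (DEL), which are control characters too. To drop those as well, start at \x20 and end at \x7E: [^\x20-\x7E]+. Either expression also removes tabs and line breaks.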
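
To see what each range actually catches, here is a small comparison in Python. It is a sketch, not part of the original answer: the sample string is made up, and the stricter `[^\x20-\x7E]+` pattern is included only for comparison.

```python
import re

CONTROL = re.compile(r"[\x00-\x1F]+")          # ASCII control characters only
KEEP_1F_TO_7F = re.compile(r"[^\x1F-\x7F]+")   # the combined expression quoted above
PRINTABLE_ONLY = re.compile(r"[^\x20-\x7E]+")  # stricter variant (comparison only)

# NUL, unit separator (0x1F), DEL (0x7F) and one non-ASCII letter between A..E.
sample = "A\x00B\x1fC\x7fD\u00e9E"

print(repr(CONTROL.sub("", sample)))         # 'ABC\x7fDéE'
print(repr(KEEP_1F_TO_7F.sub("", sample)))   # 'AB\x1fC\x7fDE'
print(repr(PRINTABLE_ONLY.sub("", sample)))  # 'ABCDE'
```

The second result shows that 0x1F and 0x7F survive the combined expression, while the stricter range leaves only printable ASCII.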

Solution 4

To remove all non-ASCII characters, search for the following expression in regular expression mode and replace it with nothing: [^\x00-\x7F]+

Removing non-ASCII

To highlight characters, I recommend using the Mark function in the search window: it highlights the non-ASCII characters and puts a bookmark on each line that contains one.

If you want to highlight and put a bookmark on the ASCII characters instead, you can use the regex [\x00-\x7F] to do so.

Highlighting Non-ASCII
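
For a large file it can help to get the same "bookmark" information as a list of line numbers. A rough sketch in Python, assuming the file content is already in a string; `lines_with_non_ascii` is a hypothetical helper name, not a Notepad++ feature.

```python
import re

# A single non-ASCII character is enough to flag a line, so no "+" is needed.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def lines_with_non_ascii(text: str) -> list[int]:
    """Return the 1-based numbers of lines containing a non-ASCII character."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if NON_ASCII.search(line)
    ]


sample = "plain line\nna\u00efve line\nanother plain line\n\u201cquoted\u201d"
print(lines_with_non_ascii(sample))  # [2, 4]
```

Inverting the test (`if not NON_ASCII.search(line)`) gives the pure-ASCII lines instead, which matches the white-list case.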

Cheers

Solution 5

To keep new lines:

  1. First pick a placeholder character for new lines that does not occur anywhere in the file. I used #.
  2. In the Replace dialog, select the Extended search mode.
  3. Enter \n in "Find what" and # in "Replace with".
  4. Hit Replace All.

Next:

  1. In the Replace dialog, select the Regular expression search mode.
  2. Enter this in "Find what": [^\x20-\x7E]+
  3. Leave "Replace with" empty.
  4. Hit Replace All.

Finally, select the Extended search mode again and replace # with \n.

:) now, you have a clean ASCII file ;)
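
The steps above can be sketched in Python as follows. This is only an illustration under the same assumption as the manual steps, namely that the placeholder "#" does not already appear in the text; the function names are made up.

```python
import re

PLACEHOLDER = "#"  # assumption: "#" does not already occur in the text


def clean_keep_newlines(text: str) -> str:
    """Mirror the three manual steps: protect \\n, strip, restore \\n."""
    protected = text.replace("\n", PLACEHOLDER)         # step 1: \n -> #
    stripped = re.sub(r"[^\x20-\x7E]+", "", protected)  # step 2: drop non-printables
    return stripped.replace(PLACEHOLDER, "\n")          # step 3: # -> \n


def clean_keep_newlines_one_pass(text: str) -> str:
    """Alternative (not from the answer): allow \\n inside the character class."""
    return re.sub(r"[^\x20-\x7E\n]+", "", text)


sample = "first\u00a0line\r\nsecond\tline\n"
print(repr(clean_keep_newlines(sample)))           # 'firstline\nsecondline\n'
print(repr(clean_keep_newlines_one_pass(sample)))  # 'firstline\nsecondline\n'
```

Both versions also drop tabs and carriage returns, so a Windows file ends up with bare \n line endings. The one-pass variant avoids the placeholder entirely, which matters if the text may already contain "#".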

Author: Texh

Updated on February 12, 2021

Comments

  • Texh
    Texh over 3 years

    I searched a lot, but nowhere is it written how to remove non-ASCII characters from Notepad++.

    I need to know what command to write in find and replace (with picture it would be great).

    • If I want to make a white-list and bookmark all the ASCII words/lines so non-ASCII lines would be unmarked

    • If the file is quite large and can't select all the ASCII lines and just want to select the lines containing non-ASCII characters...

  • Mike M
    Mike M over 10 years
    and just in case it isn't obvious, if you remove the "^" you are searching the ASCII lines
  • Alex
    Alex almost 10 years
    This works well, but doesn't show all results in a list and no "replace" option
  • FoamyGuy
    FoamyGuy over 9 years
    Works good, but I had to set Encoding->Encode in ANSI. Was unable to find anything otherwise.
  • Unihedron
    Unihedron over 9 years
    Values from \x00 to \x1F are already matched in the answer by ProGM.
  • brunorey
    brunorey over 9 years
    They're matched as values you'd like to keep. I was just suggesting this in case you want to get rid of them.
  • Teson
    Teson almost 9 years
    Works perfectly in NetBeans with its regexp search option (the asterisk button)
  • fgb
    fgb over 8 years
    The last example should begin at 20 to exclude the unit separator character. Maybe exclude 7F as well as it's a control character too.
  • hyena
    hyena over 7 years
    if you want to copypaste the search expression [^\x00-\x7F]+
  • Kasim Husaini
    Kasim Husaini about 7 years
    Zapping all characters replaces all type of punctuation marks with ###. The solution I would expect is: Replacing “ & ” with ". Replacing ‘ & ’ with '. etc.
  • Raghav
    Raghav over 6 years
    It works fine, however, the tool replaces funny chars with one # char and not three. please take note.
  • yashhy
    yashhy over 6 years
    works in VS-Code, don't forget to click Regex search option!
  • Steffen Winkler
    Steffen Winkler over 6 years
    If you want to keep \r and \n - carriage return and linefeed characters - you can use this regex: [\x00-\x09\x0B-\x0C\x0E-\x1F]+
  • Peter Mortensen
    Peter Mortensen almost 6 years
    The Text FX plugin is deprecated and may not even be readily available anymore. See e.g. TextFX's Future - "When the list grows long enough, it will become practical to bid farewell to an aging workhorse that has served the community well."
  • oldboy
    oldboy over 5 years
    is there any way to automate this??
  • Anatoly Alekseev
    Anatoly Alekseev over 5 years
    Why can't this expression find single characters? there must be at least 2 such to be detected with it in Notepad++
  • Pablo Adames
    Pablo Adames almost 5 years
    Brilliant! I removed all pesky non-ASCII characters using the qdap R package using: mgsub("[^\x1F-\x7F]+", "", text_vector, fixed = FALSE)
  • Jean-Francois T.
    Jean-Francois T. over 4 years
    Neat... because I always forget the regex for the non-ASCII and have to Google it each time to go back to this page :)
  • Toto
    Toto over 3 years
    Do you really want to do that for ALL non ASCII characters? They are thousands!
  • Jason C
    Jason C over 2 years
    So the trick with this is when you press find here it selects the character. Then you just go to the Edit menu and pick Replace, and Notepad++ always fills the "find" box in with the current selection, which will be the character you found. So you can do the rest of the find/replace in the normal dialog.