How to remove all non-alphabetic characters from a string?

12,703

Solution 1

One way to remove non-alphabetic characters in a string is to use regular expressions [1].

>>> import re
>>> re.sub(r'[^a-z]', '', "lol123\t")
'lol'

EDIT

The first argument, r'[^a-z]', is the pattern that matches what will be removed (here, by replacing it with the empty string ''). The square brackets denote a character class (the pattern matches any single character in the class), the ^ at the start of the class negates it, and a-z covers all lowercase ASCII letters. More information here:

https://docs.python.org/3/library/re.html#regular-expression-syntax
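
To see what each piece of the pattern does, here is a small sketch that compares the class with its negation. The sample string and variable names are made up for illustration.

```python
import re

text = "lol123\tLOL"  # illustrative input, not taken from the question

# [a-z] matches any single lowercase ASCII letter.
inside = re.findall(r'[a-z]', text)

# [^a-z] matches any single character that is NOT a lowercase ASCII letter.
outside = re.findall(r'[^a-z]', text)

# Replacing every character outside the class with '' leaves only the class.
cleaned = re.sub(r'[^a-z]', '', text)

print(inside)
print(outside)
print(cleaned)
```

Note that the uppercase letters end up in the "outside" list, which is why they are removed unless A-Z is added to the class.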

So, for instance, to also keep capital letters and spaces, it would be:

>>> re.sub(r'[^a-zA-Z ]', '', 'Lol !this *is* a3 -test\t12378')
'Lol this is a test'

However, from the data you give in your question, the exact process you need seems to be a bit more complicated than just "getting rid of non-alphabetical characters".
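
As a rough sketch of what that more involved process might look like, the snippet below pulls readable runs out of a string shaped like the sample in the question. It rests on several assumptions that are not confirmed by the question: unprintable bytes show up as ".", the identifiers are always exactly 32 uppercase hex characters, and the wanted text contains only letters, digits, spaces and apostrophes. The names extract_labels, skip and sample are hypothetical.

```python
import re

# Shortened stand-in for the HxD dump in the question; "." stands in for
# unprintable bytes (an assumption about how the data looks once read).
sample = (
    "....!...1AC9816A4D34966936605BB7EFBC0841.....Sun Tan Specialist"
    "........9........!...9658361F4EFF6B98FF153898E58C9D52.....Outfit"
    "........D........!...F37BE72345271144C16FECAFE6A46F2A.....Don't get burned...."
)

def extract_labels(text, skip=("Outfit",)):
    """Return readable runs, dropping hex IDs and unwanted fixed words."""
    # Runs of at least 4 letters, digits, spaces or apostrophes.
    runs = re.findall(r"[A-Za-z0-9' ]{4,}", text)
    labels = []
    for run in runs:
        run = run.strip()
        if not run:
            continue  # run consisted only of spaces
        if re.fullmatch(r"[0-9A-F]{32}", run):
            continue  # assumed ID format: 32 uppercase hex characters
        if run in skip:
            continue  # words present in every file, such as "Outfit"
        labels.append(run)
    return labels

result = " | ".join(extract_labels(sample))
print(result)
```

Filtering whole runs, rather than deleting characters one by one, avoids gluing leftover fragments of the hex identifiers onto the text you want to keep.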

Solution 2

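You can use filter, which keeps only the characters that pass a test. Note that this version keeps ASCII digits as well as letters; drop string.digits to keep letters only: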

import string
print(''.join(filter(lambda character: character in string.ascii_letters + string.digits, '(ABC), DEF!'))) # => ABCDEF
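
A variant of the same idea that keeps only ASCII letters and spaces could look like the sketch below. ALLOWED and keep_letters_and_spaces are illustrative names, not part of the original answer.

```python
import string

# Characters to keep: ASCII letters (both cases) plus the space character.
ALLOWED = set(string.ascii_letters + " ")

def keep_letters_and_spaces(text):
    """Drop every character that is not an ASCII letter or a space."""
    return "".join(ch for ch in text if ch in ALLOWED)

cleaned = keep_letters_and_spaces("Lol !this *is* a3 -test\t12378")
print(cleaned)
```

Building the set once means each membership check does not have to rebuild or rescan a concatenated string.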
Author: A. Domni

Updated on June 04, 2022

Comments

  • A. Domni
    A. Domni almost 2 years

    I have been working on a program which takes a hex file and, if the file name starts with "CID", removes the first 104 characters. After that point there are a few words. I also want to remove everything after those words, but the problem is that the part I want to isolate varies in length.

    My code is currently like this:

    y = 0
    import os
    files = os.listdir(".")
    
    filenames = []
    for names in files:
        if names.endswith(".uexp"):
            filenames.append(names)
            y +=1
            print(y)
    print(filenames)
    
    for x in range(y):  # range(1, y) would skip the first file
        filenamestart = (filenames[x][0:3])
        print(filenamestart)
        if filenamestart == "CID":
            openFile = open(filenames[x],'r')
            fileContents = (openFile.read())
            ItemName = (fileContents[104:])
            print(ItemName)
    

    Input Example file (pulled from HxD):

    .........................ýÿÿÿ................E.................!...1AC9816A4D34966936605BB7EFBC0841.....Sun Tan Specialist.................9.................!...9658361F4EFF6B98FF153898E58C9D52.....Outfit.................D.................!...F37BE72345271144C16FECAFE6A46F2A.....Don't get burned............................................................................................................................Áƒ*ž
    

    I have got it working to remove the first 104 characters, but I would also like to remove the characters after 'Sun Tan Specialist', which will differ in length, so I am left with only that part.

    I appreciate any help that anyone can give me.

    • user3483203
      user3483203 almost 6 years
      Your question title is "remove non-alphabetic characters from a string". From the content of your question, it seems there are many more requirements than just that. Please clarify with a sample input, desired output, and a minimal reproducible example.
    • A. Domni
      A. Domni almost 6 years
      I have added an example file and what I would like the output to be
    • user3483203
      user3483203 almost 6 years
      So what is your desired output? Just Sun Tan Specialist?
    • user3483203
      user3483203 almost 6 years
      Try something like re.sub(r'(.*?)(\..*)', r'\1', s[104:])
    • A. Domni
      A. Domni almost 6 years
      At this point yes, but ideally I would like: "Sun Tan Specialist | Don't get burned"
    • user3483203
      user3483203 almost 6 years
      But why not Outfit D then?
    • A. Domni
      A. Domni almost 6 years
      Because 'Outfit' will be present for all files beginning with 'CID' and that 'D' isn't relevant. Also, using re.sub, I managed to get this as the output: "Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned", but I don't want those sets of letters like 'FEFFBFFECD'.
  • A. Domni
    A. Domni almost 6 years
    This may be a possibility, however I also want to keep capital letters and spaces, how would I do this?