Extract substring from string in dataframe

21,942

Solution 1

Regex extraction is built into the .str accessor of pandas Series: Series.str.extract returns the part of each string matched by a capture group (see its entry in the pandas documentation). In your case, you could use

df['ticker'] = df['Company Name'].str.extract(r"\((.*)\)", expand=False)  # raw string; expand=False returns a Series
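A minimal, self-contained check of Solution 1 on the four company names from the question, assuming a recent pandas version. The pattern is tightened here to [^)]* anchored at the end of the string. That rests on the assumption that the symbol is always the final parenthesized part of the name; the original (.*) gives the same result on this data.

```python
import pandas as pd

# Sample data taken from the question's DataFrame (only the column we need)
df = pd.DataFrame({
    "Company Name": [
        "Asta Funding Inc. (ASFI)",
        "BlackBerry (BBRY)",
        "Carnival Corp. (CCL)",
        "Carnival PLC (CUK)",
    ]
})

# The escaped \( \) match literal parentheses; the unescaped (...) is the capture group.
# expand=False returns a Series (one group) instead of a one-column DataFrame.
df["ticker"] = df["Company Name"].str.extract(r"\(([^)]*)\)\s*$", expand=False)

print(df["ticker"].tolist())  # ['ASFI', 'BBRY', 'CCL', 'CUK']
```

Rows with no parenthesized symbol get NaN in the new column rather than raising an error.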

Solution 2

You can use the fact that the .str methods operate elementwise on a whole Series. I assume that the company's symbol is always at the end of the company name and surrounded by parentheses:

df['Company Symbol'] = df['Company Name'].str.rstrip(')').str.split('(').str[1] # Make new column
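df['Company Name'] = df['Company Name'].str.replace(r'\s*\(.*?\)$', '', regex=True)  # Remove symbol (and the space before it) from company name; regex=True is required in pandas 2.0+, where the pattern is treated literally by default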
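An alternative sketch for the same goal as Solution 2. It is a different technique from the answer's: a single str.extract call with two named groups replaces the split/replace pair. It makes the same assumption that the symbol is the last parenthesized part of the name; the group names "name" and "symbol" are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Company Name": [
        "Asta Funding Inc. (ASFI)",
        "BlackBerry (BBRY)",
        "Carnival Corp. (CCL)",
        "Carnival PLC (CUK)",
    ]
})

# Group "name": everything before the final "(...)", lazily, so the trailing space is left out.
# Group "symbol": the text inside the final parentheses.
# With more than one group, str.extract returns a DataFrame whose columns are the group names.
parts = df["Company Name"].str.extract(r"^(?P<name>.*?)\s*\((?P<symbol>[^)]*)\)\s*$")

df["Company Symbol"] = parts["symbol"]
# Rows that do not match come back as NaN; fall back to the original name for those.
df["Company Name"] = parts["name"].fillna(df["Company Name"])

print(df)
```

Parsing both columns in one pass keeps the name and symbol consistent with each other, at the cost of a slightly longer regex.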
Author: nicholas.reichel

Learning about screen scraping with Python and Pandas, and storing the results with MySQL and SQLite3. Also interested in machine learning and anything to do with the stock market.

Updated on March 28, 2020

Comments

  • nicholas.reichel
    nicholas.reichel about 4 years

    I have the following DataFrame:

                                 Company Name        Time Expectation
    0                Asta Funding Inc. (ASFI)  9:35 AM ET           -
    1                       BlackBerry (BBRY)  7:00 AM ET     ($0.03)
    2                    Carnival Corp. (CCL)  9:15 AM ET       $0.09
    3                      Carnival PLC (CUK)  0:00 AM ET           -
    

    I would like to have the company symbols in their own separate column instead of inside the Company Name column. Right now I iterate over the company names, use a regex to pull out each symbol, append it to a list, and then assign that list to the new column, but I'm wondering if there is a cleaner or easier way.

    I'm new to the whole map reduce lambda stuff.

    import re

    tickers = []
    for company in df['Company Name']:
        ticker = re.search(r"\(.*\)", company).group(0)  # e.g. "(ASFI)"
        ticker = ticker[1:-1]  # strip the surrounding parentheses
        tickers.append(ticker)
    df['ticker'] = tickers
    
  • nicholas.reichel
    nicholas.reichel about 9 years
    I still cannot get it to work. I'm using r"..." but it still says ValueError: This pattern contains no groups to capture.
  • halex
    halex about 9 years
    @mobone Change the regex to r"\((.*)\)"; note the added parentheses around .* that make the matched part a capture group.
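A short sketch reproducing the error from the comments and halex's fix, assuming a recent pandas version (the exact wording of the ValueError message differs between pandas releases, so it is printed rather than checked).

```python
import pandas as pd

names = pd.Series(["Asta Funding Inc. (ASFI)", "BlackBerry (BBRY)"])

# Without a capture group: \( and \) only match literal parentheses,
# so str.extract has nothing to return and raises ValueError.
raised = False
try:
    names.str.extract(r"\(.*\)")
except ValueError as err:
    raised = True
    print("ValueError:", err)

# The unescaped (...) inside the escaped parentheses defines group 1, which extract returns.
tickers = names.str.extract(r"\((.*)\)", expand=False)
print(tickers.tolist())  # ['ASFI', 'BBRY']
```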