Python/Pandas: How to Match List of Strings with a DataFrame column

13,065

Solution 1

Here is a readable solution using an individual search_func:

def search_func(row):
    matches = [test_value in row["Description"].lower() 
               for test_value in row["Text_Search"]]

    if any(matches):
        return "Yes"
    else:
        return "No"

This function is then applied row-wise:

# create example data
df = pd.DataFrame({"Description": ["CANSEL SURVEY E PAY", "JX154 TFR?FR xxx8690"],
                   "Employer": ["Cansel Survey Equipment", "Cansel Survey Equipment"]})

print(df)
    Description             Employer
0   CANSEL SURVEY E PAY     Cansel Survey Equipment
1   JX154 TFR?FR xxx8690    Cansel Survey Equipment

# create text searches and match column
df["Text_Search"] = df["Employer"].str.lower().str.split()
df["Match"] = df.apply(search_func, axis=1)

# show result
print(df)
    Description             Employer                    Text_Search                     Match
0   CANSEL SURVEY E PAY     Cansel Survey Equipment     [cansel, survey, equipment]     Yes
1   JX154 TFR?FR xxx8690    Cansel Survey Equipment     [cansel, survey, equipment]     No

Solution 2

Here is fast and memory-saving vectrorized solution, which uses sklearn.feature_extraction.text.CountVectorizer method:

from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer(min_df=1, lowercase=True)

X = vect.fit_transform(df['Employer'])
cols_emp = vect.get_feature_names()

X = vect.fit_transform(df['Description'])
cols_desc = vect.get_feature_names()

common_cols_idx = [i for i,col in enumerate(cols_desc) if col in cols_emp]

df['Match'] = (X.toarray()[:, common_cols_idx] == 1).any(1)

Source DF:

In [259]: df
Out[259]:
          Date           Description   Amount  AutoNumber                 Employer
0    3/17/2015  WW120 TFR?FR xxx8690   140.00       49246  Cansel Survey Equipment
2    3/13/2015  JX154 TFR?FR xxx8690   150.00       49246  Cansel Survey Equipment
5     3/6/2015   CANSEL SURVEY E PAY  1182.08       49246  Cansel Survey Equipment
9     3/2/2015  UE200 TFR?FR xxx8690   180.00       49246  Cansel Survey Equipment
10   2/27/2015  JH401 TFR?FR xxx8690   400.00       49246  Cansel Survey Equipment
11   2/27/2015   CANSEL SURVEY E PAY   555.62       49246  Cansel Survey Equipment
12   2/25/2015  HU204 TFR?FR xxx8690   200.00       49246  Cansel Survey Equipment
13   2/23/2015  UQ263 TFR?FR xxx8690   102.00       49246  Cansel Survey Equipment
14   2/23/2015  UT460 TFR?FR xxx8690   200.00       49246  Cansel Survey Equipment
15   2/20/2015   CANSEL SURVEY E PAY  1222.05       49246  Cansel Survey Equipment
17   2/17/2015  UO414 TFR?FR xxx8690   250.00       49246  Cansel Survey Equipment
19   2/11/2015  HI540 TFR?FR xxx8690   130.00       49246  Cansel Survey Equipment
20   2/11/2015  HQ010 TFR?FR xxx8690   177.00       49246  Cansel Survey Equipment
21   2/10/2015  WU455 TFR?FR xxx8690   200.00       49246  Cansel Survey Equipment
22    2/6/2015  JJ500 TFR?FR xxx8690   301.00       49246  Cansel Survey Equipment
23    2/6/2015   CANSEL SURVEY E PAY  1182.08       49246  Cansel Survey Equipment
24    2/5/2015  IR453 TFR?FR xxx8690   168.56       49246             Cansel IR453
26    2/2/2015  RQ574 TFR?FR xxx8690   500.00       49246  Cansel Survey Equipment
27    2/2/2015  UT022 TFR?FR xxx8690   850.00       49246  Cansel Survey Equipment
28  12/31/2014  HU521 TFR?FR xxx8690   950.17       49246      Cansel Survey HU521

Result:

In [261]: df
Out[261]:
          Date           Description   Amount  AutoNumber                 Employer  Match
0    3/17/2015  WW120 TFR?FR xxx8690   140.00       49246  Cansel Survey Equipment  False
2    3/13/2015  JX154 TFR?FR xxx8690   150.00       49246  Cansel Survey Equipment  False
5     3/6/2015   CANSEL SURVEY E PAY  1182.08       49246  Cansel Survey Equipment   True
9     3/2/2015  UE200 TFR?FR xxx8690   180.00       49246  Cansel Survey Equipment  False
10   2/27/2015  JH401 TFR?FR xxx8690   400.00       49246  Cansel Survey Equipment  False
11   2/27/2015   CANSEL SURVEY E PAY   555.62       49246  Cansel Survey Equipment   True
12   2/25/2015  HU204 TFR?FR xxx8690   200.00       49246  Cansel Survey Equipment  False
13   2/23/2015  UQ263 TFR?FR xxx8690   102.00       49246  Cansel Survey Equipment  False
14   2/23/2015  UT460 TFR?FR xxx8690   200.00       49246  Cansel Survey Equipment  False
15   2/20/2015   CANSEL SURVEY E PAY  1222.05       49246  Cansel Survey Equipment   True
17   2/17/2015  UO414 TFR?FR xxx8690   250.00       49246  Cansel Survey Equipment  False
19   2/11/2015  HI540 TFR?FR xxx8690   130.00       49246  Cansel Survey Equipment  False
20   2/11/2015  HQ010 TFR?FR xxx8690   177.00       49246  Cansel Survey Equipment  False
21   2/10/2015  WU455 TFR?FR xxx8690   200.00       49246  Cansel Survey Equipment  False
22    2/6/2015  JJ500 TFR?FR xxx8690   301.00       49246  Cansel Survey Equipment  False
23    2/6/2015   CANSEL SURVEY E PAY  1182.08       49246  Cansel Survey Equipment   True
24    2/5/2015  IR453 TFR?FR xxx8690   168.56       49246             Cansel IR453   True
26    2/2/2015  RQ574 TFR?FR xxx8690   500.00       49246  Cansel Survey Equipment  False
27    2/2/2015  UT022 TFR?FR xxx8690   850.00       49246  Cansel Survey Equipment  False
28  12/31/2014  HU521 TFR?FR xxx8690   950.17       49246      Cansel Survey HU521   True

Some explanations:

In [266]: cols_desc
Out[266]:
['cansel',
 'fr',
 'hi540',
 'hq010',
 'hu204',
 'hu521',
 'ir453',
 'jh401',
 'jj500',
 'jx154',
 'pay',
 'rq574',
 'survey',
 'tfr',
 'ue200',
 'uo414',
 'uq263',
 'ut022',
 'ut460',
 'wu455',
 'ww120',
 'xxx8690']

In [267]: cols_emp
Out[267]: ['cansel', 'equipment', 'hu521', 'ir453', 'survey']

In [268]: common_cols_idx = [i for i,col in enumerate(cols_desc) if col in cols_emp]

In [269]: common_cols_idx
Out[269]: [0, 5, 6, 12]

In [270]: X.toarray()
Out[270]:
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1],
       [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
       [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1],
       [0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]], dtype=int64)

In [271]: X.toarray()[:, common_cols_idx]
Out[271]:
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [1, 0, 0, 1],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [1, 0, 0, 1],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [1, 0, 0, 1],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [1, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 1, 0, 0]], dtype=int64)

In [272]: X.toarray()[:, common_cols_idx] == 1
Out[272]:
array([[False, False, False, False],
       [False, False, False, False],
       [ True, False, False,  True],
       [False, False, False, False],
       [False, False, False, False],
       [ True, False, False,  True],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [ True, False, False,  True],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [ True, False, False,  True],
       [False, False,  True, False],
       [False, False, False, False],
       [False, False, False, False],
       [False,  True, False, False]], dtype=bool)

In [273]: (X.toarray()[:, common_cols_idx] == 1).any(1)
Out[273]: array([False, False,  True, False, False,  True, False, False, False,  True, False, False, False, False, False,  True,  True, Fals
e, False,  True], dtype=bool)

Solution 3

Timing comparison for different solutions

Let's prepare a bit bigger DF - 2.000 rows:

In [3]: df = pd.concat([df] * 10**2, ignore_index=True)

In [4]: df.shape
Out[4]: (2000, 5)

Solution 1: df.apply(..., axis=1):

df["Text_Search"] = df.Employer.str.lower().str.split().map(set)

In [15]: %%timeit
    ...: df.Description.str.lower().str.split().map(set).to_frame('desc') \
    ...:               .apply(lambda r: (df["Text_Search"] & r.desc).any(),
    ...:                      axis=1)
    ...:
1 loop, best of 3: 5.06 s per loop

Solution 2: CountVectorizer

from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer(min_df=1, lowercase=True)

In [8]: %%timeit
   ...: X = vect.fit_transform(df['Employer'])
   ...: cols_emp = vect.get_feature_names()
   ...: X = vect.fit_transform(df['Description'])
   ...: cols_desc = vect.get_feature_names()
   ...: common_cols_idx = [i for i,col in enumerate(cols_desc) if col in cols_emp]
   ...: (X.toarray()[:, common_cols_idx] == 1).any(1)
   ...:
10 loops, best of 3: 88.2 ms per loop

Solution 3: df.apply(search_func, axis=1)

df["Text_Search"] = df["Employer"].str.lower().str.split()

In [12]: %%timeit
    ...: df.apply(search_func, axis=1)
    ...:
1 loop, best of 3: 362 ms per loop

NOTE: Solution 1 is too slow, so i will not "timeit" this solution for bigger DFs


Comparison of df.apply(search_func, axis=1) and CountVectorizer for 20.000 rows DF:

In [16]: df = pd.concat([df] * 10, ignore_index=True)

In [17]: df.shape
Out[17]: (20000, 6)

In [20]: %%timeit
    ...: df.apply(search_func, axis=1)
    ...:
1 loop, best of 3: 3.66 s per loop

In [21]: %%timeit
    ...: X = vect.fit_transform(df['Employer'])
    ...: cols_emp = vect.get_feature_names()
    ...: X = vect.fit_transform(df['Description'])
    ...: cols_desc = vect.get_feature_names()
    ...: common_cols_idx = [i for i,col in enumerate(cols_desc) if col in cols_emp]
    ...: (X.toarray()[:, common_cols_idx] == 1).any(1)
    ...:
1 loop, best of 3: 825 ms per loop

Comparison of df.apply(search_func, axis=1) and CountVectorizer for 200.000 rows DF:

In [22]: df = pd.concat([df] * 10, ignore_index=True)

In [23]: df.shape
Out[23]: (200000, 6)

In [24]: %%timeit
    ...: df.apply(search_func, axis=1)
    ...:
1 loop, best of 3: 36.8 s per loop

In [25]: %%timeit
    ...: X = vect.fit_transform(df['Employer'])
    ...: cols_emp = vect.get_feature_names()
    ...: X = vect.fit_transform(df['Description'])
    ...: cols_desc = vect.get_feature_names()
    ...: common_cols_idx = [i for i,col in enumerate(cols_desc) if col in cols_emp]
    ...: (X.toarray()[:, common_cols_idx] == 1).any(1)
    ...:
1 loop, best of 3: 8.28 s per loop

Conclusion: CountVectorized solution is apporx. 4.44 times faster compared to df.apply(search_func, axis=1)

Solution 4

Here is one solution, which splits text into lower-case sets and uses sets intersection for each row:

In [160]: x['Match'] = x.Description.str.lower().str.split().map(set).to_frame('desc') \
     ...:               .apply(lambda r: (x.Employer.str.lower().str.split().map(set) & r.desc).any(),
     ...:                      axis=1)
     ...:

In [161]: x
Out[161]:
          Date           Description   Amount  AutoNumber                 Employer  Match
0    3/17/2015  WW120 TFR?FR xxx8690   140.00       49246  Cansel Survey Equipment  False
2    3/13/2015  JX154 TFR?FR xxx8690   150.00       49246  Cansel Survey Equipment  False
5     3/6/2015   CANSEL SURVEY E PAY  1182.08       49246  Cansel Survey Equipment   True
9     3/2/2015  UE200 TFR?FR xxx8690   180.00       49246  Cansel Survey Equipment  False
10   2/27/2015  JH401 TFR?FR xxx8690   400.00       49246  Cansel Survey Equipment  False
11   2/27/2015   CANSEL SURVEY E PAY   555.62       49246  Cansel Survey Equipment   True
12   2/25/2015  HU204 TFR?FR xxx8690   200.00       49246  Cansel Survey Equipment  False
13   2/23/2015  UQ263 TFR?FR xxx8690   102.00       49246  Cansel Survey Equipment  False
14   2/23/2015  UT460 TFR?FR xxx8690   200.00       49246  Cansel Survey Equipment  False
15   2/20/2015   CANSEL SURVEY E PAY  1222.05       49246  Cansel Survey Equipment   True
17   2/17/2015  UO414 TFR?FR xxx8690   250.00       49246  Cansel Survey Equipment  False
19   2/11/2015  HI540 TFR?FR xxx8690   130.00       49246  Cansel Survey Equipment  False
20   2/11/2015  HQ010 TFR?FR xxx8690   177.00       49246  Cansel Survey Equipment  False
21   2/10/2015  WU455 TFR?FR xxx8690   200.00       49246  Cansel Survey Equipment  False
22    2/6/2015  JJ500 TFR?FR xxx8690   301.00       49246  Cansel Survey Equipment  False
23    2/6/2015   CANSEL SURVEY E PAY  1182.08       49246  Cansel Survey Equipment   True
24    2/5/2015  IR453 TFR?FR xxx8690   168.56       49246  Cansel Survey Equipment  False
26    2/2/2015  RQ574 TFR?FR xxx8690   500.00       49246  Cansel Survey Equipment  False
27    2/2/2015  UT022 TFR?FR xxx8690   850.00       49246  Cansel Survey Equipment  False
28  12/31/2014  HU521 TFR?FR xxx8690   950.17       49246  Cansel Survey Equipment  False

PS it's pretty slow as it's using not vectorized .apply(..., axis=1) method

Share:
13,065

Related videos on Youtube

Riley Hun
Author by

Riley Hun

ML Engineer | Data Engineer | Software Engineer | Data Scientist

Updated on October 14, 2022

Comments

  • Riley Hun
    Riley Hun over 1 year

    I want to compare two columns: Description and Employer. I want to see if any keywords from Employer are found in the Description column. I have broken the Employer column down to words and converted to a list. Now I want to see if any of those words are in the corresponding Description column.

    Sample input:

    print(df.head(25))
    
    
              Date           Description   Amount  AutoNumber  \
    0    3/17/2015  WW120 TFR?FR xxx8690   140.00       49246   
    2    3/13/2015  JX154 TFR?FR xxx8690   150.00       49246   
    5     3/6/2015   CANSEL SURVEY E PAY  1182.08       49246   
    9     3/2/2015  UE200 TFR?FR xxx8690   180.00       49246   
    10   2/27/2015  JH401 TFR?FR xxx8690   400.00       49246   
    11   2/27/2015   CANSEL SURVEY E PAY   555.62       49246   
    12   2/25/2015  HU204 TFR?FR xxx8690   200.00       49246   
    13   2/23/2015  UQ263 TFR?FR xxx8690   102.00       49246   
    14   2/23/2015  UT460 TFR?FR xxx8690   200.00       49246   
    15   2/20/2015   CANSEL SURVEY E PAY  1222.05       49246   
    17   2/17/2015  UO414 TFR?FR xxx8690   250.00       49246   
    19   2/11/2015  HI540 TFR?FR xxx8690   130.00       49246   
    20   2/11/2015  HQ010 TFR?FR xxx8690   177.00       49246   
    21   2/10/2015  WU455 TFR?FR xxx8690   200.00       49246   
    22    2/6/2015  JJ500 TFR?FR xxx8690   301.00       49246   
    23    2/6/2015   CANSEL SURVEY E PAY  1182.08       49246   
    24    2/5/2015  IR453 TFR?FR xxx8690   168.56       49246   
    26    2/2/2015  RQ574 TFR?FR xxx8690   500.00       49246   
    27    2/2/2015  UT022 TFR?FR xxx8690   850.00       49246   
    28  12/31/2014  HU521 TFR?FR xxx8690   950.17       49246   
    
                       Employer  
    0   Cansel Survey Equipment  
    2   Cansel Survey Equipment  
    5   Cansel Survey Equipment  
    9   Cansel Survey Equipment  
    10  Cansel Survey Equipment  
    11  Cansel Survey Equipment  
    12  Cansel Survey Equipment  
    13  Cansel Survey Equipment  
    14  Cansel Survey Equipment  
    15  Cansel Survey Equipment  
    17  Cansel Survey Equipment  
    19  Cansel Survey Equipment  
    20  Cansel Survey Equipment  
    21  Cansel Survey Equipment  
    22  Cansel Survey Equipment  
    23  Cansel Survey Equipment  
    24  Cansel Survey Equipment  
    26  Cansel Survey Equipment  
    27  Cansel Survey Equipment  
    28  Cansel Survey Equipment  
    

    I tried something like this, but it doesn't seem to work.:

    df['Text_Search'] = df['Employer'].apply(lambda x: x.split(" "))
    df['Match'] = np.where(df['Description'].str.contains("|".join(df['Text_Search'])), "Yes", "No")
    

    My desired output would be as shown below:

              Date           Description   Amount  AutoNumber  \
    0    3/17/2015  WW120 TFR?FR xxx8690   140.00       49246   
    2    3/13/2015  JX154 TFR?FR xxx8690   150.00       49246   
    5     3/6/2015   CANSEL SURVEY E PAY  1182.08       49246   
    9     3/2/2015  UE200 TFR?FR xxx8690   180.00       49246   
    10   2/27/2015  JH401 TFR?FR xxx8690   400.00       49246   
    11   2/27/2015   CANSEL SURVEY E PAY   555.62       49246   
    12   2/25/2015  HU204 TFR?FR xxx8690   200.00       49246   
    13   2/23/2015  UQ263 TFR?FR xxx8690   102.00       49246   
    14   2/23/2015  UT460 TFR?FR xxx8690   200.00       49246   
    15   2/20/2015   CANSEL SURVEY E PAY  1222.05       49246   
    17   2/17/2015  UO414 TFR?FR xxx8690   250.00       49246   
    19   2/11/2015  HI540 TFR?FR xxx8690   130.00       49246   
    20   2/11/2015  HQ010 TFR?FR xxx8690   177.00       49246   
    21   2/10/2015  WU455 TFR?FR xxx8690   200.00       49246   
    22    2/6/2015  JJ500 TFR?FR xxx8690   301.00       49246   
    23    2/6/2015   CANSEL SURVEY E PAY  1182.08       49246   
    24    2/5/2015  IR453 TFR?FR xxx8690   168.56       49246   
    26    2/2/2015  RQ574 TFR?FR xxx8690   500.00       49246   
    27    2/2/2015  UT022 TFR?FR xxx8690   850.00       49246   
    28  12/31/2014  HU521 TFR?FR xxx8690   950.17       49246   
    29  12/30/2014  WZ553 TFR?FR xxx8690   200.00       49246   
    32  12/29/2014  JW173 TFR?FR xxx8690   300.00       49246   
    33  12/24/2014   CANSEL SURVEY E PAY  1219.21       49246   
    34  12/24/2014   CANSEL SURVEY E PAY   434.84       49246   
    36  12/23/2014  WT002 TFR?FR xxx8690   160.00       49246   
    
                       Employer                  Text_Search Match  
    0   Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    2   Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    5   Cansel Survey Equipment  [Cansel, Survey, Equipment]    Yes 
    9   Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    10  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    11  Cansel Survey Equipment  [Cansel, Survey, Equipment]    Yes  
    12  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    13  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    14  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    15  Cansel Survey Equipment  [Cansel, Survey, Equipment]    Yes  
    17  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    19  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    20  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    21  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    22  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    23  Cansel Survey Equipment  [Cansel, Survey, Equipment]    Yes  
    24  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    26  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    27  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    28  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    29  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    32  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No  
    33  Cansel Survey Equipment  [Cansel, Survey, Equipment]    Yes  
    34  Cansel Survey Equipment  [Cansel, Survey, Equipment]    Yes  
    36  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No 
    
  • Riley Hun
    Riley Hun about 7 years
    I will need to look into this solution some more. It looks like a very creative solution. Thanks so much.
  • pansen
    pansen about 7 years
    @maxu i like the usage of the countvectorizer! Just for curiosity - did you time it against the apply Version?
  • MaxU - stop genocide of UA
    MaxU - stop genocide of UA about 7 years
    @pansen, no i didn't do that yet - i'll try to "timeit" when i get back home. But i'm almost sure it should be faster for bigger DFs... ;) PS If you want I can can ping you after posting the results of the timing
  • pansen
    pansen about 7 years
    @maxu Sure thanks and no hurry. I could also timeit myself otherwise but I'm travelling too atm :)
  • pansen
    pansen about 7 years
    Nice, thanks for the comparisons! Results are as you predicted. +1
  • flinz
    flinz almost 6 years
    Great job, thanks! I would propose adding this part to your answer above. It's an extension, not a different answer.