python - how to append numpy array to a pandas dataframe
Solution 1
Assign the predictions to a variable, then assign each column of that array to its own column of the pandas DataFrame. If x is the 2D numpy array of predictions,
x = sentiment_model.predict_proba(test_matrix)
then you can do,
test_data['prediction0'] = x[:,0]
test_data['prediction1'] = x[:,1]
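If there are many classes, assigning one column at a time gets repetitive. A minimal sketch of an alternative, assuming the probabilities are wrapped in their own DataFrame and joined column-wise: the small test_data frame and the array x below are made-up stand-ins for the real data and for the output of predict_proba, and the column names are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: a tiny test_data frame with a non-default index,
# and an array shaped like predict_proba output (n_samples, n_classes).
test_data = pd.DataFrame(
    {'review_clean': ['toy was great', 'broke in a day', 'it is ok']},
    index=[10, 11, 12])
x = np.array([[0.014, 0.986],
              [0.900, 0.100],
              [0.450, 0.550]])

# Wrap the array in a DataFrame that reuses test_data's index, so rows
# line up by label rather than by position.
proba = pd.DataFrame(
    x,
    columns=['prediction0', 'prediction1'],
    index=test_data.index)

# Column-wise concatenation adds every probability column in one step.
test_data = pd.concat([test_data, proba], axis=1)
print(test_data)
```

Reusing test_data.index matters when the frame has been filtered or shuffled: without it, the new frame gets a default 0..n-1 index and the concatenation aligns on labels that may not match.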
Solution 2
import numpy as np
import pandas as pd
df = pd.DataFrame(
np.arange(10).reshape(5, 2), columns=['a', 'b'])
print('df:', df, sep='\n')
arr = np.arange(100, 104).reshape(2, 2)
print('array to append:', arr, sep='\n')
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
df = pd.concat([df, pd.DataFrame(arr, columns=df.columns)], ignore_index=True)
print('df:', df, sep='\n')
output
df:
   a  b
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
array to append:
[[100 101]
 [102 103]]
df:
     a    b
0    0    1
1    2    3
2    4    5
3    6    7
4    8    9
5  100  101
6  102  103
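When rows arrive as several arrays, concatenating inside a loop copies the whole frame each time. A rough sketch of one way around that, assuming all arrays share the frame's column layout: the batches list is a hypothetical example, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=['a', 'b'])

# Hypothetical batches of new rows, each a 2D array with two columns.
batches = [
    np.arange(100, 104).reshape(2, 2),
    np.arange(200, 202).reshape(1, 2),
]

# Stack the arrays in numpy first, wrap the result once,
# and call pd.concat a single time.
new_rows = pd.DataFrame(np.vstack(batches), columns=df.columns)
df = pd.concat([df, new_rows], ignore_index=True)
print(df.tail(3))
```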
Author by
DBE7
Updated on July 05, 2022
Comments
DBE7, almost 2 years ago:
I have trained a Logistic Regression classifier to predict whether a review is positive or negative. Now I want to append the predicted probabilities returned by the predict_proba function to my Pandas data frame containing the reviews. I tried doing something like:
test_data['prediction'] = sentiment_model.predict_proba(test_matrix)
Obviously, that doesn't work, since predict_proba returns a 2D numpy array. So, what is the most efficient way of doing this? I created test_matrix with scikit-learn's CountVectorizer:
vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
train_matrix = vectorizer.fit_transform(train_data['review_clean'].values.astype('U'))
test_matrix = vectorizer.transform(test_data['review_clean'].values.astype('U'))
Sample data looks like:
| Review | Prediction |
| ------------------------------------------- | ---------- |
| "Toy was great! Our six-year old loved it!" | 0.986 |