Expected 2D array, got 1D array instead, Reshape Data

15,878

Solution 1

OK, I finally got the code to work. Please see the solution below:

# Data Preprocessing

# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values
df_X = pd.DataFrame(X)
df_y = pd.DataFrame(y)

# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 3:5 ])
X[:, 3:5] = imputer.transform(X[:, 3:5])


# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])


# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])


# Transform Name into a Matrix
onehotencoder1 = OneHotEncoder(categorical_features = [0])
X = onehotencoder1.fit_transform(X).toarray()

# Transform University into a Matrix
onehotencoder2 = OneHotEncoder(categorical_features = [6])
X = onehotencoder2.fit_transform(X).toarray()
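Note for readers on newer scikit-learn releases: `Imputer` and the `categorical_features` argument were deprecated in 0.20 and later removed, so the code above only runs on older versions. Below is a minimal sketch of the same preprocessing using `SimpleImputer` and `ColumnTransformer`. The contents of Data2.csv are not shown in the question, so the DataFrame and its column names here are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for Data2.csv (the real columns are not shown in the question).
dataset = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cat", "Dan"],
    "University": ["MIT", "UCLA", "MIT", "NYU"],
    "GPA": [3.2, 3.9, 4.0, 3.5],
    "Age": [27.0, np.nan, 30.0, 25.0],
    "Salary": [2300.0, 2900.0, np.nan, 2500.0],
    "Target": [1, 0, 1, 0],
})
X = dataset.iloc[:, :-1]            # all columns except the last
y = dataset.iloc[:, -1].to_numpy()  # last column as the target

preprocess = ColumnTransformer(
    transformers=[
        # One-hot encode the string columns directly; no LabelEncoder step is needed.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["Name", "University"]),
        # Replace missing numeric values with the column mean.
        ("impute", SimpleImputer(strategy="mean"), ["Age", "Salary"]),
    ],
    remainder="passthrough",  # keep the untouched GPA column
    sparse_threshold=0,       # always return a dense array
)

X_encoded = preprocess.fit_transform(X)
print(X_encoded.shape)
```

With this toy data there are 4 name columns, 3 university columns, 2 imputed columns and 1 passthrough column.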

Solution 2

Try changing your code to this:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values
df_X = pd.DataFrame(X)
df_y = pd.DataFrame(y)

# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 3:5 ])
X[:, 3:5] = imputer.transform(X[:, 3:5])


# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])

# Transform into a Matrix

onehotencoder1 = OneHotEncoder(categorical_features = [0])
res_0 = onehotencoder1.fit_transform(X[:, 0].reshape(-1, 1))  # <=== Change
X[:, 0] = res_0.ravel()

# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])

If you get an error at labelencoder_x1.fit_transform(X[:, 1]), change it to labelencoder_x1.fit_transform(X[:, 1].reshape(-1, 1)).
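One caveat about the one-hot step: `OneHotEncoder.fit_transform` returns a SciPy sparse matrix by default, and its result has one column per category, so it cannot be written back into the single column `X[:, 0]`. A minimal sketch of one way around this is to convert the result with `.toarray()` and stack it next to the remaining columns. The toy array below is a hypothetical stand-in for the label-encoded data.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: column 0 holds integer labels, as LabelEncoder would produce.
X = np.array([[2, 3.2],
              [0, 3.9],
              [1, 4.0],
              [2, 3.5]])

encoder = OneHotEncoder()

# The encoder needs a 2D input, so the single column is reshaped to (n_samples, 1).
res_0 = encoder.fit_transform(X[:, 0].reshape(-1, 1))  # sparse matrix, one column per category
dense_0 = res_0.toarray()                              # dense NumPy array

# Replace column 0 with the one-hot columns instead of assigning into X[:, 0].
X_new = np.hstack((dense_0, X[:, 1:]))
print(X_new.shape)
```

Here the three distinct labels become three columns, followed by the one remaining original column.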

Author by

wolfbagel

Updated on June 05, 2022

Comments

  • wolfbagel
    wolfbagel almost 2 years

    I'm really stuck on this problem. I'm trying to use OneHotEncoder to encode my data into a matrix after using LabelEncoder, but I'm getting this error: Expected 2D array, got 1D array instead.

    At the end of the error message (included below) it says to "Reshape your data", which I thought I did, but it's still not working. If I understand reshaping correctly, is it just when you want to literally reshape some data into a different matrix size? For example, if I want to change a 3 x 2 matrix into a 4 x 6?

    My code is failing on these 2 lines:

    X = X.reshape(-1, 1) # I added this after I saw the error
    X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()
    

    Here is the code I have so far:

    # Data Preprocessing
    
    # Import Libraries
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    
    # Import Dataset
    dataset = pd.read_csv('Data2.csv')
    X = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, 5].values
    df_X = pd.DataFrame(X)
    df_y = pd.DataFrame(y)
    
    # Replace Missing Values
    from sklearn.preprocessing import Imputer
    imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
    imputer = imputer.fit(X[:, 3:5 ])
    X[:, 3:5] = imputer.transform(X[:, 3:5])
    
    
    # Encoding Categorical Data "Name"
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder
    labelencoder_x = LabelEncoder()
    X[:, 0] = labelencoder_x.fit_transform(X[:, 0])
    
    # Transform into a Matrix
    
    onehotencoder1 = OneHotEncoder(categorical_features = [0])
    X = X.reshape(-1, 1)
    X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()
    
    
    # Encoding Categorical Data "University"
    from sklearn.preprocessing import LabelEncoder
    labelencoder_x1 = LabelEncoder()
    X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])
    

    Here is the full error message:

     File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 1809, in _transform_selected
        X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)
    
      File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
        "if it contains a single sample.".format(array))
    
    ValueError: Expected 2D array, got 1D array instead:
    array=[  2.00000000e+00   7.00000000e+00   3.20000000e+00   2.70000000e+01
       2.30000000e+03   1.00000000e+00   6.00000000e+00   3.90000000e+00
       2.80000000e+01   2.90000000e+03   3.00000000e+00   4.00000000e+00
       4.00000000e+00   3.00000000e+01   2.76700000e+03   2.00000000e+00
       8.00000000e+00   3.20000000e+00   2.70000000e+01   2.30000000e+03
       3.00000000e+00   0.00000000e+00   4.00000000e+00   3.00000000e+01
       2.48522222e+03   5.00000000e+00   9.00000000e+00   3.50000000e+00
       2.50000000e+01   2.50000000e+03   5.00000000e+00   1.00000000e+00
       3.50000000e+00   2.50000000e+01   2.50000000e+03   0.00000000e+00
       2.00000000e+00   3.00000000e+00   2.90000000e+01   2.40000000e+03
       4.00000000e+00   3.00000000e+00   3.70000000e+00   2.77777778e+01
       2.30000000e+03   0.00000000e+00   5.00000000e+00   3.00000000e+00
       2.90000000e+01   2.40000000e+03].
    Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
    

    Any help would be great.

    • Jai
      Jai over 6 years
      Your array is 1D, but it has to be 2D. Wherever you are getting the error, pass numpy.asmatrix(data), where data is the data you are passing in, or reshape it. Passing 1D arrays has been deprecated in recent versions of sklearn.
    • wolfbagel
      wolfbagel over 6 years
      Hi @JayShah, in my code I added: X = X.reshape(-1, 1). Is this the correct way to reshape data?
    • Jai
      Jai over 6 years
      Yes, X = X.reshape(-1, 1) is the right way to reshape the data, but it only works if X is a NumPy array and not a list. If it is a list, make it a list of lists. From the error message I can see that array = [ ] is 1D because it has only one pair of opening and closing brackets. After reshaping, remove the X[:, 1] indexing in the transform call and just pass X.
  • wolfbagel
    wolfbagel over 6 years
    Thanks for the solution! I'm running it line by line but it's failing on this line: "X[:, 0] = res_0.ravel()", saying "ravel not found".
  • Jai
    Jai over 6 years
    try np.ravel(res_0)
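To make the reshape discussion above concrete, here is a small NumPy-only sketch (the arrays are made up for illustration). It shows what `reshape(-1, 1)` and `reshape(1, -1)` produce, that a reshape must keep the same number of elements (so a 3 x 2 matrix cannot become 4 x 6), and that indexing a single column with `X[:, 0]` gives a 1D array again, which is likely why the reshape in the question did not help.

```python
import numpy as np

a = np.arange(6)         # 1D array with 6 elements, shape (6,)

col = a.reshape(-1, 1)   # single feature: 6 samples, 1 column
row = a.reshape(1, -1)   # single sample: 1 row, 6 features

m = a.reshape(3, 2)      # 3 x 2 works because 3 * 2 == 6 elements

# Reshaping cannot change the number of elements: 3 x 2 (6) into 4 x 6 (24) fails.
try:
    m.reshape(4, 6)
    reshape_failed = False
except ValueError:
    reshape_failed = True

# Selecting one column with an integer index drops back to 1D,
# while selecting it with a list keeps the result 2D.
one_d = m[:, 0]
two_d = m[:, [0]]

print(col.shape, row.shape, one_d.shape, two_d.shape, reshape_failed)
```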