TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'


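The TypeError occurs because salaries is a tuple of strings while y_train_actual is a list of floats. When mean_squared_error converts both to NumPy arrays, salaries becomes an array of strings, and a string array cannot be subtracted from a float array. Convert salaries to floats before passing it in. Wrapping it in list() does not help, because the elements are still strings.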
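
A minimal sketch of that conversion, using the first few values printed in the question as stand-in data. It computes the mean squared error with plain NumPy, using the same expression that appears in the traceback (np.mean((y_pred - y_true) ** 2)); the name salaries_float is hypothetical.

```python
import numpy as np

# Stand-in data: the first few values printed in the question.
salaries = ('25000', '30000', '30000', '27500')
y_train = [10.126631103850338, 10.308952660644293,
           10.308952660644293, 10.221941283654663]

# Undo the log transform on the targets, as the question does.
y_train_actual = np.exp(np.asarray(y_train, dtype=float))

# Convert the tuple of strings to a float array (hypothetical name).
salaries_float = np.asarray(salaries, dtype=float)

# Same expression as in the traceback: np.mean((y_pred - y_true) ** 2)
mse = np.mean((y_train_actual - salaries_float) ** 2)
print(mse)

# With scikit-learn installed, passing the same two float arrays to
# sklearn.metrics.mean_squared_error should give the same value.
```

For these few rows the result should be close to zero, since the y_train values shown appear to be the natural logs of the salaries.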

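The second error (Found array with dim 40663. Expected 244768) means the two arguments have different lengths: y_train_actual has 244768 elements, while y_valid_actual has 40663. mean_squared_error compares the arrays element by element, so both must contain the same number of samples.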
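
One way to catch this early is to check the shapes before subtracting. The helper below, checked_mse, is a hypothetical name and not part of scikit-learn; it is only a sketch that converts both inputs to float arrays and fails with a clear message when their lengths differ.

```python
import numpy as np

def checked_mse(y_true, y_pred):
    """Mean squared error with explicit dtype conversion and a shape check.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("shape mismatch: %s vs %s"
                         % (y_true.shape, y_pred.shape))
    return np.mean((y_pred - y_true) ** 2)

# Same length: works, and strings are converted to floats on the way in.
print(checked_mse(('1', '2'), [1.0, 4.0]))

# Different lengths: raises ValueError instead of failing deeper down.
try:
    checked_mse([1.0, 2.0, 3.0], [1.0, 2.0])
except ValueError as exc:
    print(exc)
```

In the question's setting, this means comparing training predictions with the training targets and validation predictions with the validation targets, rather than the training values with the validation values.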

Author: Nyxynyx

Updated on May 02, 2020

Comments

  • Nyxynyx
    Nyxynyx about 4 years

    I am trying to calculate the mean squared error between the predictions y_train_actual from my scikit-learn model and the original values salaries.

    Problem: With mean_squared_error(y_train_actual, salaries), I get the error TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'. Using list(salaries) instead of salaries as the second parameter gives the same error.

    With mean_squared_error(y_train_actual, y_valid_actual), I get the error Found array with dim 40663. Expected 244768

    How can I convert to the correct array types for sklearn.metrics.mean_squared_error()?

    Code

    import numpy as np
    from sklearn.metrics import mean_squared_error
    
    y_train_actual = [ np.exp(float(row)) for row in y_train ]
    print mean_squared_error(y_train_actual, salaries)
    

    Error

    TypeError                                 Traceback (most recent call last)
    <ipython-input-144-b6d4557ba9c5> in <module>()
          3 y_valid_actual = [ np.exp(float(row)) for row in y_valid ]
          4 
    ----> 5 print mean_squared_error(y_train_actual, salaries)
          6 print mean_squared_error(y_train_actual, y_valid_actual)
    
    C:\Python27\lib\site-packages\sklearn\metrics\metrics.pyc in mean_squared_error(y_true, y_pred)
       1462     """
       1463     y_true, y_pred = check_arrays(y_true, y_pred)
    -> 1464     return np.mean((y_pred - y_true) ** 2)
       1465 
       1466 
    
    TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
    

    Code

    y_train_actual = [ np.exp(float(row)) for row in y_train ]
    y_valid_actual = [ np.exp(float(row)) for row in y_valid ]
    
    print mean_squared_error(y_train_actual, y_valid_actual)
    

    Error

    ValueError                                Traceback (most recent call last)
    <ipython-input-146-7fcd0367c6f1> in <module>()
          4 
          5 #print mean_squared_error(y_train_actual, salaries)
    ----> 6 print mean_squared_error(y_train_actual, y_valid_actual)
    
    C:\Python27\lib\site-packages\sklearn\metrics\metrics.pyc in mean_squared_error(y_true, y_pred)
       1461 
       1462     """
    -> 1463     y_true, y_pred = check_arrays(y_true, y_pred)
       1464     return np.mean((y_pred - y_true) ** 2)
       1465 
    
    C:\Python27\lib\site-packages\sklearn\utils\validation.pyc in check_arrays(*arrays, **options)
        191         if size != n_samples:
        192             raise ValueError("Found array with dim %d. Expected %d"
    --> 193                              % (size, n_samples))
        194 
        195         if not allow_lists or hasattr(array, "shape"):
    
    ValueError: Found array with dim 40663. Expected 244768
    

    Code

    print type(y_train)
    print type(y_train_actual)
    print type(salaries)
    

    Result

    <type 'list'>
    <type 'list'>
    <type 'tuple'>
    

    print y_train[:10]

    [10.126631103850338, 10.308952660644293, 10.308952660644293, 10.221941283654663, 10.126631103850338, 10.126631103850338, 11.225243392518447, 9.9987977323404529, 10.043249494911286, 11.350406535472453]

    print salaries[:10]

    ('25000', '30000', '30000', '27500', '25000', '25000', '75000', '22000', '23000', '85000')

    print list(salaries)[:10]

    ['25000', '30000', '30000', '27500', '25000', '25000', '75000', '22000', '23000', '85000']

    print len(y_train)

    244768
    

    print len(salaries)

    244768