scikit-learn: How to calculate root-mean-square error (RMSE) in percentage?

35,121

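Your implementation of calculate_mape is not working because it relies on the check_arrays function, which was removed in sklearn 0.16. check_array is not a drop-in replacement: it validates a single array and returns a single array, so unpacking its result into two variables fails.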

This StackOverflow answer gives a working implementation.
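
For reference, here is a minimal sketch of a MAPE function that does not depend on check_arrays at all. It is not necessarily the code from the linked answer. The input conversion with np.asarray, the shape check, and the zero check are assumptions about what you would want, and the sample values are made up for illustration.

```python
import numpy as np


def calculate_mape(y_true, y_pred):
    """Return the mean absolute percentage error, in percent."""
    # Plain NumPy conversion stands in for the removed check_arrays call.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # MAPE divides by the true values, so zeros in y_true are not allowed.
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when y_true contains zeros")
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


# Toy values for illustration only (not taken from the dataset in the question).
print(calculate_mape([10, 20, 40], [11, 18, 40]))
```

With the variables from the question, the call would be calculate_mape(y, modelPred). Newer sklearn releases (0.24 and later) also provide sklearn.metrics.mean_absolute_percentage_error, which returns a fraction rather than a percentage.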

Author by

Desta Haileselassie Hagos

Updated on July 19, 2020

Comments

  • Desta Haileselassie Hagos, almost 4 years ago

    I have a dataset (found in this link: https://drive.google.com/open?id=0B2Iv8dfU4fTUY2ltNGVkMG05V00) of the following format.

     time     X   Y
    0.000543  0  10
    0.000575  0  10
    0.041324  1  10
    0.041331  2  10
    0.041336  3  10
    0.04134   4  10
      ...
    9.987735  55 239
    9.987739  56 239
    9.987744  57 239
    9.987749  58 239
    9.987938  59 239
    

    The third column (Y) in my dataset is the true value, which is what I want to predict (estimate). I want to predict the current value of Y from the previous 100 rolling values of X. For this, I have the following Python script, which uses a random forest regression model.

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    """
    
    @author: deshag
    """
    
    import pandas as pd
    import numpy as np
    from io import StringIO
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    from math import sqrt
    
    
    
    df = pd.read_csv('estimated_pred.csv')
    
    for i in range(1,100):
        df['X_t'+str(i)] = df['X'].shift(i)
    
    print(df)
    
    df.dropna(inplace=True)
    
    
    X=pd.DataFrame({ 'X_%d'%i : df['X'].shift(i) for i in range(100)}).apply(np.nan_to_num, axis=0).values
    
    
    y = df['Y'].values
    
    
    reg = RandomForestRegressor(criterion='mse')
    reg.fit(X,y)
    modelPred = reg.predict(X)
    print(modelPred)
    
    print("Number of predictions:",len(modelPred))
    
    meanSquaredError=mean_squared_error(y, modelPred)
    print("MSE:", meanSquaredError)
    rootMeanSquaredError = sqrt(meanSquaredError)
    print("RMSE:", rootMeanSquaredError)
    

    At the end, I measured the root-mean-square error (RMSE) and got a value of 19.57. From what I have read in the documentation, root-mean-square errors have the same units as the response. Is there any way to present the RMSE as a percentage? For example, to say that this percentage of the prediction is correct and this much is wrong.

    I tried to calculate the mean absolute percentage error (MAPE) using the check_array function in the recent version of sklearn, but it doesn't seem to work the same way as in the previous version when I use it as follows.

    import numpy as np
    from sklearn.utils import check_array
    
    def calculate_mape(y_true, y_pred):
        y_true, y_pred = check_array(y_true, y_pred)

        return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
    
    calculate_mape(y, modelPred)
    

    This returns an error: ValueError: not enough values to unpack (expected 2, got 1). This seems to be because the check_array function in the recent version returns only a single value, unlike the previous version.

    Is there any way to present the RMSE in percentage or calculate MAPE using sklearn for Python?
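
On the first half of that question, one common convention is to normalize the RMSE by the mean or the range of the true values and report the result as a percentage. The sketch below assumes that convention. The function name rmse_percent and its normalizer parameter are hypothetical and not part of sklearn. A normalized RMSE expresses the typical error size relative to the scale of Y. It does not mean that this percentage of the predictions is correct or wrong.

```python
import numpy as np


def rmse_percent(y_true, y_pred, normalizer="mean"):
    """Return the RMSE as a percentage of the mean or range of y_true.

    Hypothetical helper for illustration; not an sklearn function.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    if normalizer == "mean":
        scale = np.mean(y_true)
    elif normalizer == "range":
        scale = np.max(y_true) - np.min(y_true)
    else:
        raise ValueError("normalizer must be 'mean' or 'range'")
    if scale == 0:
        raise ValueError("cannot normalize by a scale of zero")
    return rmse / scale * 100


# Toy values for illustration only (not taken from the dataset above).
y_demo = [10, 20, 30]
pred_demo = [12, 18, 33]
print(rmse_percent(y_demo, pred_demo, normalizer="mean"))
print(rmse_percent(y_demo, pred_demo, normalizer="range"))
```

With the variables from the script above, the call would be rmse_percent(y, modelPred). Which normalizer is appropriate depends on the data, so it is worth stating the choice alongside the reported figure.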