scikit-learn: How to calculate root-mean-square error (RMSE) in percentage?
Your implementation of calculate_mape is not working because it expects the check_arrays function, which was removed in sklearn 0.16. check_array is not what you want: it validates a single array and returns a single array, so its result cannot be unpacked into y_true and y_pred.
This StackOverflow answer gives a working implementation.
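As a minimal sketch of what a working calculate_mape might look like (not necessarily the code from the linked answer), the version below converts both inputs with np.asarray instead of relying on check_arrays. The zero check and the sample values are assumptions added for illustration.

```python
import numpy as np


def calculate_mape(y_true, y_pred):
    """Return the mean absolute percentage error, in percent."""
    # Convert each input separately; no sklearn validation helper is needed.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # MAPE divides by the true values, so zeros make it undefined.
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when y_true contains zeros")
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


# Illustrative values only, not taken from the dataset in the question.
y_true = [10.0, 20.0, 40.0]
y_pred = [11.0, 18.0, 40.0]
mape = calculate_mape(y_true, y_pred)
print("MAPE (%):", mape)
```

With the arrays from the question, the call would be calculate_mape(y, modelPred). scikit-learn 0.24 and later also provide sklearn.metrics.mean_absolute_percentage_error, which returns a fraction rather than a percentage.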
Desta Haileselassie Hagos
Updated on July 19, 2020

Comments
Desta Haileselassie Hagos almost 4 years
I have a dataset (found in this link: https://drive.google.com/open?id=0B2Iv8dfU4fTUY2ltNGVkMG05V00) of the following format.
time      X   Y
0.000543  0   10
0.000575  0   10
0.041324  1   10
0.041331  2   10
0.041336  3   10
0.04134   4   10
...
9.987735  55  239
9.987739  56  239
9.987744  57  239
9.987749  58  239
9.987938  59  239
The third column (Y) in my dataset is my true value; that's what I want to predict (estimate). I want to predict Y, i.e. predict the current value of Y from the previous 100 rolling values of X. For this, I have the following Python script using a random forest regression model.

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    """
    @author: deshag
    """
    import pandas as pd
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    from math import sqrt

    df = pd.read_csv('estimated_pred.csv')

    # Add lagged copies of X as extra columns, then drop rows with missing lags.
    for i in range(1, 100):
        df['X_t' + str(i)] = df['X'].shift(i)
    print(df)
    df.dropna(inplace=True)

    # Feature matrix: X shifted by 0..99 steps, with NaNs replaced by zeros.
    X = pd.DataFrame({'X_%d' % i: df['X'].shift(i) for i in range(100)}).apply(np.nan_to_num, axis=0).values
    y = df['Y'].values

    # Note: 'mse' is named 'squared_error' in scikit-learn 1.0 and later.
    reg = RandomForestRegressor(criterion='mse')
    reg.fit(X, y)
    modelPred = reg.predict(X)
    print(modelPred)
    print("Number of predictions:", len(modelPred))

    meanSquaredError = mean_squared_error(y, modelPred)
    print("MSE:", meanSquaredError)
    rootMeanSquaredError = sqrt(meanSquaredError)
    print("RMSE:", rootMeanSquaredError)
At the end, I measured the root-mean-square error (RMSE) and got an RMSE of 19.57. From what I have read in the documentation, the RMSE has the same units as the response. Is there any way to present the value of an RMSE as a percentage? For example, to say that this percent of the prediction is correct and this much is wrong.

There is a check_array function for calculating mean absolute percentage error (MAPE) in the recent version of sklearn, but it doesn't seem to work the same way as in the previous version when I try it as follows.

    import numpy as np
    from sklearn.utils import check_array

    def calculate_mape(y_true, y_pred):
        y_true, y_pred = check_array(y_true, y_pred)
        return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

    calculate_mape(y, modelPred)
This returns an error: ValueError: not enough values to unpack (expected 2, got 1). This seems to be because the check_array function in the recent version returns only a single value, unlike the previous version.

Is there any way to present the RMSE as a percentage or calculate MAPE using sklearn for Python?
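On the RMSE part of the question, one common convention (an assumption here, not something stated in the thread) is to normalize the RMSE by the mean or the range of the observed values and report the result as a percentage. The sketch below uses a hypothetical helper name, rmse_percent, and illustrative sample values.

```python
import numpy as np


def rmse_percent(y_true, y_pred, normalizer="mean"):
    """Return the RMSE as a percentage of the mean or range of y_true.

    Hypothetical helper: the name and the 'normalizer' argument are
    illustrative and are not part of scikit-learn.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    if normalizer == "mean":
        scale = np.mean(y_true)
    elif normalizer == "range":
        scale = np.max(y_true) - np.min(y_true)
    else:
        raise ValueError("normalizer must be 'mean' or 'range'")
    if scale == 0:
        raise ValueError("cannot normalize by a zero scale")
    return rmse / scale * 100


# Illustrative values only, not taken from the dataset in the question.
y_true = [10.0, 20.0, 30.0]
y_pred = [12.0, 18.0, 30.0]
nrmse_mean = rmse_percent(y_true, y_pred, normalizer="mean")
nrmse_range = rmse_percent(y_true, y_pred, normalizer="range")
print("RMSE as % of mean:", nrmse_mean)
print("RMSE as % of range:", nrmse_range)
```

The result describes the typical error size relative to the scale of Y. It does not mean that a given percentage of predictions is correct or wrong.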