Interpreting the DecisionTreeRegressor score?


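R^2 can be negative; this follows directly from its definition (https://en.wikipedia.org/wiki/Coefficient_of_determination). It happens when the model fits the data worse than a horizontal line at the mean of the target values. By definition,

R^2 = 1 - SS_res/SS_tot

where SS_res is the sum of squared residuals of the model and SS_tot is the total sum of squares of the observed values around their mean. Both are non-negative, so whenever SS_res > SS_tot, R^2 is negative. Despite the name, R^2 is not computed by squaring anything here, which is why it is not restricted to non-negative values. Look at this answer as well: https://stats.stackexchange.com/questions/12900/when-is-r-squared-negative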
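As a rough illustration of the formula, here is a minimal sketch in plain Python. The helper `r_squared` and the toy numbers are made up for this example and are not taken from the question's data; the formula is the same one scikit-learn documents for a regressor's `score` method.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared residuals of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the observed values
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy data (hypothetical, for illustration only)
y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y_true, y_true)                  # predictions match exactly
baseline = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])   # always predict the mean
reversed_ = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])  # worse than the mean

print(perfect, baseline, reversed_)  # 1.0 0.0 -3.0
```

Always predicting the mean gives exactly 0, so a negative score such as -0.65 suggests that, on the test set, the fitted tree predicts worse than that constant baseline would.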

Author by

clockworks

Updated on June 05, 2022

Comments

  • clockworks
    clockworks almost 2 years

    I am trying to evaluate the relevance of features, and I am using DecisionTreeRegressor().

    The related part of the code is presented below:

    # TODO: Make a copy of the DataFrame, using the 'drop' function to drop the given feature
    new_data = data.drop(['Frozen'], axis = 1)
    
    # TODO: Split the data into training and testing sets(0.25) using the given feature as the target
    # TODO: Set a random state.
    
    from sklearn.model_selection import train_test_split
    
    
    X_train, X_test, y_train, y_test = train_test_split(new_data, data['Frozen'], test_size = 0.25, random_state = 1)
    
    # TODO: Create a decision tree regressor and fit it to the training set
    
    from sklearn.tree import DecisionTreeRegressor
    
    
    regressor = DecisionTreeRegressor(random_state=1)
    regressor.fit(X_train, y_train)
    
    # TODO: Report the score of the prediction using the testing set
    
    from sklearn.model_selection import cross_val_score
    
    
    #score = cross_val_score(regressor, X_test, y_test)
    score = regressor.score(X_test, y_test)
    
    print(score)  # the print() form works in both Python 2 and 3
    

    When I run this, it prints the following score:

    -0.649574327334

    You can find the score function implementation and some explanation here and below:

    Returns the coefficient of determination R^2 of the prediction. ... The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse).

    I have not grasped the whole concept yet, so this explanation is not very helpful to me. For instance, I do not understand why the score can be negative or what exactly that indicates (if something is squared, I would expect it to be positive only).


    What does this score indicate, and why can it be negative?

    If you know of any introductory article, that would be helpful as well!