R-squared value in Excel with the 'addtrendline' function?

18,219

Solution 1

RSQ does not return the r-squared value for a power trendline, rather it returns the r-squared for a linear trendline. Excel help reads "For logarithmic, power, and exponential trendlines, Excel uses a transformed regression model", but I am not able to find that model.

See also How to Measure the goodness of a fit of a trendline to a power law

Solution 2

This question (how does the "add trendline" in Excel really works?) also puzzled me for a longer time, because in a research I need to be sure about an origin of my numbers. Because I havent found too much about this on internet, so I tryed several vays of manual R^2 (coefficient of determination) evaluation in order to obtain the same results as Excel.

I made a same observation like it was mentioned in question. When one uses "add trendline" for linear (and also logaritmic) function fitting, the resulting R^2 and regresion parameters are identical to manually calculated parameters. But when one uses "add trendline" for other nonlinear function fitting (for example exponential), the resulting R^2 and regresion parameters differs from manually calculated parameters.

The solution of this problem was already partially mentioned in discussion here. It seems that in order to fit nonlinear trend into provided data, Excel primary linearize the problem. So for example to fit exponential function y=a* exp(b * x) it primarily transforms it into function ln(y)=ln(a)+b*x. Then the relation between ln(y) and x is linear. Afterwards the linearized function is fitted into transformed data using usual strategy of minimization of sum of squared residuals. Thus the regresion parameters ln(a) and b are obtained. Also R^2 is calculated from linearized form. Because it is linear dependency, RSQ() function may be used by Excel for calculation of R^2.

When one follows this procedure manually, then the resulting regresion parameters and R^2 values are identical with those provided by Excel's "add trendline".

So generally the regresion parameters and R^2 values provided by Excel's "add trendline" in case of non-linear regresion are not true nonlinear but most probably obtained after linearization of problem. As a consequence, these parameters slightly differ from parameters calculated directly without any transformations.

Note about R^2: As far as I understand now, the R^2 for linear case (better denoted by a small letter: r^2) is calculated as a square of corelation coefficient. (RSQ()=CORREL()^2=PEARSON()^2). Because of this relation the values of r^2 can range only between 0 - 1. One set of input knowns can be altered by intercept and/or slope without change of r^2 value. On the other hand the R^2 for a nonlinear case (better denoted by capital leter) is defined a different way (see Wikipedia). It's values are not bottom limited to value 0 but maximum value 1 still indicate the best fit. Alteration of one set of input knowns by intercept and/or slope changes the R^2 value. R^2 works same well also in linear case.

Solution 3

It's because you're asking for the R2 of fitting a power function in you chart (i.e. y = a(x)b), whereas the RQS function in Excel gives you the R2 for a linear fit (i.e. y = a(x) + b). I suspect you have a similar issue in Matlab. You'd need to post your code in Matlab though, otherwise we;d all just be guessing.

Share:
18,219
Lucile
Author by

Lucile

Engineer, PhD Student in Hyrology, Stockholm University, Sweden

Updated on June 25, 2022

Comments

  • Lucile
    Lucile almost 2 years

    I have been struggling a while on that; In short, I can't find the equations Excel uses for R2.

    • Here are my data :
      x: 1 2 3 4 5 6 7 8 9 10
      y: 4 9 1 2 1 1 8 5 5 1

    • I plot the data, fit a power law function ('add trendline') and use the 'add trendline > options > Display R-squared value on chart'

    Value displayed :
    R2 = 0.03008.

    Problem 1

    If I calculate it in excel using 'RSQ()' function (taking the values of the parameters Excel has found for the fitting function), or by hand using the definition (wikipedia)...
    R2 = 0.0272

    Problem 2

    In Matlab, using the 'fit' function, the parameters for the fitting function (and then of course the R2) are not the ones that EXCEL has found.

    Questions:

    So here is my main question :
    How does Excel computes the R2 in the 'add trendline' function, as it is obviously not the one from the definition (wikipedia)?

    and the bonus question:
    Why Excel and Matlab don't end up with the same fitting parameters?

    thanks a lot!

    %%%%%% EDIT BELOW! %%%%
    As an answer to a comment; Here is the Matlab code I use:

    %% R-squared with the fit function
    % use the fit function in Matlab, yobs being the data
    [param, results] = fit(x,yobs,'power1');   
    
    % R-squared from the fit function :
    r_sq_from_fit = results.rsquare;
    
    %% here I calculate "by hand" the R-squared, from the general definition (wikipedia!)
    % calculates the fitting data yfit
    yfit = (p_powerlaw.a).*x.^p_powerlaw.b;
    
    % mean of the yobs, total sum of squares, and residual sum of squares
    yobs_mean = mean(yobs);
    SStot = sum((yobs-yobs_mean).^2);
    SSres = sum((yobs-yfit).^2);
    
    r_sq_hand = 1-SSres/SStot;
    

    I find the same values wether I get R-squared from the fit function in Matlab or I calculate it "by hand". Matlab seems to be consistent and apparently uses the strict definition of R-squared in its function...

    However; when I compare:

    • the R-square value given by Excel from RSQ() function
    • and the value I obtain by hand calculating R-square from the definition (taking of course the yfit values that Excel returned me, not the one Matlab returns, as Excel and Matlab don't agree on the fitting parameters!)

    ...I obtain different values! Excel : 0.027, as I said before, and hand calculated : -0.1109 (!)

  • Lucile
    Lucile almost 10 years
    Ok thanks... it's really confusing that Excel uses its own definition of R-square in the 'add trendline', which is not the same as the one used in the RSQ function! And how about the difference in the fitting parameters between Excel and Matlab? with y=ax^b; Excel : a=3.61, b = -0.23 VS Matlab a = 4.81 and b=-0.18...!
  • Lucile
    Lucile almost 10 years
    Hi! thanks, I have edited my question with the Matlab codes! Hope you can help!
  • Ron Rosenfeld
    Ron Rosenfeld almost 10 years
    @user3789714 The result is the same for the linear trendline, which is what the RSQ function is designed for. And, if you are trying to use the trendline parameters for calculations, you should be using a= 3.61530123330863 and b = -0.216941129430519 and not the rounded off numbers you used above. There may be a further problem in that I do not believe your data provides a good enough fit to a power trendline, but I think the articles in the link might give more insight into that.
  • Lucile
    Lucile almost 10 years
    Yes I rounded up the parameters to calculate the fitting values yfit, as intuitively I thought that the difference in the value of R-squared would not be significant when rounding the parameters (now I have calculated it again using the non rounded up parameters, it is almost the same value : 0.02723398). This data is made up. My real data is a huge matrix. But the final problem is that if I compare power-law fitting and exponential fitting in Excel, I choose the power-law (higher R-squared), whereas in Matlab, I use the exponential. that's a big problem...
  • Lucile
    Lucile almost 10 years
    ...Finally yes, there are other ways -and better- to state the goodness of fit of non-linear fittings (Nash Sutcliffe in my field, or the one in the link from your link), the problem is that in a paper, those "other ways" are usually not at all understood by reviewers / readers...
  • Dan
    Dan almost 10 years
    @user3789714 Excel's add trend line might not be the best optimization. Try using the solver in Excel to find the parameters and compare then with Matlab