How to solve the "prediction from a rank-deficient fit may be misleading" warning on my linear model?

16,671

Two of your subCategory levels had their associated coefficients suppressed (the NA rows in the summary). That means that each of them can be predicted exactly by some combination of price, shipping, and the other category and subCategory levels. The R documentation calls such coefficients "aliased". The warning may or may not be important, although I agree with @ZheyuanLi that it's probably benign. I don't think this particular warning can be due to missing values, since R regression functions generally remove an entire row when any one variable in it has a missing value. Also unlikely is the theory that there is 100% correlation between two variables. If you want to display the combinations that might give rise to this, I suggest starting with

with(dataClean, table(category, subCategory))

I predict you will find only one subCategory in one or more of the category rows.
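To make the aliasing concrete, here is a minimal sketch on made-up data. The `toy` data frame, its values, and the assumption that every subCategory belongs to exactly one category are all hypothetical and not taken from the original data set. It tabulates the two factors, lists the coefficients that `lm()` left as NA, asks `alias()` how the dropped columns depend on the retained ones, and then checks whether a fit without the redundant predictor produces the same fitted values.

```r
# Hypothetical toy data: each subCategory belongs to exactly one category.
set.seed(1)
toy <- data.frame(
  category    = rep(c("furniture", "officeSupplies", "technology"), each = 20),
  subCategory = rep(c("bookcases", "tables", "labels", "paper",
                      "copiersAndFax", "officeMachines"), each = 10),
  stringsAsFactors = TRUE
)
toy$price  <- runif(nrow(toy), 10, 500)
toy$margin <- rnorm(nrow(toy), mean = 0.5, sd = 0.1)

# Each subCategory column has non-zero counts in a single category row.
print(with(toy, table(category, subCategory)))

full <- lm(margin ~ price + category + subCategory, data = toy)

# Coefficients that lm() could not estimate come back as NA.
aliased <- names(coef(full))[is.na(coef(full))]
print(aliased)

# alias() expresses each dropped column in terms of the retained ones.
print(alias(full)$Complete)

# Refit without the redundant predictor and compare the fitted values
# of the rank-deficient and the full-rank model.
reduced  <- lm(margin ~ price + subCategory, data = toy)
same_fit <- isTRUE(all.equal(unname(fitted(full)), unname(fitted(reduced))))
print(same_fit)
```

If the real data has the same nested structure, refitting without `category` would be one way to make the warning go away without changing the predictions. That is an assumption to check against the cross-tabulation first, not something the summary output alone confirms.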

Author by

Joan Triay

Updated on June 25, 2022

Comments

  • Joan Triay, almost 2 years ago

    I have a problem when I use my model to make predictions: R shows the warning "prediction from a rank-deficient fit may be misleading". How can I solve it? I think my model is correct; it is the prediction that fails, and I don't know why.

    Here you can see, step by step, what I am doing, along with the summary of the model:

    myModel <- lm(margin ~ ., data = dataClean[train, c(target, numeric, categoric)])
    summary(myModel)
    
    Call:
    lm(formula = margin ~ ., data = dataClean[train, c(target, numeric, categoric)])
    
    Residuals:
      Min        1Q    Median        3Q       Max 
    -0.220407 -0.035272 -0.003415  0.028227  0.276727 
    
    Coefficients: (2 not defined because of singularities)
                                       Estimate Std. Error t value Pr(>|t|)    
    (Intercept)                          6.061e-01  2.260e-02  26.817  < 2e-16 ***
    price                                1.042e-05  8.970e-06   1.162 0.245610    
    shipping                             1.355e-03  2.741e-04   4.943 9.25e-07 ***
    categoryofficeSupplies              -7.721e-02  2.295e-02  -3.364 0.000802 ***
    categorytechnology                  -3.993e-02  2.325e-02  -1.717 0.086249 .  
    subCategorybindersAndAccessories    -1.650e-01  1.421e-02 -11.612  < 2e-16 ***
    subCategorybookcases                 3.337e-04  2.328e-02   0.014 0.988565    
    subCategorychairsChairmats          -3.104e-02  2.106e-02  -1.474 0.140831    
    subCategorycomputerPeripherals       1.356e-02  1.293e-02   1.049 0.294604    
    subCategorycopiersAndFax            -1.943e-01  2.944e-02  -6.598 7.27e-11 ***
    subCategoryenvelopes                -1.648e-01  2.045e-02  -8.057 2.62e-15 ***
    subCategorylabels                   -1.534e-01  1.984e-02  -7.730 3.00e-14 ***
    subCategoryofficeFurnishings        -8.827e-02  2.220e-02  -3.976 7.61e-05 ***
    subCategoryofficeMachines           -1.521e-01  1.639e-02  -9.281  < 2e-16 ***
    subCategorypaper                    -1.624e-01  1.363e-02 -11.909  < 2e-16 ***
    subCategorypensArtSupplies          -8.484e-04  1.524e-02  -0.056 0.955623    
    subCategoryrubberBands               3.174e-02  2.245e-02   1.414 0.157854    
    subCategoryscissorsRulersTrimmers    1.092e-01  2.327e-02   4.693 3.13e-06 ***
    subCategorystorageOrganization       1.219e-01  1.575e-02   7.739 2.82e-14 ***
    subCategorytables                           NA         NA      NA       NA    
    subCategorytelephoneAndComunication         NA         NA      NA       NA    
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 0.08045 on 858 degrees of freedom
    Multiple R-squared:  0.6512,    Adjusted R-squared:  0.6439 
    F-statistic: 88.98 on 18 and 858 DF,  p-value: < 2.2e-16
    
    estimateModel <- predict(myModel, type="response", newdata=dataClean[test, c(numeric,categoric,target)])
    
    Warning message:
    In predict.lm(myModel, type = "response", newdata = dataClean[test,  :
    prediction from a rank-deficient fit may be misleading