How do I extract lmer fixed effects by observation?

r glm

12,500

Solution 1

It is going to be something like this (although you really should have given us the results of str(Male.Data) because model output does not tell us the factor levels for the baseline values:)

#First look at the coefficients
fixef(Male.lme2)

#Then do the calculations
fixef(Male.lme2)[`(Intercept)`] + 
c(0,fixef(Male.lme2)[2:4])[
          match(Male.Data$AgeFactor, c("1to3", "4to8", "9to13","14to18") )] + 
c(0,fixef(Male.lme2)[5])[
          match(Male.Data$IntakeDay, c("Day1Intake","Day2Intake") )]

You are basically running the original data through a match function to pick the correct coefficient(s) to add to the intercept ... which will be 0 if the data is the factor's base level (whose spelling I am guessing at.)

EDIT: I just noticed that you put a "-1" in the formula so perhaps all of your AgeFactor terms are listed in the output and you can tale out the 0 in the coefficient vector and the invented AgeFactor level in the match table vector.

Solution 2

Below is how I've always found it easiest to extract the individuals' fixed effects and random effects components in the lme4-package. It actually extracts the corresponding fit to each observation. Assuming we have a mixed-effects model of form:

y = Xb + Zu + e

where Xb are the fixed effects and Zu are the random effects, we can extract the components (using lme4's sleepstudy as an example):

library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)

# Xb 
fix <- getME(fm1,'X') %*% fixef(fm1)
# Zu
ran <- t(as.matrix(getME(fm1,'Zt'))) %*% unlist(ranef(fm1))
# Xb + Zu
fixran <- fix + ran

I know that this works as a generalized approach to extracting components from linear mixed-effects models. For non-linear models, the model matrix X contains repeats and you may have to tailor the above code a bit. Here's some validation output as well as a visualization using lattice:

> head(cbind(fix, ran, fixran, fitted(fm1)))
         [,1]      [,2]     [,3]     [,4]
[1,] 251.4051  2.257187 253.6623 253.6623
[2,] 261.8724 11.456439 273.3288 273.3288
[3,] 272.3397 20.655691 292.9954 292.9954
[4,] 282.8070 29.854944 312.6619 312.6619
[5,] 293.2742 39.054196 332.3284 332.3284
[6,] 303.7415 48.253449 351.9950 351.9950

# Xb + Zu
> all(round((fixran),6) == round(fitted(fm1),6))
[1] TRUE

# e = y - (Xb + Zu)
> all(round(resid(fm1),6) == round(sleepstudy[,"Reaction"]-(fixran),6))
[1] TRUE

nobs <- 10 # 10 observations per subject
legend = list(text=list(c("y", "Xb + Zu", "Xb")), lines = list(col=c("blue", "red", "black"), pch=c(1,1,1), lwd=c(1,1,1), type=c("b","b","b")))
require(lattice)
xyplot(
    Reaction ~ Days | Subject, data = sleepstudy,
    panel = function(x, y, ...){
        panel.points(x, y, type='b', col='blue')
        panel.points(x, fix[(1+nobs*(panel.number()-1)):(nobs*(panel.number()))], type='b', col='black')
        panel.points(x, fixran[(1+nobs*(panel.number()-1)):(nobs*(panel.number()))], type='b', col='red')
    },
    key = legend
)

enter image description here

12,500

Author by

Michelle

Social scientist/applied statistician, and have worked in the welfare, health, justice, and defence areas of government. As well as analysis of administrative data, I have also designed and evaluated operational pilots (for cost and behavioural impacts), sometimes using an experimental design. Programmer in SAS and SPSS, and some VBA, now learning R. Some experience programming in discrete event simulation, with Arena. Soon to be learning to program agent-based models in RePast, using Java. Currently a PhD student.

Updated on June 19, 2022

Comments

Michelle almost 2 years

I have a lme object, constructed from some repeated measures nutrient intake data (two 24-hour intake periods per RespondentID):

Male.lme2 <- lmer(BoxCoxXY ~ -1 + AgeFactor + IntakeDay + (1|RespondentID),
    data = Male.Data, 
    weights = SampleWeight)

and I can successfully retrieve the random effects by RespondentID using ranef(Male.lme1). I would also like to collect the result of the fixed effects by RespondentID. coef(Male.lme1) does not provide exactly what I need, as I show below.

> summary(Male.lme1)
Linear mixed model fit by REML 
Formula: BoxCoxXY ~ AgeFactor + IntakeDay + (1 | RespondentID) 
   Data: Male.Data 
  AIC   BIC logLik deviance REMLdev
  9994 10039  -4990     9952    9980
Random effects:
 Groups       Name        Variance Std.Dev.
 RespondentID (Intercept) 0.19408  0.44055 
 Residual                 0.37491  0.61230 
Number of obs: 4498, groups: RespondentID, 2249

Fixed effects:
                    Estimate Std. Error t value
(Intercept)         13.98016    0.03405   410.6
AgeFactor4to8        0.50572    0.04084    12.4
AgeFactor9to13       0.94329    0.04159    22.7
AgeFactor14to18      1.30654    0.04312    30.3
IntakeDayDay2Intake -0.13871    0.01809    -7.7

Correlation of Fixed Effects:
            (Intr) AgFc48 AgF913 AF1418
AgeFactr4t8 -0.775                     
AgeFctr9t13 -0.761  0.634              
AgFctr14t18 -0.734  0.612  0.601       
IntkDyDy2In -0.266  0.000  0.000  0.000

I have appended the fitted results to my data, head(Male.Data) shows

   NutrientID RespondentID Gender Age SampleWeight  IntakeDay IntakeAmt AgeFactor BoxCoxXY  lmefits
2         267       100020      1  12    0.4952835 Day1Intake 12145.852     9to13 15.61196 15.22633
7         267       100419      1  14    0.3632839 Day1Intake  9591.953    14to18 15.01444 15.31373
8         267       100459      1  11    0.4952835 Day1Intake  7838.713     9to13 14.51458 15.00062
12        267       101138      1  15    1.3258785 Day1Intake 11113.266    14to18 15.38541 15.75337
14        267       101214      1   6    2.1198688 Day1Intake  7150.133      4to8 14.29022 14.32658
18        267       101389      1   5    2.1198688 Day1Intake  5091.528      4to8 13.47928 14.58117

The first couple of lines from coef(Male.lme1) are:

$RespondentID
       (Intercept) AgeFactor4to8 AgeFactor9to13 AgeFactor14to18 IntakeDayDay2Intake
100020    14.28304     0.5057221      0.9432941        1.306542          -0.1387098
100419    14.00719     0.5057221      0.9432941        1.306542          -0.1387098
100459    14.05732     0.5057221      0.9432941        1.306542          -0.1387098
101138    14.44682     0.5057221      0.9432941        1.306542          -0.1387098
101214    13.82086     0.5057221      0.9432941        1.306542          -0.1387098
101389    14.07545     0.5057221      0.9432941        1.306542          -0.1387098

To demonstrate how the coef results relate to the fitted estimates in Male.Data (which were grabbed using Male.Data$lmefits <- fitted(Male.lme1), for the first RespondentID, who has the AgeFactor level 9-13: - the fitted value is 15.22633, which equals - from the coeffs - (Intercept) + (AgeFactor9-13) = 14.28304 + 0.9432941

Is there a clever command for me to use that will do want I want automatically, which is to extract the fixed effect estimate for each subject, or am I faced with a series of if statements trying to apply the correct AgeFactor level to each subject to get the correct fixed effect estimate, after deducting the random effect contribution off the Intercept?

Update, apologies, was trying to cut down on the output I was providing and forgot about str(). Output is:

>str(Male.Data)
'data.frame':   4498 obs. of  11 variables:
 $ NutrientID  : int  267 267 267 267 267 267 267 267 267 267 ...
 $ RespondentID: Factor w/ 2249 levels "100020","100419",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Gender      : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Age         : int  12 14 11 15 6 5 10 2 2 9 ...
 $ BodyWeight  : num  51.6 46.3 46.1 63.2 28.4 18 38.2 14.4 14.6 32.1 ...
 $ SampleWeight: num  0.495 0.363 0.495 1.326 2.12 ...
 $ IntakeDay   : Factor w/ 2 levels "Day1Intake","Day2Intake": 1 1 1 1 1 1 1 1 1 1 ...
 $ IntakeAmt   : num  12146 9592 7839 11113 7150 ...
 $ AgeFactor   : Factor w/ 4 levels "1to3","4to8",..: 3 4 3 4 2 2 3 1 1 3 ...
 $ BoxCoxXY    : num  15.6 15 14.5 15.4 14.3 ...
 $ lmefits     : num  15.2 15.3 15 15.8 14.3 ...

The BodyWeight and Gender aren't being used (this is the males data, so all the Gender values are the same) and the NutrientID is similarly fixed for the data.

I have been doing horrible ifelse statements sinced I posted, so will try out your suggestion immediately. :)

Update2: this works perfectly with my current data and should be future-proof for new data, thanks to DWin for the extra help in the comment for this. :)

AgeLevels <- length(unique(Male.Data$AgeFactor))
Temp <- as.data.frame(fixef(Male.lme1)['(Intercept)'] + 
c(0,fixef(Male.lme1)[2:AgeLevels])[
      match(Male.Data$AgeFactor, c("1to3", "4to8", "9to13","14to18",  "19to30","31to50","51to70","71Plus") )] + 
c(0,fixef(Male.lme1)[(AgeLevels+1)])[
      match(Male.Data$IntakeDay, c("Day1Intake","Day2Intake") )])
names(Temp) <- c("FxdEffct")

Michelle over 12 years

Thanks for the help, I just amended the quote around the (Intercept) name. I am creating a general R analysis to apply to all age groups, the current data frame just has children, how can I adjust the column indices for a search when I won't necessarily know how many age factor levels will be in the model? I'm trying to automate the analysis as much as possible
IRTFM over 12 years

length(unique(Male.Data$AgeFactor)) would give you the number of levels, and you could use that number plus 1 instead of the 4 to get the indices of the AgeFactor, You would obviously need to add use appropriately higher value for the indices of the IntakeDay effects as well.
smci over 7 years

This is awesome, except fixran seems to not always be a good approximation with lme4 1.1-12 . Can you try to replicate that?
Ben over 5 years

what is the difference between y and Xb+Zu? Shouldn't it be the same?
Ben over 5 years

and: shouldn't the random effects be only Zu instead of Xb+Zu?
Teemu Daniel Laajala over 5 years

Sorry it's been over four years since this answer, I'll try return to this @smci to try see what has changed. @Ben; 1) True, they are the same, or more precisely, y = Xb+Zu+e. I extracted these two vectors using two ways; raw data, and the model summing of terms to illustrate that these two indeed are equal. 2) True, strictly speaking Zu is the only proportion that is random effects. However, if you leave out Xb you are left with only a sum of zero-mean normally distributed random effects, which are harder to interpret as population averages are blocked out. Hence why I supplemented them here.