How do I extract lmer fixed effects by observation?
Solution 1
It is going to be something like this (although you really should have given us the results of str(Male.Data) because model output does not tell us the factor levels for the baseline values:)
#First look at the coefficients
fixef(Male.lme2)
#Then do the calculations
fixef(Male.lme2)[`(Intercept)`] +
c(0,fixef(Male.lme2)[2:4])[
match(Male.Data$AgeFactor, c("1to3", "4to8", "9to13","14to18") )] +
c(0,fixef(Male.lme2)[5])[
match(Male.Data$IntakeDay, c("Day1Intake","Day2Intake") )]
You are basically running the original data through a match
function to pick the correct coefficient(s) to add to the intercept ... which will be 0 if the data is the factor's base level (whose spelling I am guessing at.)
EDIT: I just noticed that you put a "-1" in the formula so perhaps all of your AgeFactor terms are listed in the output and you can tale out the 0 in the coefficient vector and the invented AgeFactor level in the match table vector.
Solution 2
Below is how I've always found it easiest to extract the individuals' fixed effects and random effects components in the lme4-package. It actually extracts the corresponding fit to each observation. Assuming we have a mixed-effects model of form:
y = Xb + Zu + e
where Xb are the fixed effects and Zu are the random effects, we can extract the components (using lme4's sleepstudy as an example):
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
# Xb
fix <- getME(fm1,'X') %*% fixef(fm1)
# Zu
ran <- t(as.matrix(getME(fm1,'Zt'))) %*% unlist(ranef(fm1))
# Xb + Zu
fixran <- fix + ran
I know that this works as a generalized approach to extracting components from linear mixed-effects models. For non-linear models, the model matrix X contains repeats and you may have to tailor the above code a bit. Here's some validation output as well as a visualization using lattice:
> head(cbind(fix, ran, fixran, fitted(fm1)))
[,1] [,2] [,3] [,4]
[1,] 251.4051 2.257187 253.6623 253.6623
[2,] 261.8724 11.456439 273.3288 273.3288
[3,] 272.3397 20.655691 292.9954 292.9954
[4,] 282.8070 29.854944 312.6619 312.6619
[5,] 293.2742 39.054196 332.3284 332.3284
[6,] 303.7415 48.253449 351.9950 351.9950
# Xb + Zu
> all(round((fixran),6) == round(fitted(fm1),6))
[1] TRUE
# e = y - (Xb + Zu)
> all(round(resid(fm1),6) == round(sleepstudy[,"Reaction"]-(fixran),6))
[1] TRUE
nobs <- 10 # 10 observations per subject
legend = list(text=list(c("y", "Xb + Zu", "Xb")), lines = list(col=c("blue", "red", "black"), pch=c(1,1,1), lwd=c(1,1,1), type=c("b","b","b")))
require(lattice)
xyplot(
Reaction ~ Days | Subject, data = sleepstudy,
panel = function(x, y, ...){
panel.points(x, y, type='b', col='blue')
panel.points(x, fix[(1+nobs*(panel.number()-1)):(nobs*(panel.number()))], type='b', col='black')
panel.points(x, fixran[(1+nobs*(panel.number()-1)):(nobs*(panel.number()))], type='b', col='red')
},
key = legend
)
Michelle
Social scientist/applied statistician, and have worked in the welfare, health, justice, and defence areas of government. As well as analysis of administrative data, I have also designed and evaluated operational pilots (for cost and behavioural impacts), sometimes using an experimental design. Programmer in SAS and SPSS, and some VBA, now learning R. Some experience programming in discrete event simulation, with Arena. Soon to be learning to program agent-based models in RePast, using Java. Currently a PhD student.
Updated on June 19, 2022Comments
-
Michelle almost 2 years
I have a lme object, constructed from some repeated measures nutrient intake data (two 24-hour intake periods per RespondentID):
Male.lme2 <- lmer(BoxCoxXY ~ -1 + AgeFactor + IntakeDay + (1|RespondentID), data = Male.Data, weights = SampleWeight)
and I can successfully retrieve the random effects by
RespondentID
usingranef(Male.lme1)
. I would also like to collect the result of the fixed effects byRespondentID
.coef(Male.lme1)
does not provide exactly what I need, as I show below.> summary(Male.lme1) Linear mixed model fit by REML Formula: BoxCoxXY ~ AgeFactor + IntakeDay + (1 | RespondentID) Data: Male.Data AIC BIC logLik deviance REMLdev 9994 10039 -4990 9952 9980 Random effects: Groups Name Variance Std.Dev. RespondentID (Intercept) 0.19408 0.44055 Residual 0.37491 0.61230 Number of obs: 4498, groups: RespondentID, 2249 Fixed effects: Estimate Std. Error t value (Intercept) 13.98016 0.03405 410.6 AgeFactor4to8 0.50572 0.04084 12.4 AgeFactor9to13 0.94329 0.04159 22.7 AgeFactor14to18 1.30654 0.04312 30.3 IntakeDayDay2Intake -0.13871 0.01809 -7.7 Correlation of Fixed Effects: (Intr) AgFc48 AgF913 AF1418 AgeFactr4t8 -0.775 AgeFctr9t13 -0.761 0.634 AgFctr14t18 -0.734 0.612 0.601 IntkDyDy2In -0.266 0.000 0.000 0.000
I have appended the fitted results to my data,
head(Male.Data)
showsNutrientID RespondentID Gender Age SampleWeight IntakeDay IntakeAmt AgeFactor BoxCoxXY lmefits 2 267 100020 1 12 0.4952835 Day1Intake 12145.852 9to13 15.61196 15.22633 7 267 100419 1 14 0.3632839 Day1Intake 9591.953 14to18 15.01444 15.31373 8 267 100459 1 11 0.4952835 Day1Intake 7838.713 9to13 14.51458 15.00062 12 267 101138 1 15 1.3258785 Day1Intake 11113.266 14to18 15.38541 15.75337 14 267 101214 1 6 2.1198688 Day1Intake 7150.133 4to8 14.29022 14.32658 18 267 101389 1 5 2.1198688 Day1Intake 5091.528 4to8 13.47928 14.58117
The first couple of lines from
coef(Male.lme1)
are:$RespondentID (Intercept) AgeFactor4to8 AgeFactor9to13 AgeFactor14to18 IntakeDayDay2Intake 100020 14.28304 0.5057221 0.9432941 1.306542 -0.1387098 100419 14.00719 0.5057221 0.9432941 1.306542 -0.1387098 100459 14.05732 0.5057221 0.9432941 1.306542 -0.1387098 101138 14.44682 0.5057221 0.9432941 1.306542 -0.1387098 101214 13.82086 0.5057221 0.9432941 1.306542 -0.1387098 101389 14.07545 0.5057221 0.9432941 1.306542 -0.1387098
To demonstrate how the
coef
results relate to the fitted estimates in Male.Data (which were grabbed usingMale.Data$lmefits <- fitted(Male.lme1)
, for the first RespondentID, who has the AgeFactor level 9-13: - the fitted value is15.22633
, which equals - from the coeffs -(Intercept) + (AgeFactor9-13) = 14.28304 + 0.9432941
Is there a clever command for me to use that will do want I want automatically, which is to extract the fixed effect estimate for each subject, or am I faced with a series of
if
statements trying to apply the correct AgeFactor level to each subject to get the correct fixed effect estimate, after deducting the random effect contribution off the Intercept?Update, apologies, was trying to cut down on the output I was providing and forgot about str(). Output is:
>str(Male.Data) 'data.frame': 4498 obs. of 11 variables: $ NutrientID : int 267 267 267 267 267 267 267 267 267 267 ... $ RespondentID: Factor w/ 2249 levels "100020","100419",..: 1 2 3 4 5 6 7 8 9 10 ... $ Gender : int 1 1 1 1 1 1 1 1 1 1 ... $ Age : int 12 14 11 15 6 5 10 2 2 9 ... $ BodyWeight : num 51.6 46.3 46.1 63.2 28.4 18 38.2 14.4 14.6 32.1 ... $ SampleWeight: num 0.495 0.363 0.495 1.326 2.12 ... $ IntakeDay : Factor w/ 2 levels "Day1Intake","Day2Intake": 1 1 1 1 1 1 1 1 1 1 ... $ IntakeAmt : num 12146 9592 7839 11113 7150 ... $ AgeFactor : Factor w/ 4 levels "1to3","4to8",..: 3 4 3 4 2 2 3 1 1 3 ... $ BoxCoxXY : num 15.6 15 14.5 15.4 14.3 ... $ lmefits : num 15.2 15.3 15 15.8 14.3 ...
The BodyWeight and Gender aren't being used (this is the males data, so all the Gender values are the same) and the NutrientID is similarly fixed for the data.
I have been doing horrible ifelse statements sinced I posted, so will try out your suggestion immediately. :)
Update2: this works perfectly with my current data and should be future-proof for new data, thanks to DWin for the extra help in the comment for this. :)
AgeLevels <- length(unique(Male.Data$AgeFactor)) Temp <- as.data.frame(fixef(Male.lme1)['(Intercept)'] + c(0,fixef(Male.lme1)[2:AgeLevels])[ match(Male.Data$AgeFactor, c("1to3", "4to8", "9to13","14to18", "19to30","31to50","51to70","71Plus") )] + c(0,fixef(Male.lme1)[(AgeLevels+1)])[ match(Male.Data$IntakeDay, c("Day1Intake","Day2Intake") )]) names(Temp) <- c("FxdEffct")
-
Michelle over 12 yearsThanks for the help, I just amended the quote around the (Intercept) name. I am creating a general R analysis to apply to all age groups, the current data frame just has children, how can I adjust the column indices for a search when I won't necessarily know how many age factor levels will be in the model? I'm trying to automate the analysis as much as possible
-
IRTFM over 12 years
length(unique(Male.Data$AgeFactor))
would give you the number of levels, and you could use that number plus 1 instead of the 4 to get the indices of the AgeFactor, You would obviously need to add use appropriately higher value for the indices of the IntakeDay effects as well. -
smci over 7 yearsThis is awesome, except fixran seems to not always be a good approximation with lme4 1.1-12 . Can you try to replicate that?
-
Ben over 5 yearswhat is the difference between y and Xb+Zu? Shouldn't it be the same?
-
Ben over 5 yearsand: shouldn't the random effects be only Zu instead of Xb+Zu?
-
Teemu Daniel Laajala over 5 yearsSorry it's been over four years since this answer, I'll try return to this @smci to try see what has changed. @Ben; 1) True, they are the same, or more precisely, y = Xb+Zu+e. I extracted these two vectors using two ways; raw data, and the model summing of terms to illustrate that these two indeed are equal. 2) True, strictly speaking Zu is the only proportion that is random effects. However, if you leave out Xb you are left with only a sum of zero-mean normally distributed random effects, which are harder to interpret as population averages are blocked out. Hence why I supplemented them here.