R - plm and lm - Fixed effects


Posting an example of your data would probably help answer the question: with some made-up data I get the same coefficients from both approaches. You can also use felm() from the lfe package to do the same thing:

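# Simulated balanced panel: 100 regions observed over 100 years.
# No seed is set, so your numbers will differ slightly from the output below.
N <- 10000
df <- data.frame(a = rnorm(N), b = rnorm(N),
                 region = rep(1:100, each = 100), year = rep(1:100, 100))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)
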
# 1) lm with region and year dummies (least-squares dummy variables)
model.a <- lm(y ~ a + b + factor(year) + factor(region), data = df)
summary(model.a)
#  (Intercept)       -0.0522691  0.1422052   -0.368   0.7132    
#  a                  1.9982165  0.0101501  196.866   <2e-16 ***
#  b                 -1.4787359  0.0101666 -145.450   <2e-16 ***

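# 2) plm: within (fixed-effects) estimator with both region and year effects.
#    The index is given as c(individual, time).
library(plm)
pdf <- pdata.frame(df, index = c("region", "year"))

model.b <- plm(y ~ a + b, data = pdf, model = "within", effect = "twoways")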
summary(model.b)

# Coefficients :
#    Estimate Std. Error t-value  Pr(>|t|)    
# a  1.998217   0.010150  196.87 < 2.2e-16 ***
# b -1.478736   0.010167 -145.45 < 2.2e-16 ***

# 3) lfe: felm() projects out the factors listed after the "|"
library(lfe)

model.c <- felm(y ~ a + b | factor(region) + factor(year), data = df)
summary(model.c)

# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
# a  1.99822    0.01015   196.9   <2e-16 ***
# b -1.47874    0.01017  -145.4   <2e-16 ***
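Why do the three estimates agree? On a balanced panel, the two-way within estimator is equivalent to OLS on double-demeaned data: from each variable subtract its region mean and its year mean, then add back the grand mean. The base-R sketch below is my own illustration, not how plm or lfe are implemented internally. It applies that transformation with ave() (which averages within groups by default) and compares the slopes with the dummy-variable lm() fits, both for region-only effects (the idea behind effect = "individual") and for region plus year effects. It assumes a balanced panel with no missing values; on unbalanced data this simple double-demeaning formula no longer reproduces the two-way fixed-effects estimates.

```r
# Sketch only: reproduces the slope estimates by hand, assuming a balanced
# panel with no missing values. Not how plm or lfe work internally.
set.seed(1)  # arbitrary seed so the comparison is reproducible
N <- 10000
df <- data.frame(a = rnorm(N), b = rnorm(N),
                 region = rep(1:100, each = 100), year = rep(1:100, 100))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)

# One-way within transformation (region only): x_it - mean_i(x)
demean1 <- function(x, g) x - ave(x, g)

# Two-way within transformation, valid for a balanced panel:
# x_it - mean_i(x) - mean_t(x) + mean(x)
demean2 <- function(x, g1, g2) x - ave(x, g1) - ave(x, g2) + mean(x)

dm1 <- data.frame(y = demean1(df$y, df$region),
                  a = demean1(df$a, df$region),
                  b = demean1(df$b, df$region))
dm2 <- data.frame(y = demean2(df$y, df$region, df$year),
                  a = demean2(df$a, df$region, df$year),
                  b = demean2(df$b, df$region, df$year))

# Demeaning removes the intercept, so fit without one
within1 <- lm(y ~ 0 + a + b, data = dm1)
within2 <- lm(y ~ 0 + a + b, data = dm2)

# Dummy-variable fits for comparison
lsdv1 <- lm(y ~ a + b + factor(region), data = df)
lsdv2 <- lm(y ~ a + b + factor(region) + factor(year), data = df)

# Slopes agree up to floating-point error
cbind(within = coef(within1), lsdv = coef(lsdv1)[c("a", "b")])
cbind(within = coef(within2), lsdv = coef(lsdv2)[c("a", "b")])
```

Only the point estimates carry over: lm() computes the standard errors of within1 and within2 with residual degrees of freedom that ignore the absorbed region and year effects, so they come out slightly too small. plm() and felm() make that adjustment, which is why their standard errors in the output above match the dummy-variable lm().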

Author: Jasper

Updated on July 25, 2022

Comments

  • Jasper
    Jasper almost 2 years

    I have a balanced panel data set, df, that essentially consists of three variables, A, B and Y, which vary over time for a set of uniquely identified regions. I would like to run a regression that includes both regional (region in the calls below) and time (year) fixed effects. If I'm not mistaken, I can do this in two ways:

    lm(Y ~ A + B + factor(region) + factor(year), data = df)
    

    or

    library(plm)
    plm(Y ~ A + B, 
        data = df, index = c('region', 'year'), model = 'within',
        effect = 'twoways')
    

    In the second call I specify the indices (region and year), the model type ('within', i.e. fixed effects), and the kind of effects ('twoways', meaning that I include both region and time fixed effects).

    Although I seem to be doing things correctly, I get extremely different results. The problem disappears when I drop the time fixed effects and use the argument effect = 'individual'. What's going on here? Am I missing something? Are there any other R packages that can run the same analysis?

  • Jasper
    Jasper about 7 years
    Thank you very much Christoph. Your answer is very neat. I'm digging further into the data set. I cannot share the data, but I suppose the discrepancy must then be related to the way the variables were constructed. I voted your answer up.
  • egodial
    egodial over 6 years
    Hi @GhostCat. I think the question was never answered, and I am suggesting that this is not a data issue but something systematic in the package.
  • GhostCat
    GhostCat over 6 years
    Then you should make that clearer. As it stands, it almost reads like a disguised "I have a similar problem, now what?" question.
  • FrancescoVe
    FrancescoVe over 5 years
    Was the issue ever solved? I was having similar difficulties, and I get the same coefficients with lm and plm. However, only in the data frame I pass to plm do I store region and year as factors; for lm I keep them as plain variables and convert them with factor() inside the formula.
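On the last comment's point about factors: for lm() it makes no difference whether region and year are converted to factors in the data frame beforehand or wrapped in factor() inside the formula. What does change the model is leaving a numeric code unconverted, in which case lm() fits a single linear slope for it rather than one dummy per level. The base-R sketch below illustrates both points on the same kind of simulated panel used in the answer. The seed and the helper check_panel() are hypothetical, introduced only for illustration. The helper is a quick check for missing or duplicated region-year cells, one way a data set can fail to be the balanced panel it is assumed to be.

```r
set.seed(2)  # arbitrary seed for reproducibility
N <- 10000
df <- data.frame(a = rnorm(N), b = rnorm(N),
                 region = rep(1:100, each = 100), year = rep(1:100, 100))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)

# Hypothetical helper: TRUE only if every (id, time) combination appears
# exactly once, i.e. no missing cells and no duplicated rows.
check_panel <- function(data, id, time) {
  counts <- table(data[[id]], data[[time]])
  all(counts == 1)
}

# 1) Factors created inside the formula
fit_inline <- lm(y ~ a + b + factor(region) + factor(year), data = df)

# 2) Factors created beforehand in the data frame
df_f <- transform(df, region = factor(region), year = factor(year))
fit_pre <- lm(y ~ a + b + region + year, data = df_f)

# 3) Pitfall: numeric codes left as numbers enter as linear covariates,
#    so this is NOT a fixed-effects regression
fit_numeric <- lm(y ~ a + b + region + year, data = df)

# Same slopes for 1) and 2); only the dummy names differ
cbind(inline = coef(fit_inline)[c("a", "b")], pre = coef(fit_pre)[c("a", "b")])

check_panel(df, "region", "year")        # TRUE
check_panel(df[-1, ], "region", "year")  # FALSE: region 1, year 1 is missing
length(coef(fit_pre))                    # 201: intercept, a, b, 99 + 99 dummies
length(coef(fit_numeric))                # 5: intercept, a, b, region, year
```

Because these simulated data contain no true region or year effects, fit_numeric happens to give similar slopes here; when the true effects are not linear in the numeric code, its estimates can differ markedly from the fixed-effects ones.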