How to extract fitted splines from a GAM (`mgcv::gam`)

11,424

In mgcv::gam there is a way to do this (your Q2), via the predict.gam method and type = "lpmatrix".

?predict.gam even has an example, which I reproduce below:

 library(mgcv)
 n <- 200
 sig <- 2
 dat <- gamSim(1,n=n,scale=sig)

 b <- gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + offset(x3), data = dat)

 newd <- data.frame(x0=(0:30)/30, x1=(0:30)/30, x2=(0:30)/30, x3=(0:30)/30)

 Xp <- predict(b, newd, type="lpmatrix")

 ##################################################################
 ## The following shows how to use use an "lpmatrix" as a lookup 
 ## table for approximate prediction. The idea is to create 
 ## approximate prediction matrix rows by appropriate linear 
 ## interpolation of an existing prediction matrix. The additivity 
 ## of a GAM makes this possible. 
 ## There is no reason to ever do this in R, but the following 
 ## code provides a useful template for predicting from a fitted 
 ## gam *outside* R: all that is needed is the coefficient vector 
 ## and the prediction matrix. Use larger `Xp'/ smaller `dx' and/or 
 ## higher order interpolation for higher accuracy.  
 ###################################################################

 xn <- c(.341,.122,.476,.981) ## want prediction at these values
 x0 <- 1         ## intercept column
 dx <- 1/30      ## covariate spacing in `newd'
 for (j in 0:2) { ## loop through smooth terms
   cols <- 1+j*9 +1:9      ## relevant cols of Xp
   i <- floor(xn[j+1]*30)  ## find relevant rows of Xp
   w1 <- (xn[j+1]-i*dx)/dx ## interpolation weights
   ## find approx. predict matrix row portion, by interpolation
   x0 <- c(x0,Xp[i+2,cols]*w1 + Xp[i+1,cols]*(1-w1))
 }
 dim(x0)<-c(1,28) 
 fv <- x0%*%coef(b) + xn[4];fv    ## evaluate and add offset
 se <- sqrt(x0%*%b$Vp%*%t(x0));se ## get standard error
 ## compare to normal prediction
 predict(b,newdata=data.frame(x0=xn[1],x1=xn[2],
         x2=xn[3],x3=xn[4]),se=TRUE)

That goes through the entire process even the prediction step which would be done outside R or of the GAM model. You are going to have to modify the example a bit to do what you want as the example evaluates all terms in the model and you have two other terms besides the spline - essentially you do the same thing, but only for the spline terms, which involves finding the relevant columns and rows of the Xp matrix for the spline. Then also you should note that the spline is centred so you may or may not want to undo that too.

For your Q1, choose appropriate values for the xn vector/matrix in the example. These correspond to values for the nth term in the model. So set the ones you want fixed to some mean value and then vary the one associated with the spline.

If you are doing all of this in R, it would be easier to just evaluate the spline at the values of the spline covariate that you have data for that is going into the other model. You do that by creating a data frame of values at which to predict at, then use

predict(mod, newdata = newdat, type = "terms")

where mod is the fitted GAM model (via mgcv::gam), newdat is the data frame containing a column for each variable in the model (including the parametric terms; set the terms you don't want to vary to some constant mean value [say the average of the variable in the data set] or certain level if a factor). The type = "terms" part will return a matrix for each row in newdat with the "contribution" to the fitted value for each term in the model, including the spline term. Just take the column of this matrix that corresponds to the spline - again it is centered.

Perhaps I misunderstood your Q1. If you want to control the knots, see the knots argument to mgcv::gam. By default, mgcv::gam places a knot at the extremes of the data and then the remaining "knots" are spread evenly over the interval. mgcv::gam doesn't find the knots - it places them for you and you can control where it places them via the knots argument.

Share:
11,424

Related videos on Youtube

Tom
Author by

Tom

Updated on June 04, 2022

Comments

  • Tom
    Tom almost 2 years

    I am using GAM to model time trends in a logistic regression. Yet I would like to extract the the fitted spline from it to add it to another model, that cannot be fitted in GAM or GAMM.

    Thus I have 2 questions:

    1. How can I fit a smoother over time so that I force one knot to be at a particular location while letting the model to find the other knots?

    2. How can I extract the matrix from the fitted GAM so that I can use it in as an impute for a different model?

    The types of models I am running are to the following form:

    gam <- gam(mortality.under.2~ maternal_age_c+ I(maternal_age_c^2)+
               s(birth_year,by=wealth2) + wealth2 + sex +
               residence + maternal_educ + birth_order,
               data=colombia2, family="binomial")
    

    I've read the extensive documentation for the GAM but I am not sure still. Any suggestion is really appreciated.

  • IRTFM
    IRTFM about 11 years
    That's a very helpful answer. Since I cannot easily donate extra points, I'm going to see if I can find some of your earleir answers to upvote. Shouldn't be too hard. You are an excellent teacher with a deep knowledge base, Gavin.
  • Tom
    Tom about 11 years
    That's a really great explanation. My question was indeed not clear. I want to do a mixture of procedures. I would like to place one or two knots not at a particular location and let the program place the remaning knots whatever is needed; It is possible? Thanks
  • Gavin Simpson
    Gavin Simpson about 11 years
    @AntonioPedroRamos Like I said, the only thing mgcv::gam does is place the knots at the end points and evenly in between. You'll need to position all the knots yourself if you want to choose a few of the knots locations. IIRC these penalised regression models aren't very sensitive to the knot location.