Classification functions in linear discriminant analysis in R

11,840

Solution 1

There isn't a built-in way to get the information I needed, so I wrote a function to do it:

ty.lda <- function(x, groups){
  x.lda <- lda(groups ~ ., as.data.frame(x))

  gr <- length(unique(groups))   ## groups might be factors or numeric
  v <- ncol(x) ## variables
  m <- x.lda$means ## group means

  w <- array(NA, dim = c(v, v, gr))

  for(i in 1:gr){
    tmp <- scale(subset(x, groups == unique(groups)[i]), scale = FALSE)
    w[,,i] <- t(tmp) %*% tmp
  }

  W <- w[,,1]
  for(i in 2:gr)
    W <- W + w[,,i]

  V <- W/(nrow(x) - gr)
  iV <- solve(V)

  class.funs <- matrix(NA, nrow = v + 1, ncol = gr)
  colnames(class.funs) <- paste("group", 1:gr, sep=".")
  rownames(class.funs) <- c("constant", paste("var", 1:v, sep = "."))

  for(i in 1:gr) {
    class.funs[1, i] <- -0.5 * t(m[i,]) %*% iV %*% (m[i,])
    class.funs[2:(v+1) ,i] <- iV %*% (m[i,])
  }

  x.lda$class.funs <- class.funs

  return(x.lda)
}

This code follows the formulas in Legendre and Legendre's Numerical Ecology (1998), page 625, and matches the results of the worked example starting on page 626.

Solution 2

Suppose x is your LDA object:

x$terms

You can have a peak at the object by looking at it's structure:

str(x)

Update:

Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
table(Iris$Sp[train])
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)
predict(z, Iris[-train, ])$class
str(z)
List of 10
 $ prior  : Named num [1:3] 0.333 0.333 0.333
  ..- attr(*, "names")= chr [1:3] "c" "s" "v"
 $ counts : Named int [1:3] 30 25 20
  ..- attr(*, "names")= chr [1:3] "c" "s" "v"
 $ means  : num [1:3, 1:4] 6.03 5.02 6.72 2.81 3.43 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "c" "s" "v"
  .. ..$ : chr [1:4] "Sepal.L." "Sepal.W." "Petal.L." "Petal.W."
 $ scaling: num [1:4, 1:2] 0.545 1.655 -1.609 -3.682 -0.443 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:4] "Sepal.L." "Sepal.W." "Petal.L." "Petal.W."
  .. ..$ : chr [1:2] "LD1" "LD2"
 $ lev    : chr [1:3] "c" "s" "v"
 $ svd    : num [1:2] 33.66 2.93
 $ N      : int 75
 $ call   : language lda(formula = Sp ~ ., data = Iris, prior = c(1, 1, 1)/3, subset = train)
 $ terms  :Classes 'terms', 'formula' length 3 Sp ~ Sepal.L. + Sepal.W. + Petal.L. + Petal.W.
  .. ..- attr(*, "variables")= language list(Sp, Sepal.L., Sepal.W., Petal.L., Petal.W.)
  .. ..- attr(*, "factors")= int [1:5, 1:4] 0 1 0 0 0 0 0 1 0 0 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:5] "Sp" "Sepal.L." "Sepal.W." "Petal.L." ...
  .. .. .. ..$ : chr [1:4] "Sepal.L." "Sepal.W." "Petal.L." "Petal.W."
  .. ..- attr(*, "term.labels")= chr [1:4] "Sepal.L." "Sepal.W." "Petal.L." "Petal.W."
  .. ..- attr(*, "order")= int [1:4] 1 1 1 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv
  .. ..- attr(*, "predvars")= language list(Sp, Sepal.L., Sepal.W., Petal.L., Petal.W.)
  .. ..- attr(*, "dataClasses")= Named chr [1:5] "factor" "numeric" "numeric" "numeric" ...
  .. .. ..- attr(*, "names")= chr [1:5] "Sp" "Sepal.L." "Sepal.W." "Petal.L." ...
 $ xlevels: Named list()
 - attr(*, "class")= chr "lda"
Share:
11,840

Related videos on Youtube

Tyler
Author by

Tyler

Botanist

Updated on June 04, 2022

Comments

  • Tyler
    Tyler almost 2 years

    After completing a linear discriminant analysis in R using lda(), is there a convenient way to extract the classification functions for each group?

    From the link,

    These are not to be confused with the discriminant functions. The classification functions can be used to determine to which group each case most likely belongs. There are as many classification functions as there are groups. Each function allows us to compute classification scores for each case for each group, by applying the formula:

    Si = ci + wi1*x1 + wi2*x2 + ... + wim*xm
    

    In this formula, the subscript i denotes the respective group; the subscripts 1, 2, ..., m denote the m variables; ci is a constant for the i'th group, wij is the weight for the j'th variable in the computation of the classification score for the i'th group; xj is the observed value for the respective case for the j'th variable. Si is the resultant classification score.

    We can use the classification functions to directly compute classification scores for some new observations.

    I can build them from scratch using textbook formulas, but that requires rebuilding a number of intermediate steps from the lda analysis. Is there a way to get them after the fact from the lda object?

    Added:

    Unless I'm still misunderstanding something in Brandon's answer (sorry for the confusion!), it appears the answer is no. Presumably the majority of users can get the information they need from predict(), which provides classifications based on lda().

  • Tyler
    Tyler about 13 years
    I think something's missing here: there is no terms element in my lda object!
  • Brandon Bertelsen
    Brandon Bertelsen about 13 years
    Are you using lda() from the MASS package?
  • Tyler
    Tyler about 13 years
    Yes, and there's no terms - str() shows: "prior", "counts", "means", "scaling", "lev", "svd", "N", "call"
  • Brandon Bertelsen
    Brandon Bertelsen about 13 years
    Strange, I was using the example at the bottom of ?lda with the Iris data. Perhaps, if you could provide some sample data?
  • Tyler
    Tyler about 13 years
    I'm looking at the lda example now, from R 2.12.2, and there is no reference to terms. What version of R are you using?
  • Tyler
    Tyler about 13 years
    Ah, I see now. lda(x, groups) produces different output than lda(groups ~ ., x). However, now that I actually see the terms, there's nothing in there that resembles a classification function. I added a link to the definition of classification functions to my question. Maybe they're in there and I'm too dense to see them?
  • Brandon Bertelsen
    Brandon Bertelsen about 13 years
    My bad, I misunderstood what you are looking for. My answer does not solve your problem.
  • Tyler
    Tyler about 13 years
    No worries. Thanks for your perseverance.
  • Tyler
    Tyler over 11 years
    I didn't ask about discriminant functions, I asked about classification functions. They are not the same thing. See the link for a description.
  • IRTFM
    IRTFM over 11 years
    Right. A classification function projects the data onto the discriminant functions and discretizes it based on a threshold rule. If you wnat to see how MASS::predict.lda does it then load MASS and type getAnywhere(predict.lda).
  • Tyler
    Tyler over 11 years
    There is one classification function for each group in an lda, as indicated in the link I provided, as well as the book I cited. I've now included a quote from the linked page to clarify, so that we can at least be sure we're arguing about the same thing. predict.lda provides the results of applying the classification functions, but it does not return the functions themselves. That's what I asked for. I still don't understand what was flawed or misleading about this question.
  • Juanchi
    Juanchi almost 9 years
    I'm looking for the same thing (stackoverflow.com/questions/30495710/…). In SAS, it's really easy to get from a LDA.(support.sas.com/documentation/cdl/en/statug/63033/HTML/‌​default/…)

Related