Right way to convert data.frame to a numeric matrix, when df also contains strings?

173,626

Solution 1

Edit 2: See @flodel's answer. Much better.

Try:

# assuming SFI is your data.frame
as.matrix(sapply(SFI, as.numeric))  

Edit: or as @CarlWitthoft suggested in the comments:

matrix(as.numeric(unlist(SFI)),nrow=nrow(SFI))
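
A minimal sketch of the two one-liners above on a toy data frame. The column values are made up for illustration, and the columns are assumed to be stored as character strings (stringsAsFactors = FALSE), not as factors:

```r
# Hypothetical stand-in for SFI: numbers stored as character strings
SFI <- data.frame(
  GINI   = c("0.34", "0.27", "0.22"),
  S80S20 = c("5.5", "3.8", "3.1"),
  stringsAsFactors = FALSE
)

# Column-wise conversion; with several rows sapply() already returns a matrix
m1 <- as.matrix(sapply(SFI, as.numeric))

# Flatten column by column, convert, then rebuild with the original row count
m2 <- matrix(as.numeric(unlist(SFI)), nrow = nrow(SFI))

is.numeric(m1)  # TRUE
dim(m1)         # 3 2
```

Both matrices hold the same values, but only m1 keeps the column names; for m2 they can be restored with colnames(m2) <- names(SFI).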

Solution 2

data.matrix(SFI)

From ?data.matrix:

Description:

 Return the matrix obtained by converting all the variables in a
 data frame to numeric mode and then binding them together as the
 columns of a matrix.  Factors and ordered factors are replaced by
 their internal codes.
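
The last sentence of that description matters in practice. A small sketch, using a made-up factor column, of what "internal codes" means and how to recover the printed values instead:

```r
# Hypothetical column that was read in as a factor
df <- data.frame(x = factor(c("10", "20", "10")))

codes  <- data.matrix(df)[, "x"]           # internal codes: 1 2 1
values <- as.numeric(as.character(df$x))   # printed values: 10 20 10
```

If the columns are factors and the printed values are what you need, convert through as.character() first, as in the second line.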

Solution 3

Here is an alternative way if the data frame just contains numbers.

apply(as.matrix.noquote(SFI),2,as.numeric)

However, the most reliable way to convert a data frame to a matrix is the data.matrix() function.

Author by

PikkuKatja

Student of economics and management, not much experience with statistics and programming yet. Glad to be able to get advice from these wonderful people on Stack Overflow!

Updated on May 09, 2020

Comments

  • PikkuKatja
    PikkuKatja about 4 years

    I have a data frame read from a .csv file that contains numeric and character values. I want to convert this data frame into a matrix. All the information it contains is numeric (I deleted the non-numeric rows), so it should be possible to convert the data frame into a numeric matrix. However, I get a character matrix.

    The only way I found to solve this is to use as.numeric on each and every column, but this is quite time-consuming. I am quite sure there is a way to do this with some kind of if(i in 1:n)-form, but I cannot figure out how it might work. Or is the only way really to start with numeric values in the first place, as proposed here (Making matrix numeric and name orders)?

    Probably this is a very easy thing for most of you :P

    The matrix is a lot bigger, this is only the first few rows... Here's the code:

    cbind(
    as.numeric(SFI.Matrix[ ,1]),
    as.numeric(SFI.Matrix[ ,2]),
    as.numeric(SFI.Matrix[ ,3]),
    as.numeric(SFI.Matrix[ ,4]),
    as.numeric(SFI.Matrix[ ,5]),
    as.numeric(SFI.Matrix[ ,6]))  
    
    # to get something like this again:
    
    Social.Assistance Danger.Poverty GINI S80S20 Low.Edu        Unemployment 
    0.147             0.125          0.34    5.5   0.149        0.135 0.18683691
    0.258             0.229          0.27    3.8   0.211        0.175 0.22329362
    0.207             0.119          0.22    3.1   0.139        0.163 0.07170422
    0.219             0.166          0.25    3.6   0.114        0.163 0.03638525
    0.278             0.218          0.29    4.1   0.270        0.198 0.27407825
    0.288             0.204          0.26    3.6   0.303        0.211 0.22372633
    

    Thank you for any help!

    • smci
      smci almost 9 years
      Converting numerics-stored-as-strings back to numerics is trivial. Converting other strings to numerics is impossible (unless they're factors, in which case it's a terrible practice, statistically). As to factors, you didn't mention them, but converting factors to numeric is the only interesting part of this question.
  • PikkuKatja
    PikkuKatja about 11 years
    yes, SFI was the data.frame, and yes, it solved the problem! Thank you!
  • Carl Witthoft
    Carl Witthoft about 11 years
    Why not simply matrix(as.numeric(unlist(SFI)),nr=nrow(SFI)) ?
  • Ricardo Saporta
    Ricardo Saporta about 11 years
    @CarlWitthoft, due to doubt of how the coercion of unlist would affect the final result, but you might be right in that regardless of the intermediate coercion, the final coercion from as.numeric should produce the same results. Answer updated
  • antonio
    antonio about 10 years
    this will interpret "123" as a factor and convert it to the related integer level.
  • flodel
    flodel about 10 years
    @antonio. What you say is not true. If the data.frame contains characters, they are converted to numerics, try: data.matrix(data.frame(x = "123", stringsAsFactors = FALSE)). It is only if the data.frame contains factors that they are represented by their internal value (as quoted above), try data.matrix(data.frame(x = "123", stringsAsFactors = TRUE)). So everything is behaving as I would expect and as documented.
  • antonio
    antonio about 10 years
    Sorry, I meant that you don't get a number straight out of a string unless you use stringsAsFactors or as.is for read.csv.
  • Rich Scriven
    Rich Scriven over 9 years
    You replaced missing data with numbers? How'd that analysis go?
  • user3315638
    user3315638 over 9 years
    The missing data were stock price quotes in two blocks of cells, Richard. So I manually supplied them. I am guessing that what was key was the outputting of the file by R at Step 2, which must have facilitated R's correct interpretation of every column when the file was returned to it at Step 3. Anyway, it was a big file, so I was really happy to avoid having to describe data structures for individual columns.
  • Zhilong Jia
    Zhilong Jia over 9 years
    data.matrix(as.data.frame(SFI,stringsAsFactors = F) )
  • discipulus
    discipulus about 9 years
    data.matrix didn't work but your solution worked :-)
  • smci
    smci almost 9 years
    @user3315638: exporting and reimporting was totally unnecessary, all you are doing is sapply(df[,StringColsToChangeToNumeric], as.numeric)
  • smci
    smci almost 9 years
    @RichardScriven: in real-world datasets (financial, weblog etc.), filling or imputing NAs is not only important but necessary (obviously, caveats apply). Having said that, this export-CSV-edit-reimport is unnecessary and error-prone and can be replaced with the one-liner above.
  • plijnzaad
    plijnzaad almost 8 years
    One more subtlety: if all values are integer (or can be interpreted as such), the end result is an integer matrix, not a numeric matrix (which e.g. cannot be clustered using hopach, and as.numeric loses the dimensions again ...). I think in this respect the documentation is unclear in that 'numeric mode' also includes integers. And now that I think about it, it is weird that as.numeric always returns a double; that is not very consistent, since in all other contexts numeric means integer-or-double ...
  • Ahmadov
    Ahmadov over 7 years
    converting a data.table which has factor column types to a matrix with data.matrix will result in an integer matrix, not numeric.
  • pbible
    pbible about 7 years
    This is the real answer. The other solutions all clobbered the data in some way for me.
  • user3503711
    user3503711 about 4 years
    data.matrix() is slow compared to as.matrix()
  • flodel
    flodel about 4 years
    @user3503711, OP mentions how his data converts to a character matrix, so as.matrix is not enough. It's the necessary conversion to numeric that slows things down.
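
On the integer-versus-double point raised in the comments: one way to get a double matrix without losing the dimensions is to change the storage mode in place. A minimal sketch with a made-up all-integer data frame:

```r
# Hypothetical data frame whose columns are all integer
df <- data.frame(a = 1:3, b = 4:6)

m <- data.matrix(df)
type_before <- typeof(m)      # "integer"

storage.mode(m) <- "double"   # converts the type, keeps dim and dimnames
type_after <- typeof(m)       # "double"
dim(m)                        # 3 2
```

Unlike as.numeric(m), which returns a plain vector, assigning to storage.mode() leaves the matrix attributes untouched.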