Imported a CSV dataset into R but the values become factors

Solution 1

Both the data import function (here: read.csv()) and a global option let you set stringsAsFactors=FALSE, which should fix this.
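As a rough illustration of the per-call form, here is a minimal sketch that uses a small inline CSV instead of stuckey.csv. The data values, including the "n/a" entry, are invented for the example, since the question did not include the file.

```r
# Hypothetical stand-in for stuckey.csv; the "n/a" entry is invented
csv_text <- "PTS,MP\n12,30\nn/a,25\n7,18"

# Character columns become factors (the default before R 4.0.0)
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Character columns stay character
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$PTS)  # "factor"
class(as_char$PTS)    # "character"
class(as_char$MP)     # "integer"
```

Note that in both cases PTS is still not numeric: the option only decides between character and factor, so a conversion step is needed afterwards.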

Solution 2

By default, read.csv inspects the values in each column to decide whether to treat it as numeric. If it finds non-numeric values, it treats the column as character data, and character columns are converted to factors (this was the default before R 4.0.0).

It looks like the PTS and MP variables in your dataset contain non-numerics, which is why you're getting unexpected results. You can force these variables to numeric with

point <- as.numeric(as.character(point))
time <- as.numeric(as.character(time))

But any values that can't be converted will become missing. (The R FAQ gives a slightly different method for factor -> numeric conversion but I can never remember what it is.)
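For reference, a small sketch of both conversions on a made-up factor (the values and the "n/a" entry are assumptions standing in for the real PTS column). The second form is the one I believe the R FAQ describes: convert the levels once, then index them by the factor's integer codes.

```r
# Hypothetical factor standing in for stuckey$PTS; "n/a" is an invented bad value
pts <- factor(c("12", "n/a", "7", "12"))

# Pitfall: as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(pts)

# Convert via character; "n/a" becomes NA (normally with a coercion warning)
via_char <- suppressWarnings(as.numeric(as.character(pts)))

# FAQ-style variant: convert the levels, then index by the integer codes
via_levels <- suppressWarnings(as.numeric(levels(pts))[as.integer(pts)])

# List the distinct entries that failed to convert, to see what is wrong in the file
bad <- unique(as.character(pts)[is.na(via_char)])
```

Looking at `bad` is usually the quickest way to find out why the column was not read as numeric in the first place.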

Solution 3

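You can set this globally for all read.csv/read.* commands with options(stringsAsFactors=FALSE)

Then read the file as follows (read.table needs header=TRUE and sep="," for a CSV file; as.is=TRUE keeps character columns as character for this call): my.tab <- read.table("filename.csv", header=TRUE, sep=",", as.is=TRUE)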

Solution 4

When importing CSV data files, the import command should reflect both the separator between columns (here ;) and the decimal separator used in your numeric values (for a value written as 2,5 this would be ",").

The command for importing the CSV therefore has to spell out a few more arguments:

    stuckey <- read.csv2("C:/kalle/R/stuckey.csv", header=TRUE, sep=";", dec=",")

This should import all variables as either integers or numeric.
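To see the effect of the decimal separator, here is a minimal sketch with an invented inline sample in the semicolon / decimal-comma layout (the values are made up; the real file may look different).

```r
# Invented sample: semicolon-separated columns, comma as decimal mark
csv_text <- "PTS;MP\n12;30,5\n7;18,25"

# Correct column separator but default dec = ".": MP cannot be parsed as a number
dec_dot <- read.csv(text = csv_text, sep = ";", stringsAsFactors = FALSE)
class(dec_dot$MP)    # "character"

# read.csv2 already defaults to sep = ";" and dec = ","
dec_comma <- read.csv2(text = csv_text)
class(dec_comma$MP)  # "numeric"
```

Since read.csv2 uses these two settings by default, passing sep and dec explicitly is optional; it just makes the intent visible.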

Solution 5

None of these answers mention the colClasses argument which is another way to specify the variable classes in read.csv.

 stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = "numeric") # all variables to numeric

or you can specify which columns to convert:

stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = c("PTS" = "numeric", "MP" = "numeric")) # specific columns to numeric

Note that, by default, a column containing values that can't be read as numbers is imported as a factor (in R versions before 4.0.0), which makes it more awkward to convert to numeric. It can therefore be advisable to read all variables in as character with colClasses = "character" and then convert the specific columns to numeric once the csv is read in:

stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = "character")
point <- as.numeric(stuckey$PTS)
time <- as.numeric(stuckey$MP)
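
A minimal sketch of how these options behave on invented inline data (the column values, the Team column and the "n/a" entry are all assumptions for illustration). The last call uses na.strings, which only fits if the bad entries are really missing-value markers.

```r
# Invented stand-ins for stuckey.csv; "n/a" is a hypothetical bad entry
clean_text <- "PTS,MP,Team\n12,30,A\n7,18,B"
dirty_text <- "PTS,MP\n12,30\nn/a,25"

# Named colClasses: only PTS and MP are forced, Team is left to the default
clean <- read.csv(text = clean_text,
                  colClasses = c(PTS = "numeric", MP = "numeric"))

# With a bad entry present, forcing "numeric" stops with an error
forced <- tryCatch(
  read.csv(text = dirty_text, colClasses = "numeric"),
  error = function(e) "read failed"
)

# Reading as character succeeds; convert afterwards
raw <- read.csv(text = dirty_text, colClasses = "character")
pts <- suppressWarnings(as.numeric(raw$PTS))  # "n/a" becomes NA

# If the bad entries are missing-value markers, declare them instead
marked <- read.csv(text = dirty_text, na.strings = c("NA", "n/a"))
```

Reading as character first is the more forgiving route, because you can inspect the raw strings before deciding how to treat them.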
Author by

Joe

Updated on July 05, 2022

Comments

  • Joe
    Joe almost 2 years

    I am very new to R and I am having trouble accessing a dataset I've imported. I'm using RStudio and used the Import Dataset function when importing my csv-file and pasted the line from the console-window to the source-window. The code looks as follows:

    setwd("c:/kalle/R")
    stuckey <- read.csv("C:/kalle/R/stuckey.csv")
    point <- stuckey$PTS
    time <- stuckey$MP
    

    However, the data aren't integer or numeric as I'm used to, but factors, so when I try to plot the variables I only get histograms, not the usual plot. When I check the data they seem to be in order; I just can't use them because they're in factor form.

  • Hong Ooi
    Hong Ooi over 13 years
    I don't think stringsAsFactors will help in this case, as all it does is control the conversion of character to factor. It doesn't influence whether read.csv imports a column as numeric or character, which is the underlying problem.
  • Richie Cotton
    Richie Cotton over 13 years
    See factor2numeric here: 4dpiecharts.com/2011/01/10/…
  • artdv
    artdv over 10 years
    careful with cases: 'stringsAsFactors' not 'StringsAsFactors'
  • gented
    gented almost 9 years
    Moreover, stringsAsFactors = FALSE generally forces the format to character, which is exactly the opposite of what has to be achieved here.
  • SmallChess
    SmallChess over 8 years
    I don't recommend this solution because it really just converts to characters, absolutely pointless.
  • SmallChess
    SmallChess over 8 years
    Yes. This should be accepted. The other answer failed to do any proper conversion.
  • user890739
    user890739 over 8 years
    Or you can simply add the option to the function: my.tab <- read.table("filename.csv", stringsAsFactors=F)
  • done_merson
    done_merson about 7 years
    I like the options method because it works with other reads such as read_rds.
  • smci
    smci over 5 years
    I'd drop all mention of read.delim(), it's nothing more than a thin wrapper for read.csv(... sep = "\t"). Otherwise this answer is the best answer to this question. And the OP specifically used read.csv() (which is also just a thin wrapper for read.table(... sep=','))
  • smci
    smci over 5 years
    Better to globally set the sensible default with options('stringsAsFactors'=FALSE), then you can't forget.
  • James
    James over 5 years
    @gented, isn't having the values read in as a character at least more workable than as a factor? And what's the alternative?
  • gented
    gented over 5 years
    The point is that stringsAsFactors = FALSE doesn't solve the problem addressed in the question: if your data are numeric, then they must be converted to numeric and that's it (if this doesn't happen there must be another type of problem with the data, which stringsAsFactors doesn't solve).
  • Dirk Eddelbuettel
    Dirk Eddelbuettel over 5 years
    Neither you nor I know that as the question came with no dataset to be actually verifiable. So if you downvoted based on that, you did it wrong. Anyway, I fail to see why people get so excited about an eight year old answer. We covered reading of data a bazillion other times, and sometimes even with a mcve. Without it, all we do is guessing.
  • Ben
    Ben over 4 years
    for me, "stringsAsFactors=FALSE" solved the issue that numeric data was imported as a factor. Now it is not - instead, yes, it is imported as "chr" but now I can convert it. Before that, it didn't work.
  • Krzysztof
    Krzysztof over 2 years
    It's worth noting that starting with R 4.0.0 this issue is obsolete, as stringsAsFactors defaults to FALSE.