R fread and strip white

Solution 1

fread now has a strip.white parameter, which is set to TRUE by default. You can also pass data.table = FALSE to fread to get a data.frame back after reading the dataset.
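
As a minimal sketch of the above, assuming a reasonably recent data.table version: the sample CSV contents and the temporary file are made up for illustration, and sep = "," is passed explicitly only to keep the example unambiguous.

```r
library(data.table)

# Hypothetical sample data with padded fields, written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,  alice  ", "2,  bob  "), tmp)

# strip.white = TRUE (the default) trims leading and trailing whitespace
# from unquoted fields; data.table = FALSE returns a plain data.frame
testdata <- fread(tmp, sep = ",", strip.white = TRUE, data.table = FALSE)

str(testdata)
```

Since strip.white = TRUE is the default, spelling it out is optional; it is shown here only to make the intent explicit.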

Solution 2

You can use str_trim from the stringr package:

library(stringr)
# lapply keeps the result a data.table; sapply would typically return a character matrix
testdata[, lapply(.SD, str_trim)]

By default it trims whitespace on both sides, but you can set the side:

testdata[, lapply(.SD, str_trim, side = "left")]
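
Applying str_trim to every column turns non-character columns into character. A possible variant, sketched here under the assumption that testdata is a data.table, trims only the character columns by reference. The example table and the helper name chr_cols are hypothetical.

```r
library(data.table)
library(stringr)

# Hypothetical example table: one padded character column, one numeric column
testdata <- data.table(name = c("  alice ", " bob  "), score = c(1.5, 2))

# Find the character columns and trim only those, updating in place
chr_cols <- names(testdata)[sapply(testdata, is.character)]
if (length(chr_cols) > 0) {
  testdata[, (chr_cols) := lapply(.SD, str_trim), .SDcols = chr_cols]
}

str(testdata)
```

Because := modifies the table by reference, no copy of the whole dataset is made, which may matter for large files.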
Author by

DaReal

Updated on June 15, 2022

Comments

  • DaReal
    DaReal about 2 years

    I have a csv file with extra white spaces that I want to read into R as a dataframe, stripping the white spaces.

    This can be achieved by using

    testdata<-read.csv("file.csv", strip.white=TRUE)
    

    The problem is that the dataset is large and reading it this way takes about half an hour. The fread function is at least twice as fast but does not have a strip.white argument.

    library("data.table")
    testdata<-data.frame(fread("file.csv"))
    

    Is there a quick way to strip the white spaces from the columns after reading in, or is there some way to strip the white spaces using fread?

    If it was just a one time import, I wouldn't mind that much, but I need to do this several times and regularly.

  • DaReal
    DaReal over 10 years
    Thank you, this would have done the trick. However, my colleague has a solution outside of R. He used a Perl command on his local Mac OS X machine to strip padding:

    perl -lape 's/\s+//sg' /path/to/file.csv > /path/to/fileV2.csv

    This reduces the file size and strips whitespace before reading it into R.
  • fridaymeetssunday
    fridaymeetssunday about 9 years
    Just a word of caution: using @agstudy's solution will convert numeric columns to chr if these also contain spaces. Otherwise, nice solution.
  • DaReal
    DaReal over 8 years
    Thanks, the fread function has been upgraded since I first ran into this issue, so this is now the way to go.