read.csv, header on first line, skip second line

r header skip read.csv

Solution 1

This should do the trick:

all_content = readLines("file.csv")
skip_second = all_content[-2]
dat = read.csv(textConnection(skip_second), header = TRUE, stringsAsFactors = FALSE)

The first step uses readLines to read the entire file into a character vector, where each element is one line of the file. Next, the second line is discarded: negative indexing in R selects everything except the given index. Finally, textConnection lets read.csv treat the remaining lines as if they were a file and parse them into a data.frame.
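As a self-contained sketch of the same three steps, the snippet below writes a small throwaway file (its contents are invented for illustration) and passes the kept lines through read.csv's `text` argument, which is forwarded to read.table and opens the text connection for you. This is a variation on the code above, not a replacement for it.

```r
# Build a small throwaway file; the contents are made up for illustration.
path <- tempfile(fileext = ".csv")
writeLines(c("var1,var2", "units1,units2", "2.3,6.8", "4.5,6.7"), path)

all_content <- readLines(path)   # character vector, one element per line
skip_second <- all_content[-2]   # drop line 2, keep the header on line 1

# `text =` is handed to read.table(), which wraps it in a text connection
dat <- read.csv(text = skip_second, header = TRUE, stringsAsFactors = FALSE)
str(dat)
```

Because the units row never reaches the parser, both columns should come out numeric without any further conversion.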

Solution 2

You can strip the first line(s) after the header directly from the data frame, which lets you do this in one line:

df <- read.csv("test.txt", header = TRUE)[-1, ]

If my data file "test.txt" is the following:

var1, var2
units1, units2
2.3,6.8
4.5,6.7

then this gives me:

> read.csv("test.txt", header = TRUE)[-1, ]
var1 var2
2  2.3  6.8
3  4.5  6.7

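This answers your question exactly. To generalize, you can drop rows N through M of the data frame in the same way:

df <- read.csv("test.txt", header = TRUE)[-N:-M, ]

where N and M are integers with N <= M. Note that these index rows of the data frame, not lines of the file: because the header occupies line 1 of the file, data frame row N is line N + 1 of the file.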
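To make the row-versus-line distinction concrete, here is a minimal sketch with a made-up five-line file that has two junk rows after the header. The file contents and the values of N and M are assumptions chosen for illustration.

```r
# Hypothetical file: header, two rows to discard, then two data rows.
path <- tempfile(fileext = ".txt")
writeLines(c("var1,var2", "units1,units2", "note1,note2", "2.3,6.8", "4.5,6.7"), path)

raw <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)

N <- 1  # first data frame row to drop (line 2 of the file)
M <- 2  # last data frame row to drop (line 3 of the file)
kept <- raw[-(N:M), ]   # selects the same rows as raw[-N:-M, ]
print(kept)
```

The surviving rows keep their original row names (3 and 4 here), and the columns are still character at this point because the discarded rows were parsed as data.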

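Note: Because the units row is read as data, no column is parsed as numeric. Every column becomes a factor in R versions before 4.0.0 (where stringsAsFactors defaults to TRUE, as in the output below) or a character vector in R 4.0.0 and later.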

str(read.csv("test.txt", header = TRUE)[-1,])
# 'data.frame': 2 obs. of  2 variables:
#   $ var1: Factor w/ 3 levels "2.3","4.5","units1": 1 2
#   $ var2: Factor w/ 3 levels " units2","6.7",..: 3 2
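One way to recover numeric columns after dropping the row is to re-run R's type detection on each column with type.convert. The sketch below assumes the same four-line example file, recreated in a temporary location, and reads it with stringsAsFactors = FALSE so the behaviour is the same across R versions.

```r
# Recreate the example file in a temporary location (illustrative data).
path <- tempfile(fileext = ".txt")
writeLines(c("var1,var2", "units1,units2", "2.3,6.8", "4.5,6.7"), path)

dat <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)[-1, ]

# Re-run type detection column by column now that the units row is gone
dat[] <- lapply(dat, type.convert, as.is = TRUE)
rownames(dat) <- NULL   # renumber the rows starting from 1
str(dat)
```

Assigning into `dat[]` keeps the object a data frame rather than turning it into a plain list.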

Solution 3

On Linux (or macOS), data.table::fread can read the output of a shell command, so

data.table::fread(cmd = "sed -e '2d' myfile.txt", data.table = FALSE)

will skip the second line.

Author by

mchangun

Updated on October 10, 2020

Comments

mchangun over 3 years

I have a CSV file with two header rows, the first row I want to be the header, but the second row I want to discard. If I do the following command:

data <- read.csv("HK Stocks bbg.csv", header = T, stringsAsFactors = FALSE)

The first row becomes the header and the second row of the file becomes the first row of my data frame:

  Xaaaaaaaaa       X X.1     Xbbbbbbbbbb     X.2 X.3
1         Date PX_LAST  NA         Date PX_LAST  NA
2   31/12/2002  38.855  NA   31/12/2002  19.547  NA
3   02/01/2003  38.664  NA   02/01/2003  19.547  NA
4   03/01/2003  40.386  NA   03/01/2003  19.547  NA
5   06/01/2003  40.386  NA   06/01/2003  19.609  NA
6   07/01/2003  40.195  NA   07/01/2003  19.609  NA

I want to skip this second row of the CSV file and just get

  X1.HK.Equity       X X.1 X2.HK.Equity     X.2 X.3
2   31/12/2002  38.855  NA   31/12/2002  19.547  NA
3   02/01/2003  38.664  NA   02/01/2003  19.547  NA
4   03/01/2003  40.386  NA   03/01/2003  19.547  NA
5   06/01/2003  40.386  NA   06/01/2003  19.609  NA
6   07/01/2003  40.195  NA   07/01/2003  19.609  NA

I tried data <- read.csv("HK Stocks bbg.csv", header = T, stringsAsFactors = FALSE, skip = 1) but that returns:

        Date PX_LAST  X     Date.1 PX_LAST.1 X.1
1 31/12/2002  38.855 NA 31/12/2002    19.547  NA
2 02/01/2003  38.664 NA 02/01/2003    19.547  NA
3 03/01/2003  40.386 NA 03/01/2003    19.547  NA
4 06/01/2003  40.386 NA 06/01/2003    19.609  NA
5 07/01/2003  40.195 NA 07/01/2003    19.609  NA
6 08/01/2003  40.386 NA 08/01/2003    19.547  NA

The header row comes from the second line of my CSV file, not the first line.

Thank you.

mchangun about 11 years

Thanks for your reply. The last line dat = read.csv(skip_second, header = TRUE, stringsAsFactors = FALSE) gives me an error Error in file(file, "rt") : invalid 'description' argument. How can I get read.csv to accept a variable instead of a file path?

Paul Hiemstra about 11 years

Use textConnection in addition.

Nathaniel Payne almost 10 years

As a heads up Paul, this approach worked brilliantly with smaller files (less than 5MB), but had trouble with larger files. I asked a question on it and provided an answer after getting it working nicely on larger files here: stackoverflow.com/questions/24921387/…