read.csv, header on first line, skip second line
Solution 1
This should do the trick:
all_content = readLines("file.csv")
skip_second = all_content[-2]
dat = read.csv(textConnection(skip_second), header = TRUE, stringsAsFactors = FALSE)
The first step using readLines
reads the entire file into a list, where each item in the list represents a line in the file. Next, you discard the second line using the fact that negative indexing in R means select all but this index
. Finally, we feed this data to read.csv
to process it into a data.frame
.
Solution 2
You can strip the first line(s) after the header directly from the dataframe, to allow you to do this in one line:
df<-read.csv("test.txt",header=T)[-1,]
if my datafile "test.txt" is the following:
var1, var2
units1, units2
2.3,6.8
4.5,6.7
this gives me
> read.csv("test.txt",header=T)[-1,]
var1 var2
2 2.3 6.8
3 4.5 6.7
This answers your question exactly, but just to generalize the answer, you can also skip the Nth to the Mth lines in this way:
df<-read.csv("test.txt",header=T)[-N:-M,]
where N and M are integers of course.
Note: This method will convert all columns into factor.
str(read.csv("test.csv", header = TRUE)[-1,])
# 'data.frame': 2 obs. of 2 variables:
# $ var1: Factor w/ 3 levels "2.3","4.5","units1": 1 2
# $ var2: Factor w/ 3 levels " units2","6.7",..: 3 2
Solution 3
On Linux (or Mac) you can take advantage of being able to use linux commands in data.table::fread
, so
data.table::fread("sed -e '2d' myfile.txt", data.table = F)
will skip the second line.
mchangun
Updated on October 10, 2020Comments
-
mchangun over 3 years
I have a CSV file with two header rows, the first row I want to be the header, but the second row I want to discard. If I do the following command:
data <- read.csv("HK Stocks bbg.csv", header = T, stringsAsFactors = FALSE)
The first row becomes the header and the second row of the file becomes the first row of my data frame:
Xaaaaaaaaa X X.1 Xbbbbbbbbbb X.2 X.3 1 Date PX_LAST NA Date PX_LAST NA 2 31/12/2002 38.855 NA 31/12/2002 19.547 NA 3 02/01/2003 38.664 NA 02/01/2003 19.547 NA 4 03/01/2003 40.386 NA 03/01/2003 19.547 NA 5 06/01/2003 40.386 NA 06/01/2003 19.609 NA 6 07/01/2003 40.195 NA 07/01/2003 19.609 NA
I want to skip this second row of the CSV file and just get
X1.HK.Equity X X.1 X2.HK.Equity X.2 X.3 2 31/12/2002 38.855 NA 31/12/2002 19.547 NA 3 02/01/2003 38.664 NA 02/01/2003 19.547 NA 4 03/01/2003 40.386 NA 03/01/2003 19.547 NA 5 06/01/2003 40.386 NA 06/01/2003 19.609 NA 6 07/01/2003 40.195 NA 07/01/2003 19.609 NA
I tried
data <- read.csv("HK Stocks bbg.csv", header = T, stringsAsFactors = FALSE, skip = 1)
but that returns:Date PX_LAST X Date.1 PX_LAST.1 X.1 1 31/12/2002 38.855 NA 31/12/2002 19.547 NA 2 02/01/2003 38.664 NA 02/01/2003 19.547 NA 3 03/01/2003 40.386 NA 03/01/2003 19.547 NA 4 06/01/2003 40.386 NA 06/01/2003 19.609 NA 5 07/01/2003 40.195 NA 07/01/2003 19.609 NA 6 08/01/2003 40.386 NA 08/01/2003 19.547 NA
The header row comes from the second line of my CSV file, not the first line.
Thank you.
-
mchangun about 11 yearsThanks for your reply. The last line
dat = read.csv(skip_second, header = TRUE, stringsAsFactors = FALSE)
gives me an errorError in file(file, "rt") : invalid 'description' argument
. How can I get read.csv to accept a variable instead of a file path? -
Paul Hiemstra about 11 yearsUse
textConnection
in addition. -
Nathaniel Payne almost 10 yearsAs a heads up Paul, this approach worked brilliantly with smaller files (less than 5MB), but had trouble with larger files. I asked a question on it and provided an answer after getting it working nicely on larger files here: stackoverflow.com/questions/24921387/…