r - read.csv - skip rows with different number of columns

13,205

You could try:

read.csv(text=readLines('myfile.csv')[-(1:5)])

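This first reads the file with readLines, which stores each line as one element of a character vector. The negative index then drops the first five elements, and read.csv parses the remaining lines as CSV through its text argument.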
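As a minimal, self-contained sketch of the approach above: the sample contents, the temporary path, and the column names are made up for illustration. Only `readLines`, the negative index, and `read.csv(text = ...)` come from the answer itself.

```r
# Hypothetical sample file: 5 two-column info rows,
# followed by an 8-column header and two 8-column data rows.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "info,1", "info,2", "info,3", "info,4", "info,5",
  "a,b,c,d,e,f,g,h",
  "1,2,3,4,5,6,7,8",
  "9,10,11,12,13,14,15,16"
), path)

# Read every line as one element of a character vector.
all_lines <- readLines(path)

# Negative indexing drops elements 1 to 5, so the header line comes first.
kept <- all_lines[-(1:5)]

# Parse the remaining lines as CSV text instead of reading from a file.
df <- read.csv(text = kept)
```

Because the info rows are removed before any CSV parsing happens, their different column count never reaches read.csv.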

Author by

datavoredan

Economist by training, Data Scientist by trade. Working towards being a more complete data scientist through learning about the best tools available. Working to become proficient in: object-oriented programming, Python (pandas etc. + Django), R, SQL.

Updated on June 17, 2022

Comments

  • datavoredan
    datavoredan almost 2 years

    There are 5 rows at the top of my csv file which serve as information about the file, which I do not need.

    These information rows have only 2 columns, while the header and the rows of data (from row 6 onwards) have 8. This appears to be the cause of the issue.

    I have tried using the skip argument of read.csv to skip these lines, and the same with read.table:

    df = read.csv("myfile.csv", skip=5)
    df = read.table("myfile.csv", skip=5)
    

    but this still gives me the same error message, which is:

    Error in read.table("myfile.csv",  :empty beginning of file
    

    In addition: Warning messages:

    1: In readLines(file, skip) : line 1 appears to contain an embedded nul
    2: In readLines(file, skip) : line 2 appears to contain an embedded nul
    ...
    5: In readLines(file, skip) : line 5 appears to contain an embedded nul
    

    How can I read this .csv into R without the embedded nuls in the first 5 rows causing this issue?

  • Bono
    Bono about 9 years
    That only gets rid of the messages, but does not actually solve the problem.
  • Michal aka Miki
    Michal aka Miki about 7 years
    How can you skip columns if some of them are causing problems? Here is one example: stackoverflow.com/q/5788117/54964
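
The embedded nul warnings quoted in the question can be reproduced on a small scale. The sketch below assumes the nuls are stray bytes in the info rows; the sample bytes and the temporary path are invented for illustration. If the file is actually UTF-16 encoded, which is a common cause of this warning, dropping nuls is not the right fix and a `fileEncoding` argument to read.csv is the more likely remedy. `skipNul` is an existing argument of `readLines`.

```r
# Hypothetical sample: one info row containing a nul byte,
# followed by a header line and one data row.
path <- tempfile(fileext = ".csv")
bytes <- c(
  charToRaw("meta"), as.raw(0), charToRaw(",1\n"),
  charToRaw("x,y\n1,2\n")
)
writeBin(bytes, path)

# skipNul = TRUE makes readLines drop the nul bytes instead of warning.
lines <- readLines(path, skipNul = TRUE)

# Drop the info row (one here, five in the question) and parse the rest.
df <- read.csv(text = lines[-1])
```

Whether this is appropriate depends on why the nuls are in the file, so it is worth inspecting a few raw bytes first, for example with `readBin(path, "raw", n = 20)`.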