R - Reading lines from a .txt-file after a specific line
Solution 1
1) read.pattern. The read.pattern function in gsubfn can be used to read only lines matching a specific pattern. In this example we match beginning of line, optional space(s), 1 or more digits, 1 or more spaces, an optional minus followed by 1 or more digits, optional space(s), end of line. The portions matching the parenthesized portions of the regexp are returned as columns in a data.frame. text = Lines in this self-contained example can be replaced with "myfile.txt", say, if the data is coming from a file. Modify the pattern to suit.
Lines <- "junk
junk
##XYDATA= (X++(Y..Y))
131071 -2065
131070 -4137
131069 -6408
131068 -8043"
library(gsubfn)
DF <- read.pattern(text = Lines, pattern = "^ *(\\d+) +(-?\\d+) *$")
giving:
> DF
V1 V2
1 131071 -2065
2 131070 -4137
3 131069 -6408
4 131068 -8043
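For comparison, the same pattern filter can be sketched in base R alone. This is only an illustration, not part of the gsubfn approach: it keeps the lines that match the regular expression and hands them to read.table. The names all_lines, is_data and DF2 are made up for this sketch.

```r
# Self-contained sample; with a real file, use readLines("myfile.txt") instead.
Lines <- "junk
junk
##XYDATA= (X++(Y..Y))
131071 -2065
131070 -4137
131069 -6408
131068 -8043"

con <- textConnection(Lines)
all_lines <- readLines(con)
close(con)

# Keep only lines made of two integers (the second may be negative).
is_data <- grepl("^ *\\d+ +-?\\d+ *$", all_lines)

# Parse just the matching lines into a data frame with columns V1 and V2.
DF2 <- read.table(text = all_lines[is_data])
```

Unlike read.pattern, this version does not use capture groups; read.table splits the surviving lines on whitespace.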
2) read twice. Another possibility using only base R is simply to read the input once to determine the value of skip= and a second time to do the actual read using that value. To read from a file myfile.txt, replace text = Lines and textConnection(Lines) with "myfile.txt".
read.table(text = Lines,
skip = grep("##XYDATA=", readLines(textConnection(Lines))))
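If reading the file twice is a concern, a variation is to read the lines once and parse only those after the marker. The sketch below assumes the marker occurs at least once and that data rows follow it; read_after_marker is a hypothetical helper name, and the temporary file merely stands in for myfile.txt.

```r
# Hypothetical helper (not from the original answer): read the lines once,
# find the first marker line, and parse only what follows it.
read_after_marker <- function(path, marker = "##XYDATA=") {
  all_lines <- readLines(path)
  hit <- grep(marker, all_lines, fixed = TRUE)  # literal match, not a regex
  if (length(hit) == 0) stop("marker not found in ", path)
  read.table(text = all_lines[-seq_len(hit[1])])
}

# Demo on a temporary file standing in for myfile.txt
tmp <- tempfile(fileext = ".txt")
writeLines(c("junk", "junk", "##XYDATA= (X++(Y..Y))",
             "131071 -2065", "131070 -4137"), tmp)
coords <- read_after_marker(tmp)
```

Applied to several files, something like lapply(files, read_after_marker) would return one data frame per file.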
Added: some revisions and a second approach.
Solution 2
This looks like a job for data.table's fread
library(data.table)
impcoord <- fread("file.txt",skip="coordinatesXY")
--edit--
That is why it is good to give a reproducible example. That error means your file is causing trouble.
The skip argument searches the file for the text you give it and starts reading at the first line that contains it, so you need to give it a string that first appears on the line you want reading to start from. It would work for something like this:
## some random text
## some more random text
## More random text
table_heading1, table_heading2, table_heading3 ...etc
value1, value2, value3 ... etc
etc
Just_The_Table <- fread("the_above_as_a_text_file.txt", skip = "table_heading1", header = TRUE)
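A runnable version of that idea, as a minimal sketch: it assumes data.table is installed, and the file contents and the column names x_coord and y_coord are invented for illustration.

```r
library(data.table)

# Temporary file standing in for the_above_as_a_text_file.txt
tmp <- tempfile(fileext = ".txt")
writeLines(c("## some random text",
             "## some more random text",
             "x_coord,y_coord",
             "131071,-2065",
             "131070,-4137"), tmp)

# skip = "x_coord" starts reading at the first line containing that string,
# which here is the header row of the table.
Just_The_Table <- fread(tmp, skip = "x_coord", header = TRUE)
```

The string should not appear anywhere earlier in the file, otherwise reading would start at that earlier line.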
Solution 3
A possible approach could be the following:
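conn <- file("file.txt", open = "rt")
x <- TRUE
while (x) {
  line <- readLines(conn, n = 1)
  # stop cleanly if the end of the file is reached without finding the marker
  if (length(line) == 0) stop("coordinatesXY not found")
  x <- !grepl("coordinatesXY", line)
}
ret <- read.table(conn, ...) # insert additional parameters to read.table
close(conn)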
You read one line at a time from the input file and stop when you find the indicator string. Then you read the rest of the file through read.table. With this approach you don't store the entire file in memory, but just the piece you need.
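The same approach can be wrapped in a function and tried on a small file. This is a sketch under assumptions: read_from_marker_conn is a hypothetical name, the parameter lines are invented, and the data rows are taken to be whitespace-separated so that read.table needs no extra arguments.

```r
# Hypothetical helper: advance an open connection past the marker line,
# then let read.table parse whatever remains.
read_from_marker_conn <- function(path, marker = "coordinatesXY") {
  conn <- file(path, open = "rt")
  on.exit(close(conn))  # close the connection even if an error occurs
  repeat {
    line <- readLines(conn, n = 1)
    if (length(line) == 0) stop("marker not found in ", path)
    if (grepl(marker, line, fixed = TRUE)) break
  }
  read.table(conn)  # continues from the current position of the connection
}

# Demo on a temporary file standing in for file.txt
tmp <- tempfile(fileext = ".txt")
writeLines(c("param1 = 3", "param2 = 7", "coordinatesXY",
             "131071 -2065", "131070 -4137"), tmp)
ret <- read_from_marker_conn(tmp)
```

Using on.exit here means the connection is released on both the normal and the error path.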
Olli J
Updated on June 29, 2022
Comments
-
Olli J almost 2 years
I have a bunch of output .txt-files that consist of a large parameter list and an X-Y-coordinate set. I need to extract these coordinates from all files so that only those lines are imported to a vector. This would work fine with
impcoord<-read.table("file.txt",skip= ,nrow= ,...)
but the files print the coordinate sets after different lengths of supporting parameters.
Luckily the coordinates always start after a line containing certain words.
Thus my question is, how do I start reading the .txt-file after these words? Let's say they are:
coordinatesXY
Thanks a lot for your time and help!
-Olli
--Edit--
Sorry for the confusion.
The part of the file is as follows:
##XYDATA= (X++(Y..Y))
131071 -2065
131070 -4137
131069 -6408
131068 -8043
... ...
... ...
The first line is the one where skip should end, and the following coordinates need to be imported to a vector. As you can see, the X-coordinates start from 131071 and end at 0.
-
Olli J over 9 years
Thank you for your answer! This approach, however, returns an error
ends field 1 on line 2 when detecting types: ##END=
regardless of what values or nominators I give in skip=.
-
JeremyS over 9 years
Sounds file specific; see the edit in my answer for an example of how fread skip works.
-
89_Simple over 5 years
This solution is really great. Thanks
-
Pablo Herreros Cantis over 3 years
Nice solution! Can fread use its skip argument with a regular expression?