Selecting and printing specific rows of text file

Solution 1

You can call join on the ranges.

lines = readin.readlines()
out1.write(''.join(lines[5:67]))
out2.write(''.join(lines[89:111]))
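To see why join fixes the output format, here is a tiny sketch with a made-up three-line list standing in for the result of readlines(). The names lines, as_repr and as_text are only for illustration.

```python
# A stand-in for what readlines() returns: each item keeps its trailing newline.
lines = ["alpha\n", "beta\n", "gamma\n"]

# str() on a list gives the list's printed form, brackets and quotes included.
as_repr = str(lines[0:2])

# ''.join() glues the items back together exactly as they were in the file.
as_text = "".join(lines[0:2])

print(as_repr)
print(as_text, end="")
```

Note that this still loads the whole file into memory, which may be a problem at ~8 GB.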

Solution 2

Might I suggest not storing the entire file in memory (since it is large), as per one of your links?

f = open('file')
n = open('newfile', 'w')
for i, text in enumerate(f):
    # Keep the lines at zero-based indices 5-67 and 89-111.
    if 4 < i < 68 or 88 < i < 112:
        n.write(text)
f.close()
n.close()

I'd also recommend using 'with' instead of opening and closing the files by hand, but unfortunately I am not allowed to upgrade to a new enough version of Python for that here :(.
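On a Python version that does support 'with', the same streaming idea could look roughly like the sketch below. The function name copy_line_ranges and its ranges parameter are made up for illustration. Each range is half-open like a slice, so (5, 68) covers indices 5 through 67.

```python
def copy_line_ranges(src_path, dst_path, ranges):
    """Copy lines whose zero-based index falls in any (start, stop) range.

    Hypothetical helper: start is included, stop is excluded, as in slicing.
    """
    # No line at or beyond this index can match, so reading can stop there.
    last_stop = max(stop for _, stop in ranges)

    with open(src_path) as src, open(dst_path, "w") as dst:
        for i, text in enumerate(src):
            if i >= last_stop:
                break  # skip the rest of a very large file
            if any(start <= i < stop for start, stop in ranges):
                dst.write(text)  # text still ends with its newline
```

Calling copy_line_ranges('file', 'newfile', [(5, 68), (89, 112)]) should select the same lines as the loop above. Both files are closed automatically, even if an error occurs halfway through.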

Solution 3

The first thing to think of when facing a problem like this is to avoid reading the entire file into memory at once. readlines() does exactly that, so that specific method should be avoided here.

Luckily, Python has an excellent standard library module for this, itertools. itertools has a lot of useful functions, and one of them is islice. islice iterates over an iterable (such as a list, a generator or a file-like object) and returns an iterator over the range specified:

itertools.islice(iterable, start, stop[, step])

Make an iterator that returns selected elements from the iterable. If start is non-zero, then elements from the iterable are skipped until start is reached. Afterward, elements are returned consecutively unless step is set higher than one which results in items being skipped. If stop is None, then iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position. Unlike regular slicing, islice() does not support negative values for start, stop, or step. Can be used to extract related fields from data where the internal structure has been flattened (for example, a multi-line report may list a name field on every third line)
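A quick way to get a feel for start, stop and step is to try islice on a small list first. This is only a sketch; the list letters and the two result names are made up for illustration.

```python
from itertools import islice

letters = ["a", "b", "c", "d", "e", "f", "g"]

# start=2, stop=5: skip two items, then yield indices 2, 3 and 4.
first_slice = list(islice(letters, 2, 5))

# stop=None with step=3: run to the end, yielding indices 0, 3 and 6.
stepped = list(islice(letters, 0, None, 3))

print(first_slice)  # ['c', 'd', 'e']
print(stepped)      # ['a', 'd', 'g']
```

A file object behaves the same way, except that the items are lines and nothing is read beyond the stop position.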

Using this information, together with the str.join method, you can, for example, extract the lines at zero-based indices 10-19 (the 11th through 20th lines of the file) with this simple code:

from itertools import islice

# Open the file for reading; a write mode such as 'wb' would truncate it
with open('huge_data_file.txt', 'r') as data_file:
    txt = ''.join(islice(data_file, 10, 20))

Note that when looping over the file object, each line keeps its trailing newline character, so you should join on the empty string. Joining on \n would insert a blank line between every pair of lines.
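For the two ranges in the question (lines[5:67] and lines[89:111]), one possible sketch reads the file once and writes each range to its own output file. The function name write_ranges and the jobs parameter are made up for illustration, and the jobs are assumed to be sorted by start and non-overlapping.

```python
from itertools import islice


def write_ranges(src_path, jobs):
    """Write each (start, stop, out_path) range of src_path to out_path.

    Hypothetical helper. Indices are zero-based and half-open, like slices.
    Assumes jobs are sorted by start and do not overlap.
    """
    position = 0  # number of lines already consumed from src

    with open(src_path) as src:
        for start, stop, out_path in jobs:
            with open(out_path, "w") as out:
                # islice counts from the iterator's current position, so the
                # absolute indices are shifted by what has already been read.
                out.writelines(islice(src, start - position, stop - position))
            position = stop
```

A call such as write_ranges('huge_data_file.txt', [(5, 67, 'out1.txt'), (89, 111, 'out2.txt')]) would then hold only one line in memory at a time and stop reading after line index 110. The output file names here are placeholders.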

Author by

Stedy

Updated on June 15, 2022

Comments

  • Stedy
    Stedy almost 2 years

    I have a very large (~8 GB) text file that has very long lines. I would like to pull out lines in selected ranges of this file and put them in another text file. In fact, my question is very similar to this and this, but I keep getting stuck when I try to select a range of lines instead of a single line.

    So far this is the only approach I have gotten to work:

    lines = readin.readlines()
    out1.write(str(lines[5:67]))
    out2.write(str(lines[89:111]))
    

    However, this gives me a list, and I would like to output a file with a format identical to the input file (one line per row).